| 2009 | ||
|---|---|---|
| 50 | Richard S. Sutton, Hamid Reza Maei, Doina Precup, Shalabh Bhatnagar, David Silver, Csaba Szepesvári, Eric Wiewiora: Fast gradient-descent methods for temporal-difference learning with linear function approximation. ICML 2009: 125 | |
| 2008 | ||
| 49 | Maria Cutumisu, Duane Szafron, Michael H. Bowling, Richard S. Sutton: Agent Learning using Action-Dependent Learning Rates in Computer Role-Playing Games. AIIDE 2008 | |
| 48 | David Silver, Richard S. Sutton, Martin Müller: Sample-based learning and search with permanent and transient memories. ICML 2008: 968-975 | |
| 47 | Richard S. Sutton, Csaba Szepesvári, Hamid Reza Maei: A Convergent O(n) Temporal-difference Algorithm for Off-policy Learning with Linear Function Approximation. NIPS 2008: 1609-1616 | |
| 46 | Elliot A. Ludvig, Richard S. Sutton, Eric Verbeek, E. James Kehoe: A computational model of hippocampal function in trace conditioning. NIPS 2008: 993-1000 | |
| 45 | Richard S. Sutton, Csaba Szepesvári, Alborz Geramifard, Michael H. Bowling: Dyna-Style Planning with Linear Function Approximation and Prioritized Sweeping. UAI 2008: 528-536 | |
| 44 | Elliot A. Ludvig, Richard S. Sutton, E. James Kehoe: Stimulus Representation and the Timing of Reward-Prediction Errors in Models of the Dopamine System. Neural Computation 20(12): 3034-3054 (2008) | |
| 2007 | ||
| 43 | Richard S. Sutton, Anna Koop, David Silver: On the role of tracking in stationary environments. ICML 2007: 871-878 | |
| 42 | David Silver, Richard S. Sutton, Martin Müller: Reinforcement Learning of Local Shape in the Game of Go. IJCAI 2007: 1053-1058 | |
| 41 | Shalabh Bhatnagar, Richard S. Sutton, Mohammad Ghavamzadeh, Mark Lee: Incremental Natural Actor-Critic Algorithms. NIPS 2007 | |
| 2006 | ||
| 40 | Alborz Geramifard, Michael H. Bowling, Richard S. Sutton: Incremental Least-Squares Temporal Difference Learning. AAAI 2006 | |
| 39 | Alborz Geramifard, Michael H. Bowling, Martin Zinkevich, Richard S. Sutton: iLSTD: Eligibility Traces and Convergence Analysis. NIPS 2006: 441-448 | |
| 2005 | ||
| 38 | Brian Tanner, Richard S. Sutton: TD(lambda) networks: temporal-difference networks with eligibility traces. ICML 2005: 888-895 | |
| 37 | Eddie J. Rafols, Mark B. Ring, Richard S. Sutton, Brian Tanner: Using Predictive Representations to Improve Generalization in Reinforcement Learning. IJCAI 2005: 835-840 | |
| 36 | Brian Tanner, Richard S. Sutton: Temporal-Difference Networks with History. IJCAI 2005: 865-870 | |
| 35 | Doina Precup, Richard S. Sutton, Cosmin Paduraru, Anna Koop, Satinder P. Singh: Off-policy Learning with Options and Recognizers. NIPS 2005 | |
| 34 | Richard S. Sutton, Eddie J. Rafols, Anna Koop: Temporal Abstraction in Temporal-difference Networks. NIPS 2005 | |
| 2004 | ||
| 33 | Richard S. Sutton, Brian Tanner: Temporal-Difference Networks. NIPS 2004 | |
| 2001 | ||
| 32 | Doina Precup, Richard S. Sutton, Sanjoy Dasgupta: Off-Policy Temporal Difference Learning with Function Approximation. ICML 2001: 417-424 | |
| 31 | Peter Stone, Richard S. Sutton: Scaling Reinforcement Learning toward RoboCup Soccer. ICML 2001: 537-544 | |
| 30 | Michael L. Littman, Richard S. Sutton, Satinder P. Singh: Predictive Representations of State. NIPS 2001: 1555-1561 | |
| 29 | Peter Stone, Richard S. Sutton: Keepaway Soccer: A Machine Learning Testbed. RoboCup 2001: 214-223 | |
| 2000 | ||
| 28 | Doina Precup, Richard S. Sutton, Satinder P. Singh: Eligibility Traces for Off-Policy Policy Evaluation. ICML 2000: 759-766 | |
| 27 | Peter Stone, Richard S. Sutton, Satinder P. Singh: Reinforcement Learning for 3 vs. 2 Keepaway. RoboCup 2000: 249-258 | |
| 1999 | ||
| 26 | Richard S. Sutton: Open Theoretical Questions in Reinforcement Learning. EuroCOLT 1999: 11-17 | |
| 25 | Richard S. Sutton, David A. McAllester, Satinder P. Singh, Yishay Mansour: Policy Gradient Methods for Reinforcement Learning with Function Approximation. NIPS 1999: 1057-1063 | |
| 24 | Richard S. Sutton, Doina Precup, Satinder P. Singh: Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning. Artif. Intell. 112(1-2): 181-211 (1999) | |
| 1998 | ||
| 23 | Doina Precup, Richard S. Sutton, Satinder P. Singh: Theoretical Results on Reinforcement Learning with Temporally Abstract Options. ECML 1998: 382-393 | |
| 22 | Richard S. Sutton, Doina Precup, Satinder P. Singh: Intra-Option Learning about Temporally Abstract Actions. ICML 1998: 556-564 | |
| 21 | Robert Moll, Andrew G. Barto, Theodore J. Perkins, Richard S. Sutton: Learning Instance-Independent Value Functions to Enhance Local Search. NIPS 1998: 1017-1023 | |
| 20 | Richard S. Sutton, Satinder P. Singh, Doina Precup, Balaraman Ravindran: Improved Switching among Temporally Abstract Actions. NIPS 1998: 1066-1072 | |
| 19 | Richard S. Sutton: Reinforcement Learning: Past, Present and Future. SEAL 1998: 195-197 | |
| 18 | Richard S. Sutton, Andrew G. Barto: Reinforcement Learning: An Introduction. IEEE Transactions on Neural Networks 9(5): 1054-1054 (1998) | |
| 1997 | ||
| 17 | Richard S. Sutton: On the Significance of Markov Decision Processes. ICANN 1997: 273-282 | |
| 16 | Doina Precup, Richard S. Sutton: Exponentiated Gradient Methods for Reinforcement Learning. ICML 1997: 272-277 | |
| 15 | Doina Precup, Richard S. Sutton: Multi-time Models for Temporally Abstract Planning. NIPS 1997 | |
| 1996 | ||
| 14 | Satinder P. Singh, Richard S. Sutton: Reinforcement Learning with Replacing Eligibility Traces. Machine Learning 22(1-3): 123-158 (1996) | |
| 1995 | ||
| 13 | Richard S. Sutton: TD Models: Modeling the World at a Mixture of Time Scales. ICML 1995: 531-539 | |
| 12 | Richard S. Sutton: Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding. NIPS 1995: 1038-1044 | |
| 1993 | ||
| 11 | Richard S. Sutton, Steven D. Whitehead: Online Learning with Random Representations. ICML 1993: 314-321 | |
| 1992 | ||
| 10 | Richard S. Sutton: Adapting Bias by Gradient Descent: An Incremental Version of Delta-Bar-Delta. AAAI 1992: 171-176 | |
| 1991 | ||
| 9 | Richard S. Sutton, Christopher J. Matheus: Learning Polynomial Functions by Feature Construction. ML 1991: 208-212 | |
| 8 | Richard S. Sutton: Planning by Incremental Dynamic Programming. ML 1991: 353-357 | |
| 7 | Terence D. Sanger, Richard S. Sutton, Christopher J. Matheus: Iterative Construction of Sparse Polynomial Approximations. NIPS 1991: 1064-1071 | |
| 6 | Richard S. Sutton: Dyna, an Integrated Architecture for Learning, Planning, and Reacting. SIGART Bulletin 2(4): 160-163 (1991) | |
| 1990 | ||
| 5 | Richard S. Sutton: Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming. ML 1990: 216-224 | |
| 4 | Richard S. Sutton: Integrated Modeling and Control Based on Reinforcement Learning. NIPS 1990: 471-478 | |
| 1989 | ||
| 3 | Andrew G. Barto, Richard S. Sutton, Christopher J. C. H. Watkins: Sequential Decision Problems and Neural Networks. NIPS 1989: 686-693 | |
| 1988 | ||
| 2 | Richard S. Sutton: Learning to Predict by the Methods of Temporal Differences. Machine Learning 3: 9-44 (1988) | |
| 1985 | ||
| 1 | Oliver G. Selfridge, Richard S. Sutton, Andrew G. Barto: Training and Tracking in Robotics. IJCAI 1985: 670-672 | |