| 2013 | ||
|---|---|---|
| j9 | Patrick M. Pilarski, Michael Rory Dawson, Thomas Degris, Jason P. Carey, K. Ming Chan, Jacqueline S. Hebert, Richard S. Sutton: Adaptive Artificial Limbs: A Real-Time Approach to Prediction and Anticipation. IEEE Robot. Automat. Mag. 20(1): 53-64 (2013) | |
| i5 | Harm van Seijen, Richard S. Sutton: Planning by Prioritized Sweeping with Small Backups. CoRR abs/1301.2343 (2013) | |
| 2012 | ||
| j8 | David Silver, Richard S. Sutton, Martin Müller: Temporal-difference search in computer Go. Machine Learning 87(2): 183-219 (2012) | |
| c54 | Ashique Rupam Mahmood, Richard S. Sutton, Thomas Degris, Patrick M. Pilarski: Tuning-free step-size adaptation. ICASSP 2012: 2121-2124 | |
| c53 | Adam White, Joseph Modayil, Richard S. Sutton: Scaling life-long off-policy learning. ICDL-EPIROB 2012: 1-6 | |
| c52 | ||
| c51 | ||
| c50 | Joseph Modayil, Adam White, Richard S. Sutton: Multi-timescale Nexting in a Reinforcement Learning Robot. SAB 2012: 299-309 | |
| c49 | Joseph Modayil, Adam White, Patrick M. Pilarski, Richard S. Sutton: Acquiring a broad range of empirical knowledge in real time by temporal-difference learning. SMC 2012: 1903-1910 | |
| i4 | ||
| i3 | Richard S. Sutton, Csaba Szepesvári, Alborz Geramifard, Michael Bowling: Dyna-Style Planning with Linear Function Approximation and Prioritized Sweeping. CoRR abs/1206.3285 (2012) | |
| i2 | Adam White, Joseph Modayil, Richard S. Sutton: Scaling Life-long Off-policy Learning. CoRR abs/1206.6262 (2012) | |
| 2011 | ||
| c48 | Richard S. Sutton, Joseph Modayil, Michael Delp, Thomas Degris, Patrick M. Pilarski, Adam White, Doina Precup: Horde: a scalable real-time architecture for learning knowledge from unsupervised sensorimotor interaction. AAMAS 2011: 761-768 | |
| i1 | Joseph Modayil, Adam White, Richard S. Sutton: Multi-timescale Nexting in a Reinforcement Learning Robot. CoRR abs/1112.1133 (2011) | |
| 2010 | ||
| c47 | Hamid Reza Maei, Csaba Szepesvári, Shalabh Bhatnagar, Richard S. Sutton: Toward Off-Policy Learning Control with Function Approximation. ICML 2010: 719-726 | |
| 2009 | ||
| j7 | Shalabh Bhatnagar, Richard S. Sutton, Mohammad Ghavamzadeh, Mark Lee: Natural actor-critic algorithms. Automatica 45(11): 2471-2482 (2009) | |
| c46 | Richard S. Sutton, Hamid Reza Maei, Doina Precup, Shalabh Bhatnagar, David Silver, Csaba Szepesvári, Eric Wiewiora: Fast gradient-descent methods for temporal-difference learning with linear function approximation. ICML 2009: 125 | |
| c45 | Hamid Reza Maei, Csaba Szepesvári, Shalabh Bhatnagar, Doina Precup, David Silver, Richard S. Sutton: Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation. NIPS 2009: 1204-1212 | |
| c44 | Hengshuai Yao, Richard S. Sutton, Shalabh Bhatnagar, Diao Dongcui, Csaba Szepesvári: Multi-Step Dyna Planning for Policy Evaluation and Control. NIPS 2009: 2187-2195 | |
| 2008 | ||
| j6 | Elliot A. Ludvig, Richard S. Sutton, E. James Kehoe: Stimulus Representation and the Timing of Reward-Prediction Errors in Models of the Dopamine System. Neural Computation 20(12): 3034-3054 (2008) | |
| c43 | Maria Cutumisu, Duane Szafron, Michael H. Bowling, Richard S. Sutton: Agent Learning using Action-Dependent Learning Rates in Computer Role-Playing Games. AIIDE 2008 | |
| c42 | David Silver, Richard S. Sutton, Martin Müller: Sample-based learning and search with permanent and transient memories. ICML 2008: 968-975 | |
| c41 | Elliot A. Ludvig, Richard S. Sutton, Eric Verbeek, E. James Kehoe: A computational model of hippocampal function in trace conditioning. NIPS 2008: 993-1000 | |
| c40 | Richard S. Sutton, Csaba Szepesvári, Hamid Reza Maei: A Convergent O(n) Temporal-difference Algorithm for Off-policy Learning with Linear Function Approximation. NIPS 2008: 1609-1616 | |
| c39 | Richard S. Sutton, Csaba Szepesvári, Alborz Geramifard, Michael H. Bowling: Dyna-Style Planning with Linear Function Approximation and Prioritized Sweeping. UAI 2008: 528-536 | |
| 2007 | ||
| c38 | Richard S. Sutton, Anna Koop, David Silver: On the role of tracking in stationary environments. ICML 2007: 871-878 | |
| c37 | David Silver, Richard S. Sutton, Martin Müller: Reinforcement Learning of Local Shape in the Game of Go. IJCAI 2007: 1053-1058 | |
| c36 | Shalabh Bhatnagar, Richard S. Sutton, Mohammad Ghavamzadeh, Mark Lee: Incremental Natural Actor-Critic Algorithms. NIPS 2007 | |
| 2006 | ||
| c35 | Alborz Geramifard, Michael H. Bowling, Richard S. Sutton: Incremental Least-Squares Temporal Difference Learning. AAAI 2006: 356-361 | |
| c34 | Alborz Geramifard, Michael H. Bowling, Martin Zinkevich, Richard S. Sutton: iLSTD: Eligibility Traces and Convergence Analysis. NIPS 2006: 441-448 | |
| 2005 | ||
| c33 | Brian Tanner, Richard S. Sutton: TD(lambda) networks: temporal-difference networks with eligibility traces. ICML 2005: 888-895 | |
| c32 | Eddie J. Rafols, Mark B. Ring, Richard S. Sutton, Brian Tanner: Using Predictive Representations to Improve Generalization in Reinforcement Learning. IJCAI 2005: 835-840 | |
| c31 | ||
| c30 | Doina Precup, Richard S. Sutton, Cosmin Paduraru, Anna Koop, Satinder P. Singh: Off-policy Learning with Options and Recognizers. NIPS 2005 | |
| c29 | Richard S. Sutton, Eddie J. Rafols, Anna Koop: Temporal Abstraction in Temporal-difference Networks. NIPS 2005 | |
| 2004 | ||
| c28 | ||
| 2002 | ||
| e1 | Rina Dechter, Richard S. Sutton (Eds.): Proceedings of the Eighteenth National Conference on Artificial Intelligence and Fourteenth Conference on Innovative Applications of Artificial Intelligence, July 28 - August 1, 2002, Edmonton, Alberta, Canada. AAAI Press / The MIT Press 2002 | |
| 2001 | ||
| c27 | Doina Precup, Richard S. Sutton, Sanjoy Dasgupta: Off-Policy Temporal Difference Learning with Function Approximation. ICML 2001: 417-424 | |
| c26 | Peter Stone, Richard S. Sutton: Scaling Reinforcement Learning toward RoboCup Soccer. ICML 2001: 537-544 | |
| c25 | Michael L. Littman, Richard S. Sutton, Satinder P. Singh: Predictive Representations of State. NIPS 2001: 1555-1561 | |
| c24 | ||
| 2000 | ||
| c23 | Doina Precup, Richard S. Sutton, Satinder P. Singh: Eligibility Traces for Off-Policy Policy Evaluation. ICML 2000: 759-766 | |
| c22 | Peter Stone, Richard S. Sutton, Satinder P. Singh: Reinforcement Learning for 3 vs. 2 Keepaway. RoboCup 2000: 249-258 | |
| 1999 | ||
| j5 | Richard S. Sutton, Doina Precup, Satinder P. Singh: Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning. Artif. Intell. 112(1-2): 181-211 (1999) | |
| c21 | ||
| c20 | Richard S. Sutton, David A. McAllester, Satinder P. Singh, Yishay Mansour: Policy Gradient Methods for Reinforcement Learning with Function Approximation. NIPS 1999: 1057-1063 | |
| 1998 | ||
| j4 | Richard S. Sutton, Andrew G. Barto: Reinforcement Learning: An Introduction. IEEE Transactions on Neural Networks 9(5): 1054-1054 (1998) | |
| c19 | Doina Precup, Richard S. Sutton, Satinder P. Singh: Theoretical Results on Reinforcement Learning with Temporally Abstract Options. ECML 1998: 382-393 | |
| c18 | Richard S. Sutton, Doina Precup, Satinder P. Singh: Intra-Option Learning about Temporally Abstract Actions. ICML 1998: 556-564 | |
| c17 | Robert Moll, Andrew G. Barto, Theodore J. Perkins, Richard S. Sutton: Learning Instance-Independent Value Functions to Enhance Local Search. NIPS 1998: 1017-1023 | |
| c16 | Richard S. Sutton, Satinder P. Singh, Doina Precup, Balaraman Ravindran: Improved Switching among Temporally Abstract Actions. NIPS 1998: 1066-1072 | |
| c15 | ||
| 1997 | ||
| c14 | ||
| c13 | Doina Precup, Richard S. Sutton: Exponentiated Gradient Methods for Reinforcement Learning. ICML 1997: 272-277 | |
| c12 | ||
| 1996 | ||
| j3 | Satinder P. Singh, Richard S. Sutton: Reinforcement Learning with Replacing Eligibility Traces. Machine Learning 22(1-3): 123-158 (1996) | |
| 1995 | ||
| c11 | ||
| c10 | Richard S. Sutton: Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding. NIPS 1995: 1038-1044 | |
| 1993 | ||
| c9 | Richard S. Sutton, Steven D. Whitehead: Online Learning with Random Representations. ICML 1993: 314-321 | |
| 1992 | ||
| c8 | Richard S. Sutton: Adapting Bias by Gradient Descent: An Incremental Version of Delta-Bar-Delta. AAAI 1992: 171-176 | |
| 1991 | ||
| j2 | Richard S. Sutton: Dyna, an Integrated Architecture for Learning, Planning, and Reacting. SIGART Bulletin 2(4): 160-163 (1991) | |
| c7 | Richard S. Sutton, Christopher J. Matheus: Learning Polynomial Functions by Feature Construction. ML 1991: 208-212 | |
| c6 | ||
| c5 | Terence D. Sanger, Richard S. Sutton, Christopher J. Matheus: Iterative Construction of Sparse Polynomial Approximations. NIPS 1991: 1064-1071 | |
| 1990 | ||
| c4 | Richard S. Sutton: Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming. ML 1990: 216-224 | |
| c3 | Richard S. Sutton: Integrated Modeling and Control Based on Reinforcement Learning. NIPS 1990: 471-478 | |
| 1989 | ||
| c2 | Andrew G. Barto, Richard S. Sutton, Christopher J. C. H. Watkins: Sequential Decision Problems and Neural Networks. NIPS 1989: 686-693 | |
| 1988 | ||
| j1 | Richard S. Sutton: Learning to Predict by the Methods of Temporal Differences. Machine Learning 3: 9-44 (1988) | |
| 1985 | ||
| c1 | Oliver G. Selfridge, Richard S. Sutton, Andrew G. Barto: Training and Tracking in Robotics. IJCAI 1985: 670-672 | |

Last update Mon May 20 05:29:20 2013 CET by the DBLP Team.
Data released under the ODC-BY 1.0 license.