Natural actor–critic algorithms S Bhatnagar, RS Sutton, M Ghavamzadeh, M Lee Automatica 45 (11), 2471-2482, 2009 | 374 | 2009 |
Bayesian reinforcement learning: A survey M Ghavamzadeh, S Mannor, J Pineau, A Tamar Foundations and Trends® in Machine Learning 8 (5-6), 359-483, 2015 | 143 | 2015 |
Regularized policy iteration AM Farahmand, M Ghavamzadeh, S Mannor, C Szepesvári Advances in Neural Information Processing Systems, 441-448, 2009 | 143 | 2009 |
Best arm identification: A unified approach to fixed budget and fixed confidence V Gabillon, M Ghavamzadeh, A Lazaric Advances in Neural Information Processing Systems, 3212-3220, 2012 | 142 | 2012 |
Incremental natural actor-critic algorithms S Bhatnagar, M Ghavamzadeh, M Lee, RS Sutton Advances in Neural Information Processing Systems, 105-112, 2008 | 137 | 2008 |
Hierarchical multi-agent reinforcement learning R Makar, S Mahadevan, M Ghavamzadeh Proceedings of the fifth international conference on Autonomous agents, 246-253, 2001 | 134 | 2001 |
Hierarchical multi-agent reinforcement learning M Ghavamzadeh, S Mahadevan, R Makar Autonomous Agents and Multi-Agent Systems 13 (2), 197-229, 2006 | 128 | 2006 |
Supervised actor-critic reinforcement learning M Barto, MT Rosenstein Handbook of learning and approximate dynamic programming 2, 359, 2004 | 126 | 2004 |
High-confidence off-policy evaluation PS Thomas, G Theocharous, M Ghavamzadeh Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015 | 95 | 2015 |
Bayesian policy gradient algorithms M Ghavamzadeh, Y Engel Advances in Neural Information Processing Systems, 457-464, 2007 | 80 | 2007 |
Finite-sample analysis of proximal gradient TD algorithms B Liu, J Liu, M Ghavamzadeh, S Mahadevan, M Petrik UAI, 504-513, 2015 | 77 | 2015 |
Bayesian multi-task reinforcement learning A Lazaric, M Ghavamzadeh | 75 | 2010 |
Learning to communicate and act using hierarchical reinforcement learning M Ghavamzadeh, S Mahadevan Proceedings of the Third International Joint Conference on Autonomous Agents …, 2004 | 74 | 2004 |
Multi-bandit best arm identification V Gabillon, M Ghavamzadeh, A Lazaric, S Bubeck Advances in Neural Information Processing Systems, 2222-2230, 2011 | 73 | 2011 |
Analysis of a classification-based policy iteration algorithm A Lazaric, M Ghavamzadeh, R Munos | 73 | 2010 |
Regularized fitted Q-iteration for planning in continuous-space Markovian decision problems AM Farahmand, M Ghavamzadeh, C Szepesvári, S Mannor 2009 American Control Conference, 725-730, 2009 | 73 | 2009 |
High confidence policy improvement P Thomas, G Theocharous, M Ghavamzadeh International Conference on Machine Learning, 2380-2388, 2015 | 71 | 2015 |
Finite-sample analysis of least-squares policy iteration A Lazaric, M Ghavamzadeh, R Munos Journal of Machine Learning Research 13 (Oct), 3041-3074, 2012 | 67 | 2012 |
Speedy Q-learning MG Azar, R Munos, M Ghavamzadeh, HJ Kappen Spain, Granada: NIPS, 2011 | 66 | 2011 |
Finite-sample analysis of LSTD A Lazaric, M Ghavamzadeh, R Munos | 63 | 2010 |