Follow
Philip Thomas
Title
Cited by
Cited by
Year
Data-efficient off-policy policy evaluation for reinforcement learning
P Thomas, E Brunskill
International Conference on Machine Learning, 2139-2148, 2016
7312016
Value function approximation in reinforcement learning using the Fourier basis
G Konidaris, S Osentoski, P Thomas
Proceedings of the AAAI conference on artificial intelligence 25 (1), 380-385, 2011
5602011
High-confidence off-policy evaluation
P Thomas, G Theocharous, M Ghavamzadeh
Proceedings of the AAAI Conference on Artificial Intelligence 29 (1), 2015
3002015
High confidence policy improvement
P Thomas, G Theocharous, M Ghavamzadeh
International Conference on Machine Learning, 2380-2388, 2015
2122015
Preventing undesirable behavior of intelligent machines
P Thomas, B Castro da Silva, A Barto, S Giguere, Y Brun, E Brunskill
Science 366 (6468), 999-1004, 2019
1852019
Ad recommendation systems for life-time value optimization
G Theocharous, PS Thomas, M Ghavamzadeh
Proceedings of the 24th international conference on world wide web, 1305-1310, 2015
1852015
Learning action representations for reinforcement learning
Y Chandak, G Theocharous, J Kostas, S Jordan, P Thomas
International conference on machine learning, 941-950, 2019
1782019
Increasing the action gap: New operators for reinforcement learning
MG Bellemare, G Ostrovski, A Guez, P Thomas, R Munos
Proceedings of the AAAI Conference on Artificial Intelligence 30 (1), 2016
1672016
Bias in natural actor-critic algorithms
P Thomas
International conference on machine learning, 441-448, 2014
1562014
Safe reinforcement learning
PS Thomas
1152015
Is the policy gradient a gradient?
C Nota, PS Thomas
arXiv preprint arXiv:1906.07073, 2019
662019
Training an actor-critic reinforcement learning controller for arm movement using human-generated rewards
KM Jagodnik, PS Thomas, AJ van den Bogert, MS Branicky, RF Kirsch
IEEE Transactions on Neural Systems and Rehabilitation Engineering 25 (10 …, 2017
652017
Proximal reinforcement learning: A new theory of sequential decision making in primal-dual spaces
S Mahadevan, B Liu, P Thomas, W Dabney, S Giguere, N Jacek, I Gemp, ...
arXiv preprint arXiv:1405.6757, 2014
652014
Predictive off-policy policy evaluation for nonstationary decision problems, with applications to digital marketing
P Thomas, G Theocharous, M Ghavamzadeh, I Durugkar, E Brunskill
Proceedings of the AAAI Conference on Artificial Intelligence 31 (2), 4740-4745, 2017
632017
Optimizing for the future in non-stationary mdps
Y Chandak, G Theocharous, S Shankar, M White, S Mahadevan, ...
International Conference on Machine Learning, 1414-1425, 2020
612020
Policy gradient methods for reinforcement learning with function approximation and action-dependent baselines
PS Thomas, E Brunskill
arXiv preprint arXiv:1706.06643, 2017
602017
Evaluating the performance of reinforcement learning algorithms
S Jordan, Y Chandak, D Cohen, M Zhang, P Thomas
International Conference on Machine Learning, 4962-4973, 2020
582020
Risk Quantification for Policy Deployment
PS Thomas, G Theocharous, M Ghavamzadeh
US Patent App. 14/552,047, 2016
532016
Importance Sampling for Fair Policy Selection.
S Doroudi, PS Thomas, E Brunskill
Grantee Submission, 2017
502017
Offline contextual bandits with high probability fairness guarantees
B Metevier, S Giguere, S Brockman, A Kobren, Y Brun, E Brunskill, ...
Advances in neural information processing systems 32, 2019
492019
The system can't perform the operation now. Try again later.
Articles 1–20