Bibliography

General Sources

Foundations of Deep Reinforcement Learning, L. Graesser and W. L. Keng (2019). Theory and examples, with implementations using OpenAI Gym, PyTorch, TensorFlow, and SLM Lab.
- To run code: docker run -it --name ubuntu_16_04 ubuntu:16.04 then follow install instructions here.
Reinforcement Learning: An Introduction, Sutton and Barto (2nd edition, 2018). Clear presentation that builds up from simple examples. The authors are major contributors to the field. David Silver (AlphaZero architect) says he read their 1st edition as a first step in learning about RL.
Algorithms for Reinforcement Learning, C. Szepesvári (2015). Recommended by David Silver as more mathematical and faster paced than the Sutton and Barto book.
Fundamentals of Machine Learning for Predictive Data Analytics, J.D. Kelleher et al. (2020). Nice survey of ML. Chap. 11 covers RL: Markov Decision Processes (MDP), Bellman Equations, Temporal-Difference Learning, Q-Learning, SARSA, Deep Q-Networks (DQN).
MIT 6.S191: Deep Reinforcement Learning, Alexander Amini (2020). High-level, very clear presentation. Covers Deep Q-Learning (DQN), Policy Gradient (PG), AlphaGo, and AlphaZero.
MIT 6.S091: Introduction to Deep Reinforcement Learning, Lex Fridman (2019). Explains well how a small change in the reward function can yield a completely different policy.
RL Course by David Silver, YouTube (2015)
David Silver: AlphaGo, AlphaZero, and Deep Reinforcement Learning, Lex Fridman Podcast #86 (2020)
A (Long) Peek into Reinforcement Learning, L. Weng (2018)
OpenAI Baselines, a set of high-quality implementations of reinforcement learning algorithms
Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems, Sergey Levine et al. (2020). Explains how RL is modified for offline learning.

Most books available at https://b-ok.cc.

Additional Sources for Policy Gradient Algorithms

Policy Gradient Methods for Reinforcement Learning with Function Approximation, R. Sutton et al. (1999)
Policy Gradient Algorithms, L. Weng (2018)
Why are policy gradient methods preferred over value function approximation in continuous action domains?
Deriving Policy Gradients and Implementing REINFORCE, C. Yoon (2018)
REINFORCE Algorithm: Taking baby steps in reinforcement learning (2020), with code examples

Additional Sources for Math Appendix

Introduction to Probability, C.M. Grinstead, J.L. Snell, Chap. 11, Markov Chains