Materials

More materials

  • Actually proving the policy gradient for discounted rewards - article

  • On variance of policy gradient and optimal baselines: article, another article

  • Learn Advantage Actor Critic with a comic

  • Generalizing the log-derivative trick - url

  • Combining policy gradient and Q-learning - arxiv

  • Variational perspective on reinforcement learning (from DeepBayes) - pdf

  • Adversarial review of policy gradient - blog

Run seminar notebook in Colab: Open In Colab

Run optional homework notebook in Colab: Open In Colab