Hybrid deep reinforcement learning: combining the best of gradient-based and gradient-free methods (NYU Shanghai DURF 2018)
This repository contains my sophomore-year research project on deep reinforcement learning at NYU Shanghai, advised by Prof. Keith Ross and supported by the NYU Shanghai Dean's Undergraduate Research Fund. In this project, I experimented with combining policy gradient methods, including vanilla Policy Gradient (a.k.a. REINFORCE), Actor-Critic, and Proximal Policy Optimization (PPO), with Evolution Strategies (ES) to develop a hybrid algorithm with improved sample efficiency. The performance of the proposed algorithms was evaluated on MuJoCo benchmarks.
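To illustrate the core idea, below is a minimal, self-contained sketch of one possible hybrid update: a Salimans-style ES gradient estimate is blended with a REINFORCE estimate through a mixing coefficient. The toy one-step environment, the linear Gaussian policy, the blend coefficient `alpha`, and all hyperparameters are illustrative assumptions for this sketch, not the exact algorithm evaluated in this project.

```python
import numpy as np

# Toy one-step environment: state s ~ N(0, 1), reward = -(a - 2s)^2,
# so the optimal linear policy is a = 2s. An illustrative stand-in for a MuJoCo task.
def rollout(theta, rng, sigma_a=0.1):
    s = rng.standard_normal()
    noise = sigma_a * rng.standard_normal()  # Gaussian action noise for REINFORCE
    a = theta * s + noise
    reward = -(a - 2.0 * s) ** 2
    return s, noise, reward

def es_gradient(theta, rng, n=50, sigma=0.1):
    """Salimans-style ES estimate: g ~ (1/(n*sigma)) * sum_i R(theta + sigma*eps_i) * eps_i."""
    eps = rng.standard_normal(n)
    returns = np.array([rollout(theta + sigma * e, rng)[2] for e in eps])
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)  # fitness shaping
    return float(returns @ eps) / (n * sigma)

def pg_gradient(theta, rng, n=50, sigma_a=0.1):
    """REINFORCE estimate for a ~ N(theta*s, sigma_a^2): grad log pi = noise*s/sigma_a^2."""
    grads, returns = [], []
    for _ in range(n):
        s, noise, reward = rollout(theta, rng, sigma_a)
        grads.append(noise * s / sigma_a ** 2)
        returns.append(reward)
    baseline = np.mean(returns)  # constant baseline for variance reduction
    return float(np.mean([(r - baseline) * g for r, g in zip(returns, grads)]))

rng = np.random.default_rng(0)
theta, lr, alpha = 0.0, 0.05, 0.5  # alpha blends the ES and PG estimates
for _ in range(200):
    g = alpha * es_gradient(theta, rng) + (1.0 - alpha) * pg_gradient(theta, rng)
    theta += lr * g  # gradient ascent on expected reward
print(f"learned theta = {theta:.3f} (optimum is 2.0)")
```

The motivation for blending: the ES term uses only scalar returns of perturbed parameters (gradient-free), while the REINFORCE term exploits the log-likelihood gradient of sampled actions (gradient-based), so weighting the two trades off their different bias and variance profiles.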
References:
- REINFORCE: Ronald J Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine learning, 8(3-4):229–256, 1992.
- Actor-Critic: Richard S Sutton, David A McAllester, Satinder P Singh, and Yishay Mansour. Policy gradient methods for reinforcement learning with function approximation. In Advances in neural information processing systems, pages 1057–1063, 2000.
- PPO: John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.
- Evolution Strategy: Tim Salimans, Jonathan Ho, Xi Chen, Szymon Sidor, and Ilya Sutskever. Evolution strategies as a scalable alternative to reinforcement learning. arXiv preprint arXiv:1703.03864, 2017.
- MuJoCo: Emanuel Todorov, Tom Erez, and Yuval Tassa. MuJoCo: A physics engine for model-based control. In 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 5026–5033, 2012. https://ieeexplore.ieee.org/document/6386109