In this episode I introduce Policy Gradient methods for Deep Reinforcement Learning.
After a general overview, I dive into Proximal Policy Optimization (PPO): an algorithm developed at OpenAI that strikes a balance between sample efficiency and implementation simplicity. PPO is the algorithm used to train the OpenAI Five system and is also applied to a wide range of other tasks, from Atari games to robotic control.
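To give a taste of why PPO is considered simple to implement, here is an illustrative plain-Python sketch of its clipped surrogate objective (the function name and structure are my own, not from any library; in practice this loss is computed on tensors inside an autodiff framework):

```python
def ppo_clip_objective(ratios, advantages, eps=0.2):
    """Clipped surrogate objective from the PPO paper (to be maximized).

    ratios:     list of probability ratios pi_new(a|s) / pi_old(a|s)
                for a batch of sampled state-action pairs
    advantages: list of advantage estimates for the same samples
    eps:        clipping range (0.2 is the paper's default)
    """
    def clip(x, lo, hi):
        return max(lo, min(hi, x))

    terms = []
    for r, a in zip(ratios, advantages):
        unclipped = r * a
        # Clipping the ratio removes the incentive to move the new
        # policy far away from the old one in a single update.
        clipped = clip(r, 1.0 - eps, 1.0 + eps) * a
        # Take the pessimistic (minimum) bound for each sample.
        terms.append(min(unclipped, clipped))
    return sum(terms) / len(terms)
```

For example, with a ratio of 1.5 and a positive advantage, the objective is capped at 1.2 times the advantage, so the update cannot profit from pushing the ratio further from 1.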
If you want to support this channel, here is my patreon link:
https://patreon.com/ArxivInsights — You are amazing!! 😉
Links mentioned in the video:
⦁ PPO paper: https://arxiv.org/abs/1707.06347
⦁ TRPO paper: https://arxiv.org/abs/1502.05477
⦁ Aurélien Géron: KL divergence and entropy in ML: https://youtu.be/ErfnhcEV1O8
⦁ Deep RL Bootcamp – Lecture 5: https://youtu.be/xvRrgxcpaHY
⦁ OpenAI Baselines TensorFlow implementation: https://github.com/openai/baselines