Published on October 1, 2018 by

In this episode I introduce Policy Gradient methods for Deep Reinforcement Learning.

After a general overview, I dive into Proximal Policy Optimization: an algorithm designed at OpenAI that tries to find a balance between sample efficiency and code complexity. PPO is the algorithm used to train the OpenAI Five system and is also used in a wide range of other challenges like Atari and robotic control tasks.

If you want to support this channel, here is my patreon link: — You are amazing!! 😉

Links mentioned in the video:
⦁ PPO paper:
⦁ TRPO paper:
⦁ Aurélien Géron: KL divergence and entropy in ML:
⦁ Deep RL Bootcamp – Lecture 5:
⦁ OpenAI Baselines TensorFlow implementation:
