What is the best way to route data in a network of routers spread out across the globe? This ‘Internet of Things’-based problem can be solved using reinforcement learning! In this video, I’ll explain the two types of policies, the Bellman equation, and the value function. All of these concepts are crucial in the RL pipeline, and I’ll break them down using animations and code. Enjoy!

