Reinforcement Learning
Can you create an unbeatable opponent?
In reinforcement learning, every state has a ‘value’ that measures how good it is to be in that state. In this game, every state started with a value of ‘0’ except the state with one token remaining, which was given a value of -10 because a player facing it has lost the game. In the video below, as the agent plays game after game (each game is one epoch), the values quickly converge toward their true values. This convergence is what we mean when we say the agent is ‘learning’ how to play the game.
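The value updates described above can be sketched in a few lines of Python. This is a minimal sketch, not the post's actual code. It assumes hypothetical rules the post does not state: a single pile starting at 10 tokens, each move removes 1 to 3 tokens and must leave at least one, and the player left facing the last token loses. Every state starts at 0, the one-token state is fixed at -10, and after each self-play move the current state's value is nudged toward a Bellman-style target. Because the opponent moves next, that target is the best of the negated values of the states the player can move to. The names `START_TOKENS`, `MAX_TAKE`, `legal_moves`, `backup` and `train` are illustrative.

```python
import random

# Hypothetical game rules (assumptions, not taken from the original post):
# one pile of START_TOKENS tokens, a move removes 1..MAX_TAKE tokens but must
# leave at least one, and the player left facing the last token loses.
START_TOKENS = 10
MAX_TAKE = 3
LOSS_VALUE = -10.0  # value of the one-token state, as described in the post


def legal_moves(tokens):
    """Moves that leave at least one token on the pile."""
    return [a for a in range(1, MAX_TAKE + 1) if a < tokens]


def backup(values, tokens):
    """Bellman-style target: best move value, negated because the opponent moves next."""
    return max(-values[tokens - a] for a in legal_moves(tokens))


def train(episodes=5000, alpha=0.1, epsilon=0.3, seed=0):
    rng = random.Random(seed)
    # values[s] = how good it is to be the player to move with s tokens left.
    values = {s: 0.0 for s in range(1, START_TOKENS + 1)}
    values[1] = LOSS_VALUE  # terminal losing state, never updated
    for _ in range(episodes):  # one self-play game per epoch
        tokens = START_TOKENS
        while tokens > 1:
            # Move this state's value a step of size alpha toward its target.
            values[tokens] += alpha * (backup(values, tokens) - values[tokens])
            moves = legal_moves(tokens)
            if rng.random() < epsilon:
                move = rng.choice(moves)  # exploration move
            else:
                move = max(moves, key=lambda a: -values[tokens - a])  # greedy move
            tokens -= move  # the opponent now faces the smaller pile
    return values


if __name__ == "__main__":
    for state, value in train().items():
        print(state, round(value, 2))
```

Under these assumed rules, the learned values settle near -10 for piles of 1, 5 and 9 tokens (losing for the player to move) and near +10 for every other pile size.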
Recording of the completed code.
While building a reinforcement learning agent from scratch was pretty hard, it really helped me understand the math going on behind the scenes, especially the Bellman equation. Notice that in the first game it wins handily, but in the second, a poorly timed exploration move causes it to lose. I should have decreased the exploration rate over time, but I had to give us poor humans a chance somehow, right?
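One common way to keep a late exploration move from throwing away a won game is to shrink the exploration probability (often called epsilon) as training goes on. Below is a minimal sketch assuming an exponential schedule. The function names `epsilon_at` and `choose_move`, and the `start`, `end` and `decay` values, are hypothetical and not from the original project.

```python
import math
import random


def epsilon_at(epoch, start=0.5, end=0.01, decay=0.001):
    """Hypothetical schedule: exponentially decay exploration from `start` toward `end`."""
    return end + (start - end) * math.exp(-decay * epoch)


def choose_move(moves, greedy_move, epoch, rng):
    """Epsilon-greedy choice: explore with probability epsilon_at(epoch), else play greedily."""
    if rng.random() < epsilon_at(epoch):
        return rng.choice(moves)  # exploration move
    return greedy_move


if __name__ == "__main__":
    for epoch in (0, 1000, 5000):
        print(epoch, round(epsilon_at(epoch), 4))
```

With these assumed parameters the agent explores about half the time at first and almost never after a few thousand epochs. When playing against a human, setting epsilon to 0 removes exploration moves entirely.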