MAZE USING REINFORCEMENT LEARNING
In this project, the agent learns to solve a maze using the Bellman equation, which defines the state value function recursively, following the principle of dynamic programming:
\[ V(s) = \max_{a} \left( R(s, a) + \gamma V(s') \right) \]
- \(V(s)\): State value function of the current state \(s\)
- \(V(s')\): State value function of the next state \(s'\)
- \(R(s, a)\): Reward obtained by performing action \(a\) in state \(s\)
- \(\gamma\): Discount factor, a hyperparameter that determines how much importance we give to future rewards
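To make the update rule concrete, here is a minimal value-iteration sketch on a small grid. It is not the project's actual code: the cell codes, the reward scheme (+1 for entering the destination, -1 for entering danger, 0 otherwise), the deterministic moves, and the function name `value_iteration` are all assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical cell codes (not taken from the project).
FREE, WALL, DANGER, GOAL = 0, 1, 2, 3
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def value_iteration(grid, gamma=0.9, tol=1e-6, max_iters=1000):
    """Repeatedly apply V(s) = max_a (R(s, a) + gamma * V(s'))."""
    # Assumed reward for entering a cell of each type.
    rewards = {FREE: 0.0, DANGER: -1.0, GOAL: 1.0}
    rows, cols = grid.shape
    V = np.zeros(grid.shape)
    for _ in range(max_iters):
        V_new = np.zeros_like(V)
        for r in range(rows):
            for c in range(cols):
                if grid[r, c] != FREE:
                    continue  # walls and terminal cells keep value 0
                candidates = []
                for dr, dc in ACTIONS:
                    nr, nc = r + dr, c + dc
                    # Hitting the border or a wall leaves the agent in place.
                    if not (0 <= nr < rows and 0 <= nc < cols) or grid[nr, nc] == WALL:
                        nr, nc = r, c
                    candidates.append(rewards[int(grid[nr, nc])] + gamma * V[nr, nc])
                V_new[r, c] = max(candidates)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    return V


grid = np.array([
    [FREE, FREE, FREE, GOAL],
    [FREE, WALL, FREE, DANGER],
    [FREE, FREE, FREE, FREE],
])
V = value_iteration(grid)
print(np.round(V, 3))
```

Under these assumptions, the value of a free cell decays by a factor of \(\gamma\) for every extra step it lies from the destination, which is what produces the gradient seen in the heatmap.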
WHITE: Agent | GREEN: Final Destination | BLUE: Wall | RED: Danger
We visualize the state value matrix as a heatmap using the Matplotlib library. To reach the destination, the agent moves toward cells with hotter colors, that is, toward higher state values.
The purple blocks trace the pathway to the destination.
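The idea of following the heat can be sketched as a greedy walk over the value matrix: from each cell, step to the neighbouring cell with the highest value until the destination is reached. The helper name `greedy_path`, the hand-written value matrix, and the choice to give the destination the largest value are assumptions for illustration, not the project's implementation.

```python
def greedy_path(values, start, goal, blocked=(), max_steps=100):
    """Follow the highest-valued neighbour from start until goal is reached."""
    rows, cols = len(values), len(values[0])
    path = [start]
    current = start
    while current != goal and len(path) <= max_steps:
        r, c = current
        # Candidate moves: up, down, left, right, staying inside the grid
        # and skipping blocked (wall) cells.
        neighbors = [
            (r + dr, c + dc)
            for dr, dc in [(-1, 0), (1, 0), (0, -1), (0, 1)]
            if 0 <= r + dr < rows and 0 <= c + dc < cols
            and (r + dr, c + dc) not in blocked
        ]
        current = max(neighbors, key=lambda cell: values[cell[0]][cell[1]])
        path.append(current)
    return path


# Hypothetical value matrix: destination at (0, 3), wall at (1, 1), danger at (1, 3).
values = [
    [0.729, 0.81, 0.9, 1.0],
    [0.656, 0.0, 0.81, -1.0],
    [0.590, 0.656, 0.729, 0.656],
]
path = greedy_path(values, start=(2, 0), goal=(0, 3), blocked={(1, 1)})
print(path)
```

The cells in `path` correspond to the blocks that would be highlighted as the pathway. The `max_steps` guard is only there so the walk terminates if the value matrix has not converged and the greedy choice starts to loop.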
- NumPy
- Pygame
- Matplotlib
