My attempt at Exercise 4.9 from Sutton and Barto's textbook (Reinforcement Learning: An Introduction), which asks for an implementation of value iteration for the gambler's problem. All configuration values follow the assumptions stated in the exercise.
The problem is also solved with the policy iteration method.
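
For orientation, here is a minimal sketch of what value iteration for the gambler's problem can look like. It is not necessarily the code in this repository: the function name `value_iteration`, the default `p_h=0.4`, the threshold `theta`, and the rounding used to break ties are all illustrative assumptions. The setup follows the textbook's description: capital from 0 to a goal of 100, stakes between 1 and min(s, goal - s), and a reward of +1 only on reaching the goal.

```python
import numpy as np


def value_iteration(p_h=0.4, goal=100, theta=1e-9):
    """Sketch of value iteration for the gambler's problem.

    p_h   -- probability that the coin comes up heads (assumed default)
    goal  -- capital at which the gambler wins
    theta -- convergence threshold (assumed value)
    """
    # V[s] estimates the probability of reaching the goal from capital s.
    # V[0] = 0 (ruin) and V[goal] = 1 stand in for the terminal rewards.
    V = np.zeros(goal + 1)
    V[goal] = 1.0

    def action_returns(s):
        # Legal stakes: at least 1, at most what is owned or still needed.
        stakes = np.arange(1, min(s, goal - s) + 1)
        returns = p_h * V[s + stakes] + (1 - p_h) * V[s - stakes]
        return stakes, returns

    while True:
        delta = 0.0
        for s in range(1, goal):
            _, returns = action_returns(s)
            best = returns.max()
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # in-place sweep
        if delta < theta:
            break

    # Greedy policy; rounding makes near-ties resolve to the smallest stake.
    policy = np.zeros(goal + 1, dtype=int)
    for s in range(1, goal):
        stakes, returns = action_returns(s)
        policy[s] = stakes[np.argmax(np.round(returns, 5))]
    return V, policy


if __name__ == "__main__":
    V, policy = value_iteration(p_h=0.4)
    print(V[50], policy[50])
```

The exercise itself asks for p_h = 0.25 and p_h = 0.55, which can be tried by changing the `p_h` argument. The tie-breaking rule matters for the shape of the resulting policy plot, since many stakes are equally good in several states.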