Reinforcement Learning Basics

This repository covers the basics of reinforcement learning: algorithms and main equations for dynamic programming (DP), Monte Carlo methods (MC), temporal difference (TD) learning, and deep reinforcement learning (DRL). In the DRL part, the robot's task is to collect data from all sensors in the shortest possible time while avoiding collisions with obstacles.

Details

0. Grid Environments

5x5 grid environment (grid_env_55.ipynb)
- Fast convergence; recommended
10x10 grid environment (grid_env.ipynb)
- Slow convergence

1. Dynamic Programming (DP)

There are two versions of DP: state-value based and action-value based.
Policy evaluation, policy improvement
Policy iteration
Value iteration
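
As an illustration of the state-value version, value iteration can be sketched as below. This is a minimal sketch under stated assumptions, not the code from the notebooks: the 5x5 layout, the goal cell `GOAL`, the reward of -1 per step, and the discount `GAMMA` are all hypothetical choices made for the example.

```python
# Hypothetical 5x5 grid world. The goal cell, step reward, and discount
# are illustrative assumptions, not values taken from the notebooks.
N = 5
GOAL = (4, 4)
GAMMA = 0.9
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, action):
    """Deterministic transition; moving off the grid leaves the state unchanged."""
    if state == GOAL:
        return state, 0.0  # the goal is absorbing and yields no further reward
    row, col = state[0] + action[0], state[1] + action[1]
    if not (0 <= row < N and 0 <= col < N):
        row, col = state
    return (row, col), -1.0  # every move costs one unit


def q_value(V, state, action, gamma):
    """One-step lookahead: r + gamma * V(s')."""
    next_state, reward = step(state, action)
    return reward + gamma * V[next_state]


def value_iteration(gamma=GAMMA, tol=1e-8):
    """Apply the Bellman optimality backup in place until the values stop changing."""
    V = {(r, c): 0.0 for r in range(N) for c in range(N)}
    while True:
        delta = 0.0
        for s in V:
            if s == GOAL:
                continue
            best = max(q_value(V, s, a, gamma) for a in ACTIONS)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V


def greedy_policy(V, gamma=GAMMA):
    """Policy improvement: pick the action with the highest one-step lookahead."""
    return {
        s: max(ACTIONS, key=lambda a: q_value(V, s, a, gamma))
        for s in V
        if s != GOAL
    }


V = value_iteration()
policy = greedy_policy(V)
```

Policy iteration would use the same `q_value` lookahead, but alternate a full policy evaluation sweep (with a fixed action per state) and the `greedy_policy` improvement step instead of taking the maximum inside every backup.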

2. Monte Carlo Method (MC)

On-policy first-visit MC
Off-policy first-visit MC
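
The on-policy variant can be sketched as first-visit MC control with an epsilon-greedy policy. The following is a minimal sketch, assuming a hypothetical 5-state corridor (start at state 0, goal at state 4, reward -1 per step); the environment, the hyperparameters, and all function names are illustrative and do not come from the notebooks.

```python
import random
from collections import defaultdict

# Hypothetical corridor: states 0..4, goal at state 4 (illustrative only).
N_STATES = 5
GOAL = N_STATES - 1
ACTIONS = (-1, +1)  # left, right


def step(state, action):
    """Move along the corridor, clipping at both ends; every step costs -1."""
    next_state = min(max(state + action, 0), GOAL)
    return next_state, -1.0, next_state == GOAL


def epsilon_greedy(Q, state, epsilon, rng):
    """Explore with probability epsilon, otherwise act greedily w.r.t. Q."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])


def first_visit_mc_control(episodes=2000, gamma=1.0, epsilon=0.1,
                           max_steps=100, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)     # action values, initialised to 0
    counts = defaultdict(int)  # number of first visits per (state, action)
    for _ in range(episodes):
        # Generate one episode with the current epsilon-greedy policy.
        episode, state = [], 0
        for _ in range(max_steps):
            action = epsilon_greedy(Q, state, epsilon, rng)
            next_state, reward, done = step(state, action)
            episode.append((state, action, reward))
            state = next_state
            if done:
                break
        # Record the time step of the first visit to each (state, action).
        first_visit = {}
        for t, (s, a, _) in enumerate(episode):
            first_visit.setdefault((s, a), t)
        # Walk backwards to accumulate returns; update only at first visits.
        G = 0.0
        for t in reversed(range(len(episode))):
            s, a, r = episode[t]
            G = r + gamma * G
            if first_visit[(s, a)] == t:
                counts[(s, a)] += 1
                Q[(s, a)] += (G - Q[(s, a)]) / counts[(s, a)]
    return Q


Q = first_visit_mc_control()
```

The off-policy variant would generate episodes with a separate behaviour policy and reweight each return with importance sampling ratios before averaging; that step is omitted here.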

3. Temporal Difference (TD)

SARSA, Q-learning, Expected SARSA, Double Q-learning
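
These four methods share the same update, Q(s,a) ← Q(s,a) + α [target − Q(s,a)], and differ only in how the target is built. The sketch below shows one way to write the targets; it assumes a hypothetical table layout `Q[state][action]` and an epsilon-greedy policy for Expected SARSA, neither of which is taken from the notebooks. Terminal next states are assumed to hold all-zero action values.

```python
def sarsa_target(Q, reward, next_state, next_action, gamma):
    """SARSA: bootstrap from the action actually taken in the next state."""
    return reward + gamma * Q[next_state][next_action]


def q_learning_target(Q, reward, next_state, gamma):
    """Q-learning: bootstrap from the greedy action in the next state."""
    return reward + gamma * max(Q[next_state].values())


def expected_sarsa_target(Q, reward, next_state, gamma, epsilon):
    """Expected SARSA: bootstrap from the expectation under epsilon-greedy."""
    values = Q[next_state]
    greedy = max(values, key=values.get)
    n = len(values)
    expected = sum(
        (epsilon / n + (1.0 - epsilon) * (action == greedy)) * q
        for action, q in values.items()
    )
    return reward + gamma * expected


def double_q_target(Q_select, Q_evaluate, reward, next_state, gamma):
    """Double Q-learning: one table selects the action, the other evaluates it."""
    values = Q_select[next_state]
    best_action = max(values, key=values.get)
    return reward + gamma * Q_evaluate[next_state][best_action]


def td_update(Q, state, action, target, alpha):
    """Shared update rule: move Q(s, a) a fraction alpha towards the target."""
    Q[state][action] += alpha * (target - Q[state][action])


# Tiny usage example with made-up numbers.
Q = {"s0": {"L": 0.0, "R": 0.0}, "s1": {"L": 1.0, "R": 3.0}}
target = q_learning_target(Q, reward=1.0, next_state="s1", gamma=0.5)
td_update(Q, "s0", "R", target, alpha=0.5)
```

In Double Q-learning, the two tables are swapped at random on each step, so that each table is updated using the other one as the evaluator.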

4. Deep Reinforcement Learning (DRL)

We have a robot that aims to collect data from several low-powered IoT sensors. Because the sensors are low-powered, they cannot communicate over long ranges, so the robot must approach each sensor to collect its data. The robot starts its mission from the start terminal. There is a charging station in the environment where the robot can recharge its battery when it is running out of energy. There are also several obstacles in the environment.

A sample result:

Value Functions (State value $V(s)$, Action value $Q(s,a)$)

10x10 Grid Environment

In the following image, we have depicted the environment:

Red square: starting position

Green square: charging station

Black circles: IoT sensors

Blue blocks: obstacles

In this project, we define the state as a four-channel image, shown below.

Based on this definition, we can use CNNs to solve the MDP.
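
A four-channel grid state of this kind could be built as in the sketch below. The channel assignment (robot position, sensors, charging station, obstacles) is an assumption made for illustration, since the channels are not spelled out here; the function name `encode_state` and the example coordinates are hypothetical as well.

```python
def encode_state(size, robot, sensors, charger, obstacles):
    """Return a channel-first 4 x size x size nested list of 0.0/1.0 values.

    Assumed channel order (illustrative, not confirmed by the project):
    0 = robot position, 1 = sensors, 2 = charging station, 3 = obstacles.
    """
    def plane(cells):
        # One binary image with a 1.0 at every listed (row, col) cell.
        grid = [[0.0] * size for _ in range(size)]
        for row, col in cells:
            grid[row][col] = 1.0
        return grid

    return [plane([robot]), plane(sensors), plane([charger]), plane(obstacles)]


# Made-up 10x10 layout for demonstration.
state = encode_state(
    size=10,
    robot=(0, 0),
    sensors=[(2, 3), (7, 7)],
    charger=(9, 0),
    obstacles=[(4, 4), (4, 5)],
)
```

Before being passed to a CNN, a nested list like this would typically be converted to an array or tensor of shape (4, 10, 10) in whichever deep learning framework is used.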

