A demo of REINFORCE: the simplest Policy Gradient reinforcement learning algorithm

REINFORCE is the most basic policy gradient algorithm. Policy gradient methods are a class of reinforcement learning algorithms that represent the policy explicitly and optimize its parameters by following the gradient of the expected return (gradient ascent on return, or equivalently gradient descent on its negative). As described in our presentation slides, the algorithm underlying REINFORCE is simple to understand and has a direct analogy to gradient descent in supervised learning.
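To make the update rule concrete, here is a minimal sketch of REINFORCE in plain Python. It is not the notebook's implementation, which uses TensorFlow and Keras. It assumes a state-independent softmax policy and a hypothetical two-action toy environment in which action 1 pays a reward of 1.0 and action 0 pays nothing. All function names, the learning rate, and the discount factor are illustrative choices.

```python
import math
import random


def softmax(logits):
    """Convert a list of logits into action probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * r_{t+1} + ... for every step t."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def reinforce_update(logits, actions, returns, lr=0.1):
    """One REINFORCE step for a state-independent softmax policy.

    For a softmax policy, d log pi(a) / d logit_k = 1[k == a] - pi(k),
    so each step contributes G_t * (one_hot(a_t) - pi) to the gradient.
    """
    probs = softmax(logits)
    grad = [0.0] * len(logits)
    for a, g in zip(actions, returns):
        for k in range(len(logits)):
            grad[k] += g * ((1.0 if k == a else 0.0) - probs[k])
    # Gradient ascent on expected return, averaged over the episode.
    n = max(len(actions), 1)
    return [w + lr * gk / n for w, gk in zip(logits, grad)]


def train_bandit(episodes=500, steps=10, seed=0):
    """Train on a hypothetical toy task: action 1 pays 1.0, action 0 pays 0.0."""
    rng = random.Random(seed)
    logits = [0.0, 0.0]
    for _ in range(episodes):
        probs = softmax(logits)
        actions = [rng.choices([0, 1], weights=probs)[0] for _ in range(steps)]
        rewards = [float(a) for a in actions]
        returns = discounted_returns(rewards)
        logits = reinforce_update(logits, actions, returns)
    return softmax(logits)


if __name__ == "__main__":
    print(train_bandit())  # probabilities for action 0 and action 1
```

The analogy to supervised learning shows up in `reinforce_update`: the term `one_hot(a_t) - pi` is the same gradient a softmax classifier would get if the sampled action were treated as the label, except that each example is weighted by its return `G_t`. The sketch uses no baseline, so its gradient estimates are noisier than they would be in a practical implementation.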

I developed this notebook as a project for Harvard's Advanced Machine Learning, Data Mining, and Artificial Intelligence (CSCI E-82) course in Fall 2018. It demonstrates a simple implementation of the REINFORCE algorithm using TensorFlow, Keras, and OpenAI Gym.

Prerequisites

This code uses the following packages:

pip install gym

TO-DO

Add videos to demo
Update Prerequisites

Acknowledgments

Thanks to Mahmood M. Shad, who collaborated with me on the presentation slides for this project.

This work borrows code and/or inspiration from multiple sources:

Learning material

Great blog post intro by Andrej Karpathy: http://karpathy.github.io/2016/05/31/rl/
Great course materials:
- Reinforcement Learning at UCL by David Silver: http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching.html
- Deep RL at Berkeley by Sergey Levine: http://rail.eecs.berkeley.edu/deeprlcourse/
Best intro book: Reinforcement Learning by Sutton & Barto (free pdf): http://incompleteideas.net/book/the-book.html
Great YouTube explanations by Arxiv Insights: https://www.youtube.com/channel/UCNIkB2IeJ-6AmZv7bQ1oBYg

Code bases

OpenAI Spinning Up: https://spinningup.openai.com/en/latest/
See also Baselines: https://github.com/openai/baselines
garage/rllab (Berkeley): https://github.com/rlworkgroup/garage
Deep RL course with ipynb examples: https://github.com/simoninithomas/Deep_reinforcement_learning_Course
Minimal and clean code examples: https://github.com/rlcode/reinforcement-learning
