gpho/policy-gradient-demo

A demo of REINFORCE: the simplest Policy Gradient reinforcement learning algorithm

REINFORCE is the most basic policy gradient algorithm. Policy gradient methods are a class of reinforcement learning algorithms that represent the policy explicitly and optimize its parameters by gradient ascent on the expected return (equivalently, gradient descent on its negative). As described in our presentation slides, the algorithm underlying REINFORCE is simple to understand and has a direct analogy to gradient descent in supervised learning.
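To make the update rule concrete, here is a minimal sketch of REINFORCE in plain Python. It is not the notebook's TensorFlow/Keras implementation: the two-armed bandit environment, its payout probabilities, and the names `softmax`, `discounted_returns`, and `run_reinforce` are all assumptions chosen for illustration. Each step samples an action from the policy, observes a return, and moves the parameters along the return-weighted gradient of the log-probability of the sampled action.

```python
import math
import random


def softmax(logits):
    """Convert a list of logits into action probabilities."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def discounted_returns(rewards, gamma):
    """Compute G_t = r_t + gamma * G_{t+1} for each step of one episode."""
    returns = []
    running = 0.0
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def run_reinforce(episodes=2000, learning_rate=0.1, seed=0):
    """Train a softmax policy on a hypothetical two-armed bandit."""
    rng = random.Random(seed)
    win_prob = [0.2, 0.8]  # assumed payout probability of each arm
    logits = [0.0, 0.0]    # policy parameters, one per action
    for _ in range(episodes):
        probs = softmax(logits)
        action = 0 if rng.random() < probs[0] else 1
        reward = 1.0 if rng.random() < win_prob[action] else 0.0
        # Each episode has a single step, so the return equals the reward.
        episode_return = discounted_returns([reward], gamma=0.99)[0]
        # For a softmax policy, d log pi(action) / d logit_k
        # is 1[k == action] - probs[k].
        for k in range(len(logits)):
            grad_log_prob = (1.0 if k == action else 0.0) - probs[k]
            # Gradient ascent step on the expected return.
            logits[k] += learning_rate * episode_return * grad_log_prob
    return softmax(logits)


if __name__ == "__main__":
    print(run_reinforce())
```

The analogy to supervised learning shows up in the inner loop: `grad_log_prob` is the same gradient a classifier would use to raise the log-likelihood of the label `action`, and REINFORCE simply scales that step by the return the action earned.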

As a project in Harvard's Advanced Machine Learning, Data Mining, and Artificial Intelligence (CSCI E-82) course in Fall 2018, I developed this notebook to demonstrate a simple implementation of the REINFORCE algorithm using TensorFlow, Keras, and OpenAI Gym.

Prerequisites

The notebook uses TensorFlow, Keras, and OpenAI Gym. Gym can be installed with pip:

pip install gym

TO-DO

  • Add videos to demo
  • Update Prerequisites

Acknowledgments

Thanks to Mahmood M. Shad, who collaborated with me on the presentation slides for this project.

This work borrows code and/or inspiration from multiple sources:

Learning material

Code bases
