HokageM/IRLwPython

IRLwPython

Implementations of inverse reinforcement learning (IRL) algorithms in Python.

Implemented Algorithms

Maximum Entropy IRL:

An implementation of the Maximum Entropy inverse reinforcement learning algorithm from [1], based on the implementation in lets-do-irl. It is an IRL algorithm that uses Q-Learning with a Maximum Entropy update function.
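The two ingredients named above can be sketched in a few lines. This is a minimal illustration, not the code of this repository: it assumes one-hot state features (so the reward of a state is simply its weight in `theta`), and the function names, learning rates, and toy visitation counts are all hypothetical.

```python
import numpy as np

def maxent_irl_update(theta, expert_counts, learner_counts, lr=0.05):
    """One gradient step on the reward weights theta.

    With one-hot state features, the Maximum Entropy gradient is the
    difference between the normalized expert and learner state-visitation
    frequencies.
    """
    expert = expert_counts / expert_counts.sum()
    learner = learner_counts / learner_counts.sum()
    return theta + lr * (expert - learner)

def q_learning_update(q_table, state, action, reward, next_state,
                      alpha=0.03, gamma=0.99):
    """Standard tabular Q-learning step, driven by the estimated IRL reward."""
    td_target = reward + gamma * np.max(q_table[next_state])
    q_table[state, action] += alpha * (td_target - q_table[state, action])
    return q_table

# Toy setup (assumed sizes): 400 discrete states, 3 actions.
n_states, n_actions = 400, 3
theta = np.zeros(n_states)
q_table = np.zeros((n_states, n_actions))

# Hypothetical visitation counts for the expert and the current learner.
expert_counts = np.zeros(n_states)
expert_counts[10], expert_counts[11] = 8, 2
learner_counts = np.zeros(n_states)
learner_counts[11], learner_counts[12] = 5, 5

theta = maxent_irl_update(theta, expert_counts, learner_counts)
irl_reward = theta[11]  # reward of the state the learner moved into
q_table = q_learning_update(q_table, state=10, action=2,
                            reward=irl_reward, next_state=11)
```

States the expert visits more often than the learner gain reward, and states the learner over-visits lose reward; the Q-learning step then uses this estimated reward instead of the environment reward.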

Maximum Entropy Deep IRL:

An implementation of the Maximum Entropy inverse reinforcement learning algorithm that uses a neural network for the actor. The estimated IRL reward is learned in a similar way to Maximum Entropy IRL. It is an IRL algorithm that uses Deep Q-Learning with a Maximum Entropy update function.
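In the deep variant, the Q-table is replaced by a network, but the regression targets can still be built from the estimated IRL reward. The sketch below shows only that target computation with NumPy. The function name, the terminal-state masking, and the batch values are assumptions for illustration; the network architecture and training loop used in this repository are not described here.

```python
import numpy as np

def td_targets(irl_rewards, next_q_values, dones, gamma=0.99):
    """Deep Q-Learning regression targets using the estimated IRL reward.

    irl_rewards:   shape (batch,), reward estimated by the IRL step
    next_q_values: shape (batch, n_actions), network output for next states
    dones:         shape (batch,), 1.0 where the episode ended, else 0.0
    """
    max_next_q = next_q_values.max(axis=1)
    # No bootstrapping from terminal states (assumed convention).
    return irl_rewards + gamma * (1.0 - dones) * max_next_q

# Hypothetical batch of two transitions.
irl_rewards = np.array([0.5, -0.1])
next_q_values = np.array([[1.0, 2.0, 0.0],
                          [0.3, 0.1, 0.2]])
dones = np.array([0.0, 1.0])

targets = td_targets(irl_rewards, next_q_values, dones, gamma=0.9)
```

The network would then be fitted so that its Q-value for the taken action approaches the corresponding target.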

Maximum Entropy Deep RL:

An implementation of the Maximum Entropy reinforcement learning algorithm. It serves as an RL baseline for comparison with the IRL algorithms.

Experiment

MountainCar-v0

The MountainCar-v0 environment is used to evaluate the different algorithms. The MDP implementation of MountainCar from gym is used for this purpose.

The expert demonstrations for MountainCar-v0 are the same as those used in lets-do-irl.

Heatmap of Expert demonstrations with 400 states:
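One way to arrive at 400 discrete states is a 20 x 20 grid over the two observation dimensions of the environment, position in [-1.2, 0.6] and velocity in [-0.07, 0.07]. The sketch below illustrates such a discretization and how visit counts for a heatmap could be accumulated. The bin count per dimension, the index layout, and the function names are assumptions, not taken from this repository.

```python
import numpy as np

POSITION_BOUNDS = (-1.2, 0.6)    # observation bounds of MountainCar-v0
VELOCITY_BOUNDS = (-0.07, 0.07)
BINS_PER_DIM = 20                # 20 x 20 = 400 states (assumed layout)

def state_to_index(position, velocity, bins=BINS_PER_DIM):
    """Map a continuous (position, velocity) pair to a state index in [0, bins**2)."""
    def bin_of(value, low, high):
        step = (high - low) / bins
        # Clip so that values on the upper bound fall into the last bin.
        return int(np.clip((value - low) // step, 0, bins - 1))

    pos_bin = bin_of(position, *POSITION_BOUNDS)
    vel_bin = bin_of(velocity, *VELOCITY_BOUNDS)
    return pos_bin + vel_bin * bins

def visit_heatmap(observations, bins=BINS_PER_DIM):
    """Count state visits and return them as a (bins, bins) grid."""
    counts = np.zeros(bins * bins)
    for position, velocity in observations:
        counts[state_to_index(position, velocity, bins)] += 1
    return counts.reshape(bins, bins)

# Hypothetical demonstration: three observations, two in the same cell.
demo = [(-0.5, 0.01), (-0.5, 0.01), (0.6, 0.07)]
heatmap = visit_heatmap(demo)
```

Each row of the resulting grid corresponds to one velocity bin and each column to one position bin.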

Comparing the algorithms

The following tables compare the training and testing results of the two IRL algorithms, Maximum Entropy IRL and Maximum Entropy Deep IRL. Results for the RL algorithm Maximum Entropy Deep RL are also shown to highlight the differences between IRL and RL.

Algorithm | Training Curve after 1000 Episodes | Training Curve after 5000 Episodes
Maximum Entropy IRL
Maximum Entropy Deep IRL
Maximum Entropy Deep RL

Algorithm | State Frequencies Learner: 1000 Episodes | State Frequencies Learner: 2000 Episodes | State Frequencies Learner: 5000 Episodes
Maximum Entropy IRL
Maximum Entropy Deep IRL
Maximum Entropy Deep RL

Algorithm | IRL Rewards: 1000 Episodes | IRL Rewards: 2000 Episodes | IRL Rewards: 5000 Episodes | IRL Rewards: 14000 Episodes
Maximum Entropy IRL None
Maximum Entropy Deep IRL None
Maximum Entropy Deep RL None None None None

Algorithm | Testing Results: 100 Runs
Maximum Entropy IRL
Maximum Entropy Deep IRL
Maximum Entropy Deep RL

References

The implementations of MaxEntropyIRL and MountainCar are based on lets-do-irl.

[1] B. D. Ziebart et al., "Maximum Entropy Inverse Reinforcement Learning", AAAI 2008.

Installation

cd IRLwPython
pip install .

Usage

usage: irl-runner [-h] [--version] [--training] [--testing] [--render] ALGORITHM

Implementation of IRL algorithms

positional arguments:
  ALGORITHM   Currently supported training algorithm: [max-entropy, max-entropy-deep, max-entropy-deep-rl]

options:
  -h, --help  show this help message and exit
  --version   show program's version number and exit
  --training  Enables training of model.
  --testing   Enables testing of previously created model.
  --render    Enables visualization of mountaincar.
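
Going by the help text above, a training run with rendering would be started as `irl-runner --training --render max-entropy-deep`. For readers who want to see how such an interface is typically declared, the following is a sketch with `argparse` that reproduces the documented flags. It is not the repository's source; the function name `build_parser` and the version string are placeholders.

```python
import argparse

ALGORITHMS = ["max-entropy", "max-entropy-deep", "max-entropy-deep-rl"]

def build_parser():
    """Build a parser mirroring the documented irl-runner interface (sketch)."""
    parser = argparse.ArgumentParser(
        prog="irl-runner",
        description="Implementation of IRL algorithms",
    )
    # Placeholder version string; the real version is not stated here.
    parser.add_argument("--version", action="version",
                        version="irl-runner (version unknown)")
    parser.add_argument("algorithm", metavar="ALGORITHM", choices=ALGORITHMS,
                        help="Currently supported training algorithm: "
                             "[max-entropy, max-entropy-deep, max-entropy-deep-rl]")
    parser.add_argument("--training", action="store_true",
                        help="Enables training of model.")
    parser.add_argument("--testing", action="store_true",
                        help="Enables testing of previously created model.")
    parser.add_argument("--render", action="store_true",
                        help="Enables visualization of mountaincar.")
    return parser

# Equivalent to: irl-runner --training --render max-entropy-deep
args = build_parser().parse_args(["--training", "--render", "max-entropy-deep"])
```

Restricting the positional argument with `choices` makes the parser reject any algorithm name outside the three that are listed.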
