Hierarchical Programmatic Reinforcement Learning via Learning to Compose Programs

This repository contains the implementation of Hierarchical Programmatic Reinforcement Learning via Learning to Compose Programs, published at ICML 2023. Please visit our project page for more information.

We re-formulate solving a reinforcement learning task as synthesizing a task-solving program that can be executed to interact with the environment and maximize the return. We first learn a program embedding space that continuously parameterizes a diverse set of programs sampled from a program dataset. Then, we train a meta-policy, whose action space is the learned program embedding space, to produce a series of programs (i.e., predict a series of actions) to yield a composed task-solving program.
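The two-stage idea above can be sketched in a few lines of toy Python. This is only an illustration of the control flow, not the repository's actual API: `decode`, `compose`, and the toy program list are all hypothetical stand-ins for the learned VAE decoder and the meta-policy's rollout.

```python
# Toy sketch of the two-stage HPRL idea (all names are illustrative,
# not the repository's actual API).

# Stage 1 stand-in: a "program embedding space" that maps a latent
# action to a short Karel-style program (here, a fixed lookup).
PROGRAMS = ["move()", "turnLeft()", "pickMarker()", "putMarker()"]

def decode(z: int) -> str:
    """Decode a latent action into a program (stand-in for the learned decoder)."""
    return PROGRAMS[z % len(PROGRAMS)]

# Stage 2 stand-in: the meta-policy emits a sequence of latent actions;
# decoding and concatenating them yields the composed task-solving program.
def compose(latent_actions):
    return "; ".join(decode(z) for z in latent_actions)

meta_policy_rollout = [0, 2, 3]  # actions predicted by the meta-policy
print(compose(meta_policy_rollout))  # move(); pickMarker(); putMarker()
```

In the actual framework, the latent actions are continuous vectors in the learned embedding space and the meta-policy is trained with RL to maximize the return of the composed program.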

The experimental results in the Karel domain show that our proposed framework outperforms baseline approaches. The ablation studies confirm the limitations of LEAPS and justify our design choices.

Environments

Karel Environments

Getting Started

pip3 install --upgrade virtualenv
virtualenv hprl
source hprl/bin/activate
pip3 install -r requirements.txt

Usage

HPRL Training

Stage 1: Learning Program Embeddings

  • Download dataset from here

  • Unzip the file

bash run_vae_option_L30.sh

Stage 2: Meta-Policy Training

bash run_meta_policy_new_vae_ppo_64dim.sh

Baseline Scripts of LLM-GS

# The scripts are in scripts/{task}.sh
bash scripts/cleanHouse.sh

Note that the task implementations in LLM-GS, HC, and HPRL differ, because the implementations in HC and HPRL contain bugs. In this repository, we follow the implementation of LLM-GS.

We only record the training programs and evaluate them using the evaluation from HC.

Cite the paper

@inproceedings{liu2023hierarchical, 
  title={Hierarchical Programmatic Reinforcement Learning via Learning to Compose Programs}, 
  author={Guan-Ting Liu and En-Pei Hu and Pu-Jen Cheng and Hung-Yi Lee and Shao-Hua Sun}, 
  booktitle = {International Conference on Machine Learning}, 
  year={2023} 
}

Authors

Guan-Ting Liu, En-Pei Hu, Pu-Jen Cheng, Hung-Yi Lee, Shao-Hua Sun
