NTU-RL2025-02/RecoveryDAgger

RecoveryDAgger

Query-Efficient Online Imitation Learning Through Recovery Policy

Paper · Slides

Table of Contents
  1. About The Project
  2. Getting Started
  3. Usage
  4. Contributing
  5. License
  6. Acknowledgments

About RecoveryDAgger

Imitation learning enables agents to acquire complex behaviors from expert demonstrations, yet standard behavior cloning (BC) often suffers from covariate shift and compounding errors in sequential decision-making tasks. Interactive methods such as DAgger alleviate this issue by querying the expert on states visited by the learner, but they typically require frequent expert supervision, resulting in high annotation cost.

In this work, we propose RecoveryDAgger, a query-efficient interactive imitation learning framework that augments DAgger-style training with a learned recovery mechanism. Instead of immediately querying the expert in risky states, RecoveryDAgger first invokes a recovery policy that locally corrects the agent’s behavior by ascending the gradient of a learned Success Q-function, which estimates the probability of task completion. Expert queries are reserved for truly novel states where recovery is unreliable, thereby reducing redundant supervision. Experiments on the PointMaze navigation task demonstrate that RecoveryDAgger significantly reduces the number of expert queries while achieving comparable success rates to strong query-efficient baselines. Our work establishes the effectiveness of integrating learned recovery policies into interactive imitation learning to enhance query efficiency.
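The recovery step described above can be sketched in code. The following is an illustrative, hedged example only: the `SuccessQ` architecture, the `recovery_action` helper, and its hyperparameters are assumptions for exposition, not the repository's actual implementation.

```python
import torch
import torch.nn as nn

class SuccessQ(nn.Module):
    """Toy stand-in for the learned Success Q-function Q(s, a),
    which estimates the probability of task completion."""
    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Sigmoid(),  # success probability in [0, 1]
        )

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([obs, act], dim=-1))

def recovery_action(q_net: nn.Module, obs: torch.Tensor, act: torch.Tensor,
                    step_size: float = 0.01, n_steps: int = 5) -> torch.Tensor:
    """Locally correct `act` by ascending the gradient of Q(s, a) w.r.t. the action."""
    act = act.detach().clone().requires_grad_(True)
    for _ in range(n_steps):
        q = q_net(obs, act).sum()
        (grad,) = torch.autograd.grad(q, act)
        # Take a small gradient-ascent step in action space.
        act = (act + step_size * grad).detach().requires_grad_(True)
    return act.detach()
```

Because the correction happens purely in action space via the learned Q-function, no expert label is needed for states where recovery succeeds; the expert is reserved for novel states.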

(back to top)

Getting Started

Prerequisites

Before installing the project, make sure you have the following dependencies installed:

  • Python ≥ 3.10

  • One of the following environment managers:

    • Conda (Anaconda or Miniconda) (recommended)
    • Python venv

Installation

Follow the steps below to set up the environment and install the project.

Option A: Using Conda (Recommended)

1. Clone the repository
git clone https://github.com/NTU-RL2025-02/RecoveryDAgger.git
cd RecoveryDAgger

Alternatively, you may download the source code from the Releases section.

2. Create and activate a Conda environment
conda create -n recoverydagger python=3.10
conda activate recoverydagger
3. Install PyTorch

Install PyTorch according to your platform. Please refer to the official PyTorch website for the latest instructions:

👉 https://pytorch.org

Examples:

# CPU / Apple Silicon
pip install torch torchvision
# CUDA 12.8
# pip install torch torchvision --index-url https://download.pytorch.org/whl/cu128
4. Install project dependencies

From the repository root directory:

pip install -e .

Option B: Using Python venv (Without Conda)

If Conda is not available, you can use Python’s built-in `venv` module instead.
1. Clone the repository
git clone https://github.com/NTU-RL2025-02/RecoveryDAgger.git
cd RecoveryDAgger
2. Create and activate a virtual environment
python3 -m venv venv

Activate the environment:

  • macOS / Linux

    source venv/bin/activate
  • Windows

    venv\Scripts\activate
3. Upgrade pip (recommended)
pip install --upgrade pip
4. Install PyTorch

Install PyTorch according to your platform:

👉 https://pytorch.org

(Use the same commands as in Option A.)

5. Install project dependencies
pip install -e .

(back to top)

Usage

Note: All commands should be executed from the project root directory
(i.e., the directory that contains this README.md file).

Tip: All Python scripts support the --help flag.
You can run python <script>.py --help to see all available options and their descriptions.

Data Collection

  1. A pre-generated offline demonstration dataset is provided at:

    models/demonstrations/offline_data_100.pkl
    
  2. If you wish to regenerate the offline dataset from scratch, run the data collection script:

python models/demonstrations/gen_offline_data_maze.py \
    --rule-base-expert \
    --episodes 50 \
    --output models/demonstrations/offline_data_50.pkl

Training

1. Train RecoveryDAgger from scratch

This command trains RecoveryDAgger starting from behavior cloning (BC) pretraining.

python3 train.py \
    --seed 48763 \
    --device 0 \
    --iters 30 \
    --demonstration_set_file "models/demonstrations/offline_data_100.pkl" \
    --environment "PointMaze_4rooms-v3" \
    --recovery_type "q" \
    --num_test_episodes 100 \
    --noisy_scale 1.0 \
    --save_bc_checkpoint "models/bc_models/4room_rule_base_100.pt" \
    --fix_thresholds \
    sample_experiment

The final positional argument specifies the experiment name, which determines the output directory under data/.
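As a reference for locating outputs, the checkpoint path produced by train.py (data/[exp_name]/[exp_name]_s[seed]/best_model.pt, as listed in the training notes) can be built like this illustrative helper (`best_model_path` is not part of the codebase):

```python
from pathlib import Path

def best_model_path(exp_name: str, seed: int, root: str = "data") -> Path:
    """Where train.py writes its best checkpoint:
    data/[exp_name]/[exp_name]_s[seed]/best_model.pt"""
    return Path(root) / exp_name / f"{exp_name}_s{seed}" / "best_model.pt"
```

For the sample command above, this resolves to data/sample_experiment/sample_experiment_s48763/best_model.pt.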

2. Continue training from a pretrained BC model

If you already have a pretrained behavior cloning model, you can skip BC pretraining and continue training directly:

python3 train.py \
    --seed 48763 \
    --device 0 \
    --iters 30 \
    --demonstration_set_file "models/demonstrations/offline_data_100.pkl" \
    --environment "PointMaze_4rooms-v3" \
    --recovery_type "five_q" \
    --num_test_episodes 100 \
    --fix_thresholds \
    --noisy_scale 1.0 \
    --skip_bc_pretrain \
    --bc_checkpoint "models/bc_models/4rooms_rule_base_100_noise_0.pt" \
    sample_load_bc_experiment

Notes:

  • --recovery_type: Specifies the type of recovery policy to use. Supported options:

    • "q" (Success Q)
    • "five_q" (Ensemble Success Q)
    • "expert" (ThriftyDAgger baseline)
  • --demonstration_set_file: Path to the offline demonstration dataset.

  • --skip_bc_pretrain: Skip behavior cloning (BC) pretraining and start training directly from a pretrained BC model.

  • --bc_checkpoint: Path to the pretrained BC model checkpoint. This flag is required when --skip_bc_pretrain is set.

  • After training, the trained model will be located at data/[exp_name]/[exp_name]_s[seed]/best_model.pt
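To give intuition for the "five_q" (Ensemble Success Q) option: an ensemble of Success Q-networks can gate expert queries by how much its members disagree. The gating rule and threshold below are illustrative assumptions, not the repository's exact logic.

```python
import statistics

def should_query_expert(q_values: list[float], novelty_threshold: float = 0.15) -> bool:
    """q_values: per-ensemble-member success estimates for the current state-action.

    High disagreement suggests a novel state where recovery is unreliable,
    so the expert is queried; otherwise recovery is attempted instead.
    """
    disagreement = statistics.pstdev(q_values)
    return disagreement > novelty_threshold
```

With agreeing members (e.g. `[0.5, 0.5, 0.5, 0.5, 0.5]`) no query is issued; with strong disagreement (e.g. `[0.1, 0.9, 0.2, 0.8, 0.5]`) the expert is consulted.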

Evaluation

To evaluate a trained model:

python eval.py data/sample/best_model_five_q.pt \
    --environment "PointMaze_4rooms-v3" \
    --iters 100 \
    --noisy_scale 1.0

To plot evaluation trajectories:

python eval.py data/sample/best_model_five_q.pt \
    --environment "PointMaze_4rooms-v3" \
    --iters 100 \
    --noisy_scale 1.0 \
    --trajectory

The evaluation script reports the success rate over multiple runs. Additional visualization options (e.g., rendering or Q-value heatmaps) may be available; run python eval.py --help to list them.

(back to top)

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

(back to top)

License

Distributed under the MIT license. See LICENSE.txt for more information.

(back to top)

Acknowledgments

This project is forked from the thriftydagger repository.

While the codebase has been significantly modified and extended, we gratefully acknowledge the original authors for providing the initial implementation and research foundation.

(back to top)
