# RecoveryDAgger
Imitation learning enables agents to acquire complex behaviors from expert demonstrations, yet standard behavior cloning (BC) often suffers from covariate shift and compounding errors in sequential decision-making tasks. Interactive methods such as DAgger alleviate this issue by querying the expert on states visited by the learner, but they typically require frequent expert supervision, resulting in high annotation cost.
In this work, we propose RecoveryDAgger, a query-efficient interactive imitation learning framework that augments DAgger-style training with a learned recovery mechanism. Instead of immediately querying the expert in risky states, RecoveryDAgger first invokes a recovery policy that locally corrects the agent’s behavior by ascending the gradient of a learned Success Q-function, which estimates the probability of task completion. Expert queries are reserved for truly novel states where recovery is unreliable, thereby reducing redundant supervision. Experiments on the PointMaze navigation task demonstrate that RecoveryDAgger significantly reduces the number of expert queries while achieving comparable success rates to strong query-efficient baselines. Our work establishes the effectiveness of integrating learned recovery policies into interactive imitation learning to enhance query efficiency.
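The recovery step described above can be illustrated with a minimal, self-contained sketch: in a risky state, the agent's action is locally corrected by gradient ascent on a Success Q-function instead of immediately querying the expert. Here `success_q` is a toy differentiable stand-in for the learned Q-function, and `recover_action` and `q_grad_wrt_action` are hypothetical names for illustration, not the repository's actual API.

```python
def success_q(state, action, target=(0.5, -0.2)):
    """Toy stand-in for the learned Success Q-function: peaks at `target`."""
    return 1.0 - sum((a - t) ** 2 for a, t in zip(action, target))

def q_grad_wrt_action(state, action, target=(0.5, -0.2)):
    """Analytic gradient of the toy Q with respect to the action."""
    return [-2.0 * (a - t) for a, t in zip(action, target)]

def recover_action(state, action, lr=0.1, steps=50):
    """Locally correct `action` by ascending the Q-gradient (recovery step)."""
    a = list(action)
    for _ in range(steps):
        grad = q_grad_wrt_action(state, a)
        a = [ai + lr * gi for ai, gi in zip(a, grad)]
    return a

# The corrected action moves toward the high-success region.
print(success_q(None, recover_action(None, [0.0, 0.0])) > success_q(None, [0.0, 0.0]))  # True
```

In the actual method the gradient would come from a learned, differentiable Q-network rather than this closed-form toy; the sketch only shows the shape of the local correction.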
## Installation

### Prerequisites

Before installing the project, make sure you have the following dependencies installed:

- Python ≥ 3.10
- One of the following environment managers:
  - Conda (Anaconda or Miniconda) (recommended)
  - Python `venv`
Follow the steps below to set up the environment and install the project.

### Option A: Conda (recommended)

Clone the repository:

```sh
git clone https://github.com/NTU-RL2025-02/RecoveryDAgger.git
cd RecoveryDAgger
```

Alternatively, you may download the source code from the Releases section.

Create and activate the environment:

```sh
conda create -n recoverydagger python=3.10
conda activate recoverydagger
```

Install PyTorch according to your platform. Please refer to the official PyTorch website for the latest instructions. Examples:

```sh
# CPU / Apple Silicon
pip install torch torchvision

# CUDA 12.8
# pip install torch torchvision --index-url https://download.pytorch.org/whl/cu128
```

From the repository root directory, install the project in editable mode:

```sh
pip install -e .
```
### Option B: Python venv

If Conda is not available, you can use Python's built-in `venv` module instead.

```sh
git clone https://github.com/NTU-RL2025-02/RecoveryDAgger.git
cd RecoveryDAgger
python3 -m venv venv
```

Activate the environment:

- macOS / Linux:

  ```sh
  source venv/bin/activate
  ```

- Windows:

  ```sh
  venv\Scripts\activate
  ```

Then upgrade pip:

```sh
pip install --upgrade pip
```

Install PyTorch according to your platform (use the same commands as in Option A), then install the project from the repository root:

```sh
pip install -e .
```

Note: All commands should be executed from the project root directory (i.e., the directory that contains this `README.md` file).
Tip: All Python scripts support the `--help` flag. You can run `python <script>.py --help` to see all available options and their descriptions.
## Offline Demonstration Data

- A pre-generated offline demonstration dataset is provided at `models/demonstrations/offline_data_100.pkl`.
- If you wish to regenerate the offline dataset from scratch, please refer to the data collection scripts. For example:

  ```sh
  python models/demonstrations/gen_offline_data_maze.py --rule-base-expert --episodes 50 --output models/demonstrations/offline_data_50.pkl
  ```

## Training

This command trains RecoveryDAgger starting from behavior cloning (BC) pretraining:
```sh
python3 train.py \
    --seed 48763 \
    --device 0 \
    --iters 30 \
    --demonstration_set_file "models/demonstrations/offline_data_100.pkl" \
    --environment "PointMaze_4rooms-v3" \
    --recovery_type "q" \
    --num_test_episodes 100 \
    --noisy_scale 1.0 \
    --save_bc_checkpoint "models/bc_models/4room_rule_base_100.pt" \
    --fix_thresholds \
    sample_experiment
```

The final positional argument specifies the experiment name, which determines the output directory under `data/`.
If you already have a pretrained behavior cloning model, you can skip BC pretraining and continue training directly:
```sh
python3 train.py \
    --seed 48763 \
    --device 0 \
    --iters 30 \
    --demonstration_set_file "models/demonstrations/offline_data_100.pkl" \
    --environment "PointMaze_4rooms-v3" \
    --recovery_type "five_q" \
    --num_test_episodes 100 \
    --fix_thresholds \
    --noisy_scale 1.0 \
    --skip_bc_pretrain \
    --bc_checkpoint "models/bc_models/4rooms_rule_base_100_noise_0.pt" \
    sample_load_bc_experiment
```

Notes:
- `--recovery_type`: Specifies the type of recovery policy to use. Supported options:
  - `"q"` (Success Q)
  - `"five_q"` (Ensemble Success Q)
  - `"expert"` (ThriftyDAgger baseline)
- `--demonstration_set_file`: Path to the offline demonstration dataset.
- `--skip_bc_pretrain`: Skip behavior cloning (BC) pretraining and start training directly from a pretrained BC model.
- `--bc_checkpoint`: Path to the pretrained BC model checkpoint. This flag is required when `--skip_bc_pretrain` is set.
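As a rough illustration of why an ensemble of Success Q-functions (`--recovery_type "five_q"`) helps reserve expert queries for truly novel states: disagreement among ensemble members can serve as an uncertainty signal, so the expert is only queried where recovery is unreliable. The sketch below is illustrative only; the function name and threshold are assumptions, not the repository's implementation.

```python
import statistics

def should_query_expert(q_values, threshold=0.1):
    """Query the expert when the ensemble's Q-estimates disagree strongly.

    `q_values` holds one success estimate per ensemble member for the
    current state; a high population standard deviation marks novelty.
    The threshold is a made-up value for illustration.
    """
    return statistics.pstdev(q_values) > threshold

# Familiar state: members agree, so trust the recovery policy.
print(should_query_expert([0.81, 0.79, 0.80, 0.82, 0.78]))  # False
# Novel state: members disagree, so fall back to the expert.
print(should_query_expert([0.9, 0.2, 0.6, 0.1, 0.8]))       # True
```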
After training, the trained model will be located at `data/[exp_name]/[exp_name]_s[seed]/best_model.pt`.
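For scripting convenience, the documented output path can be reconstructed from the experiment name and seed. This is a small helper sketch; `output_dir` is not part of the project.

```python
import os

def output_dir(exp_name, seed):
    """Build the training output directory documented above."""
    return os.path.join("data", exp_name, f"{exp_name}_s{seed}")

print(output_dir("sample_experiment", 48763))
```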
## Evaluation

To evaluate a trained model:

```sh
python eval.py data/sample/best_model_five_q.pt \
    --environment "PointMaze_4rooms-v3" \
    --iters 100 \
    --noisy_scale 1.0
```

To plot evaluation trajectories, add the `--trajectory` flag:

```sh
python eval.py data/sample/best_model_five_q.pt \
    --environment "PointMaze_4rooms-v3" \
    --iters 100 \
    --noisy_scale 1.0 \
    --trajectory
```

The evaluation script reports the success rate over multiple runs. Optional visualization flags (e.g., rendering or Q-value heatmaps) can be enabled if supported.
## Contributing

Contributions are what make the open-source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!

- Fork the Project
- Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
- Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the Branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
## License

Distributed under the MIT License. See `LICENSE.txt` for more information.

## Acknowledgements

This project is forked from the thriftydagger repository. While the codebase has been significantly modified and extended, we gratefully acknowledge the original authors for providing the initial implementation and research foundation.