Existing vision-and-language navigation (VLN) models often deviate from the correct trajectory when executing instructions, yet they lack effective error-correction capability, which prevents them from recovering once they go off course.
To address this challenge, we propose the Self-correction Flywheel, a novel post-training paradigm. Instead of treating the model's error trajectories on the training set as a drawback, our paradigm leverages them as a valuable data source. We develop a method to identify deviations in these error trajectories and devise techniques to automatically generate self-correction data for perception and action. These self-correction data serve as fuel for the model's continued training.
The strength of the paradigm becomes apparent when we re-evaluate the retrained model on the training set and uncover new error trajectories; at this point, the self-correction flywheel begins to spin. Through multiple flywheel iterations, we progressively improve our monocular RGB-based vision-language-action (VLA) navigation model, CorrectNav.
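The iterative loop described above can be sketched as follows. All function names here (`evaluate_on_train`, `generate_correction_data`, `continue_training`, `flywheel`) are hypothetical placeholders illustrating the paradigm's control flow, not CorrectNav's actual training code:

```python
# Minimal sketch of the self-correction flywheel, assuming a toy "model"
# represented as a dict with a "can_solve" predicate. Every function is a
# hypothetical stand-in for the corresponding component of the paradigm.

def evaluate_on_train(model, train_episodes):
    """Roll out the model on the training set; return episodes it fails."""
    return [ep for ep in train_episodes if not model["can_solve"](ep)]

def generate_correction_data(error_trajectories):
    """Turn deviations into perception/action self-correction samples."""
    return [{"episode": ep, "label": "corrective_action"} for ep in error_trajectories]

def continue_training(model, correction_data):
    """Continue training the model on the new self-correction data."""
    solved = {d["episode"] for d in correction_data}
    base = model["can_solve"]
    model["can_solve"] = lambda ep, base=base: base(ep) or ep in solved
    return model

def flywheel(model, train_episodes, iterations=3):
    """Spin the flywheel: evaluate, mine errors, correct, repeat."""
    for _ in range(iterations):
        errors = evaluate_on_train(model, train_episodes)
        if not errors:  # nothing left to correct
            break
        data = generate_correction_data(errors)
        model = continue_training(model, data)
    return model
```

Each iteration converts the model's remaining training-set failures into new supervision, so the error set shrinks monotonically across spins.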
- Release CorrectNav model weights.
- Release evaluation scripts for the R2R-CE benchmark.
- Release real-world fine-tuning code (Coming Soon!).
We recommend setting up the environment on an RTX 3090 workstation with Ubuntu 22.04 and CUDA 12.1.
```shell
conda create -n CorrectNav python=3.10 cmake=3.14.0 -y
conda activate CorrectNav
```
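A quick sanity check that the active interpreter matches the recommended Python 3.10 pin (the pinned version is the only assumption taken from the command above; the helper function is ours):

```python
import sys

def is_compatible(version_info, required=(3, 10)):
    """Return True when the interpreter's major.minor matches the pin."""
    return tuple(version_info[:2]) == required

if __name__ == "__main__":
    status = "OK" if is_compatible(sys.version_info) else "does not match 3.10"
    print(f"Python {sys.version.split()[0]} {status}")
```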
You will need to install specific versions of Habitat:

- habitat-lab (stable branch):

```shell
git clone --branch stable https://github.com/facebookresearch/habitat-lab.git
cd habitat-lab
pip install -e habitat-lab  # install habitat_lab
```

- habitat-sim 0.3.3: Please follow the official "Build from Source" instructions to build habitat-sim in headless mode with CUDA support.
From the root directory of this repository, run:
```shell
pip install --upgrade pip
pip install -e ".[train]"
pip install flash-attn --no-build-isolation
```
Note: If you only need inference/serving, you can use `pip install -e ".[standalone]"` instead, and install extra runtime dependencies as needed.
Prepare the VLN datasets (R2R / RxR) by following the instructions in the VLN-CE Data Section to set up the MP3D scene dataset and VLN-CE episodes dataset.
Create a new directory named habitat-data-0.2.5 and organize your downloaded datasets exactly as shown below:
```
habitat-data-0.2.5/
├── datasets/
│   └── vlnnav/
│       ├── r2r/
│       │   ├── test/
│       │   ├── train/
│       │   │   ├── decompose.py
│       │   │   ├── filter.json
│       │   │   └── ...
│       │   ├── val_seen/
│       │   └── val_unseen/
│       └── rxr/
│           ├── test_challenge/
│           ├── train/
│           ├── val_seen/
│           └── val_unseen/
└── scenes/
    └── mp3d/
        ├── 17DRP5sb8fy/
        │   ├── 17DRP5sb8fy.glb
        │   ├── 17DRP5sb8fy.house
        │   ├── 17DRP5sb8fy.navmesh
        │   └── ...
        ├── 1LXtFkjw3qL/
        ├── 1pXnuDYAj8r/
        └── ...
```
We provide comprehensive scripts to evaluate CorrectNav:
- Runner: `eval_vln_r2r_6.py`
- Launcher: `eval.sh`
📥 Download CorrectNav Model Weights Here
Before starting the evaluation, please update the evaluation scripts with your local paths and settings:
- `pretrained = "YOUR_MODEL_PATH"`
- `ckpt_chosen = ...` (used for naming logs and JSON outputs)
- `CUDA_VISIBLE_DEVICES = "0..7"` (adjust based on your GPU availability)
Start the evaluation by executing the launcher script:
```shell
bash eval.sh
```
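Once the run finishes, you may want to summarize the per-episode JSON outputs mentioned above. The snippet below assumes a hypothetical schema in which each episode record carries a boolean `success` and a float `spl` (two standard VLN metrics); adapt the field names and the file name to whatever the runner actually writes:

```python
import json

def summarize(records):
    """Average success rate and SPL over episode records.

    Assumes each record has a boolean `success` and a float `spl` --
    a hypothetical schema; check the runner's actual output format.
    """
    n = len(records)
    if n == 0:
        return {"success_rate": 0.0, "spl": 0.0}
    return {
        "success_rate": sum(r["success"] for r in records) / n,
        "spl": sum(r["spl"] for r in records) / n,
    }

if __name__ == "__main__":
    # "results.json" is a hypothetical output file name.
    with open("results.json") as f:
        print(summarize(json.load(f)))
```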
If you find our work, code, or model weights helpful in your research, please consider citing our paper:
```bibtex
@misc{correctnav,
      title={CorrectNav: Self-Correction Flywheel Empowers Vision-Language-Action Navigation Model},
      author={Zhuoyuan Yu and Yuxing Long and Zihan Yang and Chengyan Zeng and Hongwei Fan and Jiyao Zhang and Hao Dong},
      year={2025},
      eprint={2508.10416},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2508.10416},
}
```