Prune-Then-Plan: Step-Level Calibration for Stable Frontier Exploration in Embodied Question Answering

Noah Frahm, Prakrut Patel, Yue Zhang, Shoubin Yu, Mohit Bansal, Roni Sengupta

Paper PDF Project Page


This is the official repository of Prune-Then-Plan: Step-Level Calibration for Stable Frontier Exploration in Embodied Question Answering.


News

  • [2025/11] Paper is on arXiv.

Installation

Set up the environment (Linux, Python 3.9):

conda create -n ptp python=3.9 -y && conda activate ptp

conda install -c conda-forge -y "numpy=1.26.3"
conda config --env --append pinned_packages "numpy=1.26.3"

pip install -c constraints.txt torch==2.2.2+cu121 torchvision==0.17.2+cu121 \
  --index-url https://download.pytorch.org/whl/cu121

conda install -c conda-forge -c aihabitat -y habitat-sim=0.2.5 headless faiss-cpu=1.7.4
conda install -y https://anaconda.org/pytorch3d/pytorch3d/0.7.8/download/linux-64/pytorch3d-0.7.8-py39_cu121_pyt222.tar.bz2

pip install -r requirements.txt -c constraints.txt

Run Evaluation

1 - Preparations

Dataset

Please download the train and val split of HM3D, and specify the path in cfg/eval_aeqa_template.yaml. For example, if your download path is /your_path/hm3d/ that contains /your_path/hm3d/train/ and /your_path/hm3d/val/, you can set the scene_data_path in the config file as /your_path/hm3d/.
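For example, the corresponding entry in cfg/eval_aeqa_template.yaml would look like the excerpt below (only scene_data_path is described in this README; the comment reflects the expected folder layout):

```yaml
# cfg/eval_aeqa_template.yaml (excerpt)
scene_data_path: /your_path/hm3d/   # must contain train/ and val/ subfolders
```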

The test questions of A-EQA are provided in the data/ folder. For A-EQA, we provide two subsets of different sizes: aeqa_questions-41.json and aeqa_questions-573.json. aeqa_questions-573.json is the official A-EQA subset provided by OpenEQA, while aeqa_questions-41.json is a smaller subset for quick evaluation.

2 - Run Evaluation on A-EQA

First run the following script to generate the predictions for the A-EQA dataset:

bash eval.sh --cf cfg/eval_aeqa_template.yaml

To split tasks, you can add --start_ratio and --end_ratio to specify the range of tasks to evaluate. For example, to evaluate the first half of the dataset, you can run:

bash eval.sh --cf cfg/eval_aeqa_template.yaml --start_ratio 0.0 --end_ratio 0.5

After the scripts finish, the results from all splits will be automatically aggregated and saved.
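For instance, to split the dataset evenly across N parallel jobs, the ratio pairs can be generated as sketched below (split_ratios is a hypothetical helper, not part of this repo; only the --start_ratio/--end_ratio flags come from eval.sh):

```python
# Generate (start_ratio, end_ratio) pairs for n_jobs even evaluation splits.
def split_ratios(n_jobs):
    return [(i / n_jobs, (i + 1) / n_jobs) for i in range(n_jobs)]

# Print one eval.sh invocation per split.
for start, end in split_ratios(4):
    print(f"bash eval.sh --cf cfg/eval_aeqa_template.yaml "
          f"--start_ratio {start} --end_ratio {end}")
```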

3 - Save Visualization

The default evaluation config saves visualization results (topdown maps, egocentric views, memory snapshots, and frontier snapshots) at each step. These visuals are helpful for debugging, but they slow down the evaluation process. Set save_visualization to false if you would like to run large-scale evaluation without visuals.
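The corresponding config entry would look like this (the flag name comes from the text above; its exact location within the file may differ):

```yaml
# cfg/eval_aeqa_template.yaml (excerpt)
save_visualization: false   # disable per-step visuals for large-scale runs
```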

Calibrate Your Own Model

1 - Generate Model Confidence Values

Download the calibration trajectories from the Google Drive link here and unzip them in the data/ directory. Our dataset of annotated frontiers is provided in data/annotated_frontier_data.json.

To calibrate, you will need to spin up a VLM server. We currently support the following models via a helper script:

  • Qwen2.5-VL 7B Instruct
  • Qwen2.5-VL 32B Instruct
  • Qwen3-VL 30B A3B Instruct

For example, to spin up Qwen3-VL 30B on port 8000, run:

python -m models.Qwen.Qwen_server -m 3_30 --host localhost -p 8000

For additional argument information run:

python -m models.Qwen.Qwen_server -h

We use Qwen VL models for our evaluation. If you want to calibrate a different model, refer to the Qwen_VL_Server class and its process_payload method in models/Qwen/Qwen_server.py for implementation details, and update the model initialization and prompt formatting for your model accordingly.

To generate model confidence values and save the ECDF for your own VLM, run:

python -m src.calibration -cf cfg/offline-calibration.yaml

You can update the following settings in the config file (cfg/offline-calibration.yaml):

  • General settings
    • calibration_trajectories_dir — location of calibration trajectories (default: data/calibration_trajectories)
  • VLM settings
    • vlm_host_name — Inference server hostname/IP (default: localhost)
    • vlm_port — Inference server port (default: 8000)
  • ECDF save settings
    • output_dir — ECDF save directory (default: models/Holm-Bonferroni)
    • output_name — ECDF save name (default: Holm-Bonferroni-Offline)
    • noise — Amount of noise to add to the annotated data (default: 0.0)
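The core ECDF idea can be sketched as follows: raw VLM confidence scores are mapped to their empirical percentile on the held-out calibration data. This is an illustrative sketch only, not the repo's actual implementation or API:

```python
from bisect import bisect_right

def fit_ecdf(calibration_scores):
    """Return a function mapping a raw score to its empirical CDF value."""
    xs = sorted(calibration_scores)
    n = len(xs)

    def ecdf(score):
        # Fraction of calibration scores <= score.
        return bisect_right(xs, score) / n

    return ecdf

# Illustrative calibration scores; real ones come from the VLM on the
# calibration trajectories.
ecdf = fit_ecdf([0.2, 0.4, 0.6, 0.8])
print(ecdf(0.5))  # 0.5, since two of the four calibration scores are <= 0.5
```

Calibrated percentiles of this kind, rather than raw model confidences, are what make step-level decisions comparable across models and scenes.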

Acknowledgement

The codebase is heavily built upon 3D-Mem, OpenEQA, Explore-EQA, and ConceptGraph. We thank the authors for their great work.

Citing Prune-Then-Plan

@misc{frahm2025prunethenplansteplevelcalibrationstable,
      title={Prune-Then-Plan: Step-Level Calibration for Stable Frontier Exploration in Embodied Question Answering}, 
      author={Noah Frahm and Prakrut Patel and Yue Zhang and Shoubin Yu and Mohit Bansal and Roni Sengupta},
      year={2025},
      eprint={2511.19768},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2511.19768}, 
}
