MapNav: A Novel Memory Representation via Annotated Semantic Maps for VLM-Based Vision-and-Language Navigation (ACL 2025)

Repository for the ACL 2025 paper "MapNav: A Novel Memory Representation via Annotated Semantic Maps for VLM-Based Vision-and-Language Navigation".
The code has been tested only with Python 3.8 on Ubuntu 20.04.
- Environments Setup
- Follow L3MVN to install Habitat-Lab, Habitat-Sim, RedNet, PyTorch, and the other dependencies.
- Install LLaVA.
- Dataset
- Download the Matterport3D scene dataset into your data directory.
- Path
- Update the dataset path and Habitat path in `config_utils.py`.
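A minimal sketch of how the paths in `config_utils.py` might be centralized before pointing them at your local layout; the variable names (`DATA_ROOT`, `SCENES_DIR`, `resolve`) and directory structure below are illustrative assumptions, not the repository's actual identifiers.

```python
import os

# Hypothetical path constants; edit these to match your machine.
DATA_ROOT = os.path.expanduser("~/data")  # root holding all datasets
SCENES_DIR = os.path.join(DATA_ROOT, "scene_datasets", "mp3d")  # Matterport3D scenes
HABITAT_DIR = os.path.join(DATA_ROOT, "habitat-lab")  # Habitat-Lab checkout

def resolve(fragment: str) -> str:
    """Join a dataset-relative fragment onto the configured root."""
    return os.path.join(DATA_ROOT, fragment)
```

Keeping every absolute path behind one module like this means only these lines need to change when the code moves between machines.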
You can download the Hugging Face dataset to generate your own QA pairs and train your own model with LLaVA-NeXT.
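For reference, a sketch of packing one navigation QA pair into the conversation-style JSON records that LLaVA-style trainers consume. The field values (episode id, image filename, question, answer) are illustrative placeholders, and the exact record schema used for MapNav training is an assumption based on the common LLaVA format.

```python
import json

def make_qa_record(sample_id: str, image_file: str, question: str, answer: str) -> dict:
    """Build one training record in LLaVA's conversation JSON format.

    The "<image>" token marks where the annotated semantic map image
    is spliced into the prompt by the trainer.
    """
    return {
        "id": sample_id,
        "image": image_file,
        "conversations": [
            {"from": "human", "value": f"<image>\n{question}"},
            {"from": "gpt", "value": answer},
        ],
    }

# Hypothetical example record.
record = make_qa_record(
    "ep0001_step03",
    "ep0001_step03_map.png",
    "Which direction leads toward the kitchen on the annotated map?",
    "Turn left and follow the corridor.",
)
print(json.dumps(record, indent=2))
```

A list of such records serialized to a single JSON file is the usual input format for LLaVA-NeXT fine-tuning scripts.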
```bash
CUDA_VISIBLE_DEVICES=1 python r2rnav_benchmark.py --split val1 --eval 1 \
    --auto_gpu_config 0 -n 1 --num_local_steps 10 --print_images 1 \
    --model_dir model_path --exp_name nohis_rgb --eval_episodes 1839 \
    --collect 0 --stop_th 300
```
The generation and annotation pipeline can be found in `r2rnav_benchmark.py` and `huatu3.py`.