spmem/spmem

Video World Models with Long-term Spatial Memory

Project page | Paper | Data

Tong Wu*, Shuai Yang*, Ryan Po, Yinghao Xu, Ziwei Liu, Dahua Lin, Gordon Wetzstein

* Equal Contribution

📦 Install Environment:

conda create -n spmem python=3.10 -y
conda activate spmem

pip install -r requirements.txt
  • Install PyTorch3D:
pip install "git+https://github.com/facebookresearch/pytorch3d.git"
  • Depth-Anything-3 (submodule):
cd Depth-Anything-3
pip install -e .
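
After installation, a quick check can confirm that the main dependencies are visible to the interpreter. The sketch below is not part of the repository. The import names `torch`, `pytorch3d`, and `depth_anything_3` are assumptions about what the steps above install, so adjust them if your environment uses different names.

```python
import importlib.util

# Assumed import names for the packages installed above (not confirmed by the repository).
ASSUMED_MODULES = ["torch", "pytorch3d", "depth_anything_3"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(ASSUMED_MODULES)
    if missing:
        print("Not importable:", ", ".join(missing))
    else:
        print("All assumed modules were found.")
```

Run it inside the activated `spmem` environment. `find_spec` only locates a module without importing it, so the check stays fast even for large packages.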

🌎 Dataset Preprocessing:

  • We processed web videos from MiraData into ~80K video clips and annotated the original videos with MegaSAM, which provides images, depth, and camera poses.
  • The resulting dataset is available at ysmikey/spmem_megadata.
  • To convert this data into the TSDF training format (with dynamic/static separation), matching the layout of datasets/train_data_example, see tsdf/data_process.sh.

🤖 Inference:

Download Weights:

Download the required weights, Qwen2.5-VL-7B-Instruct and spmem_ckpt, and place them as follows:

  • Qwen2.5-VL-7B-Instruct → ckpt/Qwen2.5-VL-7B-Instruct/
  • spmem_ckpt → ckpt/spmem_ckpt/
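
Before running the demos, it may help to confirm that both weight directories are in place. The following is a minimal sketch, not a script shipped with the repository. The directory names come from the layout listed above, while the function name `missing_weight_dirs` and the rule that an empty directory counts as missing are assumptions made for illustration.

```python
from pathlib import Path

# Directory names taken from the layout listed above.
EXPECTED_WEIGHT_DIRS = ["Qwen2.5-VL-7B-Instruct", "spmem_ckpt"]


def missing_weight_dirs(ckpt_root="ckpt"):
    """Return expected weight directories that are absent or empty under ckpt_root."""
    root = Path(ckpt_root)
    missing = []
    for name in EXPECTED_WEIGHT_DIRS:
        path = root / name
        # A directory counts as present only if it exists and holds at least one entry.
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(str(path))
    return missing


if __name__ == "__main__":
    # Hypothetical usage: run from the repository root, where ckpt/ is expected.
    for path in missing_weight_dirs():
        print(f"Missing or empty: {path}")
```

The check only looks at directory presence. It does not validate individual weight files, since their names are not listed here.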

Quick start:

Run the example:

bash run_demo.sh

Streaming Control:

Run streaming control demo:

bash run_stream.sh

🏋️ Training:

We provide an example training script that uses the example training data format.

  • Script (8x GPU): bash train_example.sh
  • Example data: datasets/train_data_example
  • Example config: datasets/train_data_example_config

Run:

bash train_example.sh

✒️ Citation

If you find our work helpful for your research, please consider giving the repository a star ⭐ and citing the paper 📝

@article{wu2025video,
  title={Video world models with long-term spatial memory},
  author={Wu, Tong and Yang, Shuai and Po, Ryan and Xu, Yinghao and Liu, Ziwei and Lin, Dahua and Wetzstein, Gordon},
  journal={arXiv preprint arXiv:2506.05284},
  year={2025}
}

About

[NeurIPS 2025] Video World Models with Long-term Spatial Memory
