
Video World Models with Long-term Spatial Memory (NeurIPS 2025)

Project page | Paper | Data

Tong Wu*, Shuai Yang*, Ryan Po, Yinghao Xu, Ziwei Liu, Dahua Lin, Gordon Wetzstein

* Equal Contribution

📦 Install Environment:

conda create -n spmem python=3.10 -y
conda activate spmem

pip install -r requirements.txt
  • Install PyTorch3D:
pip install "git+https://github.com/facebookresearch/pytorch3d.git"
  • Depth-Anything-3 (submodule):
cd Depth-Anything-3
pip install -e .
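
After installing, a quick sanity check can confirm that the core packages resolve in the active environment. This helper is illustrative and not part of the repo; the package list is an assumption based on the install steps above:

```python
import importlib.util

# Assumed core dependencies from the steps above (not an official list).
REQUIRED = ["torch", "pytorch3d", "transformers"]

def missing_packages(names):
    """Return the subset of package names that cannot be found by the import system."""
    return [n for n in names if importlib.util.find_spec(n) is None]

if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    print("missing packages:", ", ".join(missing) if missing else "none")
```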

🌎 Dataset Preprocessing:

  • We processed web videos (from MiraData) into roughly 80K video clips and annotated the original videos with MegaSAM to obtain images, depths, and camera poses.
  • The resulting dataset is available at ysmikey/spmem_megadata.
  • To further convert the data into the TSDF (dynamic/static separation) training format matching datasets/train_data_example, please refer to tsdf/data_process.sh.
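
For intuition, a TSDF (truncated signed distance function) volume stores, per voxel, a truncated signed distance to the nearest observed surface plus an integration weight, updated as a weighted running average over depth observations. The sketch below shows the generic single-observation update rule, not the repo's tsdf/ implementation:

```python
import numpy as np

def tsdf_update(tsdf, weight, surface_depth, voxel_z, trunc=0.05):
    """Integrate one depth observation into per-voxel TSDF state.

    tsdf, weight   : current voxel values (arrays of the same shape)
    surface_depth  : observed depth of the surface along each voxel's camera ray
    voxel_z        : depth of each voxel along that ray
    trunc          : truncation distance in scene units
    """
    # Signed distance from voxel to observed surface, truncated to [-1, 1].
    sdf = np.clip((surface_depth - voxel_z) / trunc, -1.0, 1.0)
    # Standard weighted running average (Curless-Levoy style fusion).
    new_weight = weight + 1.0
    new_tsdf = (tsdf * weight + sdf) / new_weight
    return new_tsdf, new_weight
```

Voxels in front of the surface converge toward +1, voxels behind it toward -1, and the zero crossing marks the reconstructed surface.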

🤖 Inference:

Download Weights:

Download the required weights from [Qwen2.5-VL-7B-Instruct] and [spmem_ckpt], then place them as follows:

  • Qwen2.5-VL-7B-Instruct → ckpt/Qwen2.5-VL-7B-Instruct/
  • spmem_ckpt → ckpt/spmem_ckpt/
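
A small helper (hypothetical, not shipped with the repo) can verify this expected layout before launching the demos; the two paths come from the list above:

```python
from pathlib import Path

# Expected checkpoint layout from the list above.
EXPECTED = ["ckpt/Qwen2.5-VL-7B-Instruct", "ckpt/spmem_ckpt"]

def missing_checkpoints(paths, root="."):
    """Return the expected checkpoint directories that do not exist under root."""
    root = Path(root)
    return [p for p in paths if not (root / p).is_dir()]

if __name__ == "__main__":
    missing = missing_checkpoints(EXPECTED)
    if missing:
        print("missing checkpoints:", ", ".join(missing))
    else:
        print("all checkpoints in place")
```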

Quick start:

Run the example:

bash run_demo.sh

Streaming Control:

Run streaming control demo:

bash run_stream.sh

🏋️ Training:

We provide an example training script that uses the example training data format.

  • Script (8x GPU): bash train_example.sh
  • Example data: datasets/train_data_example
  • Example config: datasets/train_data_example_config

Run:

bash train_example.sh

✒️ Citation

If you find our work helpful for your research, please consider giving us a star ⭐ and a citation 📝.

@article{wu2025video,
  title={Video world models with long-term spatial memory},
  author={Wu, Tong and Yang, Shuai and Po, Ryan and Xu, Yinghao and Liu, Ziwei and Lin, Dahua and Wetzstein, Gordon},
  journal={arXiv preprint arXiv:2506.05284},
  year={2025}
}
