
SVSM: Scalable View Synthesis Model

CVPR 2026

Evan Kim*, Hyunwoo Ryu*, Thomas W. Mitchel, Vincent Sitzmann

*Equal contribution



1. Preparation

Environment

conda create -n SVSM python=3.11
conda activate SVSM
pip install -r requirements.txt

Because we use xformers' memory_efficient_attention, your GPU must have compute capability 8.0 or higher. You can look up your GPU's compute capability on NVIDIA's CUDA GPUs page.
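If you are unsure whether your GPU qualifies, a quick sketch like the one below can query it via nvidia-smi (this assumes a driver recent enough to support the `compute_cap` query field; older drivers may not have it):

```python
# Sketch: check GPU compute capability via nvidia-smi.
# Assumes the driver supports the `compute_cap` query field.
import subprocess

def meets_requirement(cap: str, required: float = 8.0) -> bool:
    """Return True if a capability string like '8.6' is at least `required`."""
    major, minor = (int(x) for x in cap.split("."))
    return major + minor / 10.0 >= required

try:
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    for i, cap in enumerate(out.stdout.split()):
        print(f"GPU {i}: compute capability {cap}, ok: {meets_requirement(cap)}")
except (FileNotFoundError, subprocess.CalledProcessError):
    print("nvidia-smi not available; cannot query compute capability.")
```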

Data

Download the RealEstate10K dataset from this link (provided by pixelSplat), unzip it, and place the data in YOUR_RAW_DATAPATH. Then run the following command to preprocess the data into our format.

python process_data.py --base_path YOUR_RAW_DATAPATH --output_dir YOUR_PROCESSED_DATAPATH --mode {train,test}

Checkpoints

Coming soon!

2. Training

Before training, follow the instructions here to generate a Weights & Biases (wandb) API key file for logging, and save it in the configs folder as api_keys.yaml. You can use configs/api_keys_example.yaml as a template.
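A minimal api_keys.yaml might look like the sketch below; the field names here are an assumption for illustration only, so copy configs/api_keys_example.yaml for the exact schema the code expects:

```yaml
# configs/api_keys.yaml — hypothetical layout; see configs/api_keys_example.yaml
# for the actual field names expected by the training code.
wandb:
  api_key: YOUR_WANDB_API_KEY   # obtained from https://wandb.ai/authorize
```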

Training command:

torchrun --nproc_per_node 1 --nnodes 1 \
    train.py --config configs/re10k.yaml

This command uses one GPU on a single node; the total batch size defaults to 32.
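Since the total batch size is fixed, each worker processes total_batch / world_size samples per step. The helper below is not part of the codebase (the actual split happens inside train.py); it just illustrates the arithmetic for different torchrun launch configurations:

```python
# Illustration only: how a fixed total batch size of 32 divides across workers.
def per_gpu_batch_size(total_batch: int, nnodes: int, nproc_per_node: int) -> int:
    """Samples each GPU processes per step, given the torchrun topology."""
    world_size = nnodes * nproc_per_node
    if total_batch % world_size != 0:
        raise ValueError(
            f"total batch {total_batch} not divisible by world size {world_size}"
        )
    return total_batch // world_size

print(per_gpu_batch_size(32, nnodes=1, nproc_per_node=1))  # single GPU: 32
print(per_gpu_batch_size(32, nnodes=1, nproc_per_node=8))  # 8 GPUs: 4 each
```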

3. Inference

torchrun --nproc_per_node 1 --nnodes 1 \
    inference.py --config "configs/re10k.yaml" \
    training.dataset_path="./preprocessed_data/test/full_list.txt" \
    training.batch_size_per_gpu=4 \
    training.target_has_input=false \
    training.num_views=5 \
    training.square_crop=true \
    training.num_input_views=2 \
    training.num_target_views=3 \
    inference.if_inference=true \
    inference.compute_metrics=true \
    inference.render_video=true \
    inference_out_dir=./experiments/evaluation/test

We use ./data/evaluation_index_re10k.json to specify the input and target view indices. This JSON file originally comes from pixelSplat.
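The sketch below shows one way to read such an index file. The schema shown (scene name mapped to "context"/"target" view indices, following pixelSplat's convention) is an assumption; inspect ./data/evaluation_index_re10k.json for the exact field names:

```python
# Sketch: reading an evaluation-index file. The field names "context" and
# "target" are assumed from pixelSplat's convention, not verified here.
import json
import os
import tempfile

# A tiny synthetic example with the assumed structure.
example = {
    "scene_0001": {"context": [45, 75], "target": [50, 60, 70]},
    "scene_0002": None,  # scenes without a valid index may be null
}

path = os.path.join(tempfile.mkdtemp(), "evaluation_index_example.json")
with open(path, "w") as f:
    json.dump(example, f)

with open(path) as f:
    index = json.load(f)

for scene, views in index.items():
    if views is None:
        continue  # skip scenes with no evaluation index
    print(scene, "inputs:", views["context"], "targets:", views["target"])
```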

After inference, the code generates an HTML file in the inference_out_dir folder; open it in a browser to view the results.

4. Acknowledgement

This codebase builds upon LVSM and PRoPE. We sincerely appreciate the contributions of the original authors and thank them for sharing their outstanding work.

5. Citation

If you find this work useful in your research, please consider citing:

@inproceedings{kim2026svsm,
  title={Scaling View Synthesis Transformers},
  author={Evan Kim and Hyunwoo Ryu and Thomas W. Mitchel and Vincent Sitzmann},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2026}
}
