
SVSM: Scalable View Synthesis Model

CVPR 2026

Evan Kim*, Hyunwoo Ryu*, Thomas W. Mitchel, Vincent Sitzmann

*Equal contribution



1. Preparation

Environment

conda create -n SVSM python=3.11
conda activate SVSM
pip install -r requirements.txt

Because we use xformers' memory_efficient_attention, your GPU must have compute capability 8.0 or higher. You can look up your GPU's compute capability on NVIDIA's CUDA GPUs page.
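If you are unsure whether your GPU qualifies, a quick sketch like the one below can query it via nvidia-smi (this assumes a driver recent enough to support the `compute_cap` query field; older drivers may not have it):

```python
# Sketch: check GPU compute capability via nvidia-smi.
# Assumes the driver supports the `compute_cap` query field.
import subprocess

def meets_requirement(cap: str, required: float = 8.0) -> bool:
    """Return True if a capability string like '8.6' is at least `required`."""
    major, minor = (int(x) for x in cap.split("."))
    return major + minor / 10.0 >= required

try:
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    for i, cap in enumerate(out.stdout.split()):
        print(f"GPU {i}: compute capability {cap}, ok: {meets_requirement(cap)}")
except (FileNotFoundError, subprocess.CalledProcessError):
    print("nvidia-smi not available; cannot query compute capability.")
```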

Data

Download the RealEstate10K dataset from this link (provided by pixelSplat), unzip it, and place the data in YOUR_RAW_DATAPATH. Then run the following command to preprocess the data into our format.

python process_data.py --base_path YOUR_RAW_DATAPATH --output_dir YOUR_PROCESSED_DATAPATH --mode {train,test}

Checkpoints

Coming soon!

2. Training

Before training, follow the instructions here to generate a Weights & Biases (wandb) API key file for logging, and save it in the configs folder as api_keys.yaml. You can use configs/api_keys_example.yaml as a template.
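A minimal api_keys.yaml might look like the sketch below; the field names here are an assumption for illustration only, so copy configs/api_keys_example.yaml for the exact schema the code expects:

```yaml
# configs/api_keys.yaml — hypothetical layout; see configs/api_keys_example.yaml
# for the actual field names expected by the training code.
wandb:
  api_key: YOUR_WANDB_API_KEY   # obtained from https://wandb.ai/authorize
```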

Training command:

torchrun --nproc_per_node 1 --nnodes 1 \
    train.py --config configs/re10k.yaml

This command uses one GPU on a single node; the total batch size defaults to 32.
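Since the total batch size is fixed, each worker processes total_batch / world_size samples per step. The helper below is not part of the codebase (the actual split happens inside train.py); it just illustrates the arithmetic for different torchrun launch configurations:

```python
# Illustration only: how a fixed total batch size of 32 divides across workers.
def per_gpu_batch_size(total_batch: int, nnodes: int, nproc_per_node: int) -> int:
    """Samples each GPU processes per step, given the torchrun topology."""
    world_size = nnodes * nproc_per_node
    if total_batch % world_size != 0:
        raise ValueError(
            f"total batch {total_batch} not divisible by world size {world_size}"
        )
    return total_batch // world_size

print(per_gpu_batch_size(32, nnodes=1, nproc_per_node=1))  # single GPU: 32
print(per_gpu_batch_size(32, nnodes=1, nproc_per_node=8))  # 8 GPUs: 4 each
```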

3. Inference

torchrun --nproc_per_node 1 --nnodes 1 \
    inference.py --config "configs/re10k.yaml" \
    training.dataset_path="./preprocessed_data/test/full_list.txt" \
    training.batch_size_per_gpu=4 \
    training.target_has_input=false \
    training.num_views=5 \
    training.square_crop=true \
    training.num_input_views=2 \
    training.num_target_views=3 \
    inference.if_inference=true \
    inference.compute_metrics=true \
    inference.render_video=true \
    inference_out_dir=./experiments/evaluation/test

We use ./data/evaluation_index_re10k.json to specify the input and target view indices. This JSON file originally comes from pixelSplat.
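The sketch below shows one way to read such an index file. The schema shown (scene name mapped to "context"/"target" view indices, following pixelSplat's convention) is an assumption; inspect ./data/evaluation_index_re10k.json for the exact field names:

```python
# Sketch: reading an evaluation-index file. The field names "context" and
# "target" are assumed from pixelSplat's convention, not verified here.
import json
import os
import tempfile

# A tiny synthetic example with the assumed structure.
example = {
    "scene_0001": {"context": [45, 75], "target": [50, 60, 70]},
    "scene_0002": None,  # scenes without a valid index may be null
}

path = os.path.join(tempfile.mkdtemp(), "evaluation_index_example.json")
with open(path, "w") as f:
    json.dump(example, f)

with open(path) as f:
    index = json.load(f)

for scene, views in index.items():
    if views is None:
        continue  # skip scenes with no evaluation index
    print(scene, "inputs:", views["context"], "targets:", views["target"])
```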

After inference, the code generates an HTML file in the inference_out_dir folder; open it in a browser to view the results.

4. Acknowledgement

This codebase builds upon LVSM and PRoPE. We sincerely appreciate the contributions of the original authors and thank them for sharing their outstanding work.

5. Citation

If you find this work useful in your research, please consider citing:

@inproceedings{kim2026svsm,
  title={Scaling View Synthesis Transformers},
  author={Evan Kim and Hyunwoo Ryu and Thomas W. Mitchel and Vincent Sitzmann},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2026}
}
