DINO in the Room: Leveraging 2D Foundation Models for 3D Segmentation (DITR)

[Paper] [Project Page] [Weights] [Logs] [BibTeX]

📢 News

2026-03-16: Code release.
2025-11-05: Accepted to 3DV 2026.
2025-05-27: Winner of the CVPR 2025 ScanNet++ challenge.
2024-05-27: Accepted to CVPR 2025 T4V Workshop.

Setup

This code repository is originally based on Pointcept.

Dependencies

Tested with CUDA 12.4, Python 3.11 and the dependencies in the lock file. Should probably also work with newer versions.

# make sure to load CUDA 12.4 beforehand
uv sync --python 3.11

Data

Follow the instructions in the original README to setup the datasets. For ScanNet, the following command needs to be run afterwards as well.

python pointcept/datasets/preprocessing/scannet/prepare_2d_data/prepare_raw_data.py \
 --scannet_path $SCANNET_SCANS_PATH \
 --output_path data/processed/scannet_images

Train

Refer to the original README. The injection configs are generally named semseg-pt-v3m1-0-image and the distillation configs are named distill-pt-v3m1-0-distill. They can be found in the respective configs/$DATASET folder.

For example, to train the injection model on the nuscenes dataset with 2 GPUs, run the following command:

sh scripts/train.sh -g 2 -d nuscenes -n $EXPERIMENT_NAME -c semseg-pt-v3m1-0-image

For distillation, the following command can be used:

sh scripts/train.sh -g 2 -d nuscenes -n $EXPERIMENT_NAME -c distill-pt-v3m1-0-distill

For fine-tuning a distilled model, the following command can be used:

sh scripts/train.sh -g 2 -d nuscenes -n $EXPERIMENT_NAME -c semseg-pt-v3m1-0-base -o "weight=exp/nuscenes/$DISTILL_EXPERIMENT_NAME/model/model_last.pth"

Model Zoo

DITR (injected)

3D Backbone	Image Backbone	Dataset	Val mIoU	Exp Dir
PTv3	DINOv2 ViT-L	ScanNet	80.5	uploading...
PTv3	DINOv2 ViT-L	ScanNet200	41.2	link
PTv3	DINOv3 ViT-L	ScanNet200	42.3	link
PTv3	DINOv2 ViT-L	S3DIS	74.1	link
PTv3	DINOv2 ViT-S	nuScenes	82.8	link
PTv3	DINOv2 ViT-B	nuScenes	83.0	link
PTv3	DINOv2 ViT-L	nuScenes	83.1	link
PTv3	DINOv2 ViT-g	nuScenes	84.2	link
PTv3	DINOv3 ViT-L	nuScenes	83.9	uploading...
PTv3	DINOv2 ViT-L	SemanticKITTI	69.0	link

D-DITR (distilled)

3D Backbone	Datasets	Exp Dir
PTv3	ScanNet	link
PTv3	ScanNet + Structured3D	link

D-DITR (distilled + fine-tuned)

3D Backbone	Dataset	Val mIoU	Exp Dir
PTv3	ScanNet	79.2	link
PTv3	ScanNet200	37.7	link
PTv3	S3DIS	75.0	uploading...
MinkUNet	ScanNet	76.2	uploading...

Disclaimer

This software is a research prototype only and suitable only for test purposes. It has been published solely for use in research applications; it is not permitted to use this software in any kind of improper, disrespectful, defamatory, obscene, military or otherwise harmful application. This software is not suitable for use in or for products and/or services and in particular not in or for safety-relevant areas. It was solely developed for and published as part of the publication "DINO in the Room: Leveraging 2D Foundation Models for 3D Segmentation" and will neither be maintained nor monitored in any way.

Acknowledgment

The research and development of this software by RWTH Aachen has been supported by Robert Bosch GmbH under the project "Context Understanding for Autonomous Systems".

🎓 Citation

If you use our work in your research, please use the following BibTeX entry.

@InProceedings{knaebel2026ditr,
  title     = {{DINO} in the Room: Leveraging {2D} Foundation Models for {3D} Segmentation},
  author    = {Knaebel, Karim and Yilmaz, Kadir and de Geus, Daan and Hermans, Alexander and Adrian, David and Linder, Timm and Leibe, Bastian},
  booktitle = {2026 International Conference on 3D Vision (3DV)},
  year      = {2026}
}

Name		Name	Last commit message	Last commit date
Latest commit History 82 Commits
configs		configs
docs		docs
libs		libs
pointcept		pointcept
scripts		scripts
tools		tools
.gitignore		.gitignore
3rd-party-licenses.txt		3rd-party-licenses.txt
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

DINO in the Room: Leveraging 2D Foundation Models for 3D Segmentation (DITR)

📢 News

Setup

Dependencies

Data

Train

Model Zoo

DITR (injected)

D-DITR (distilled)

D-DITR (distilled + fine-tuned)

Disclaimer

Acknowledgment

🎓 Citation

About

Uh oh!

Releases

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

DINO in the Room: Leveraging 2D Foundation Models for 3D Segmentation (DITR)

📢 News

Setup

Dependencies

Data

Train

Model Zoo

DITR (injected)

D-DITR (distilled)

D-DITR (distilled + fine-tuned)

Disclaimer

Acknowledgment

🎓 Citation

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Contributors

Uh oh!

Languages