by Tim Broedermann, Christos Sakaridis, Luigi Piccinelli, Wim Abbeloos, and Luc Van Gool
🔔 News:
- [2026-01-26] We are happy to announce that DGFusion was accepted for publication in IEEE Robotics and Automation Letters.
This repository contains the official code for DGFusion, a novel depth-guided multimodal fusion method for robust semantic perception. DGFusion enhances condition-aware fusion by integrating depth information, treating multimodal segmentation as a multi-task problem. It utilizes lidar measurements both as model inputs and as ground truth for learning depth, with an auxiliary depth head that learns depth-aware features. These features are encoded into spatially-varying local depth tokens, which, together with a global condition token, dynamically adapt sensor fusion to the spatially varying reliability of each sensor across the scene. Additionally, DGFusion introduces a robust loss for depth learning, addressing the challenges of sparse and noisy lidar inputs in adverse conditions. Our method achieves state-of-the-art panoptic and semantic segmentation performance on the challenging MUSES and DeLiVER datasets.
- We use Python 3.9.18, PyTorch 2.3.1, and CUDA 11.8.
- We use Detectron2-v0.6.
- For complete installation instructions, please see INSTALL.md.
- DGFusion supports two datasets: MUSES and DeLiVER. The datasets are assumed to exist in a directory specified by the environment variable `DETECTRON2_DATASETS`. Under this directory, detectron2 will look for datasets in the structure described below.
```
$DETECTRON2_DATASETS/
  muses/
  deliver/
```
- You can set the location for builtin datasets via `export DETECTRON2_DATASETS=/path/to/datasets`. If left unset, the default is `./datasets` relative to your current working directory.
- For more details on how to prepare the datasets, please see detectron2's documentation.
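The dataset-root resolution described above can be sketched in a few lines; this mimics how detectron2's builtin dataset registration falls back to `./datasets` (the helper name `get_dataset_root` is illustrative, not part of the codebase):

```python
import os

def get_dataset_root() -> str:
    # Use $DETECTRON2_DATASETS if set, otherwise fall back to ./datasets.
    return os.environ.get("DETECTRON2_DATASETS", "datasets")

# The two dataset roots DGFusion expects under this directory.
muses_root = os.path.join(get_dataset_root(), "muses")
deliver_root = os.path.join(get_dataset_root(), "deliver")
```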
MUSES dataset structure:
You need to download the following packages from the MUSES dataset:
- RGB_Frame_Camera_trainvaltest
- Panoptic_Annotations_trainval
- Semantic_Annotations_trainval
- Event_Camera_trainvaltest
- Lidar_trainvaltest
- Radar_trainvaltest
- GNSS_trainvaltest
and place them in the following structure:
```
$DETECTRON2_DATASETS/
  muses/
    calib.json
    gt_panoptic/
    frame_camera/
    lidar/
    radar/
    event_camera/
    gnss/
```
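A quick sanity check that the MUSES layout above is complete can save a failed run later. This is a hypothetical helper (not part of the repo) whose expected entries come directly from the structure listed in this README:

```python
import os

# Top-level entries the MUSES directory should contain, per this README.
EXPECTED_MUSES = [
    "calib.json", "gt_panoptic", "frame_camera",
    "lidar", "radar", "event_camera", "gnss",
]

def missing_muses_entries(dataset_root: str) -> list:
    """Return the expected MUSES entries absent under <dataset_root>/muses."""
    muses = os.path.join(dataset_root, "muses")
    return [e for e in EXPECTED_MUSES
            if not os.path.exists(os.path.join(muses, e))]
```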
DeLiVER dataset structure:
You can download the DeLiVER dataset from the following link and place it in the following structure:
```
$DETECTRON2_DATASETS/
  deliver/
    semantic/
    img/
    lidar/
    event/
    hha/
    depth/
```
- We provide weights for DGFusion trained on the MUSES and DeLiVER datasets.
- With the flag `MODEL.TEST.DEPTH_ON`, you can choose whether depth is predicted at test time.
- To evaluate a model's performance, use:
MUSES (on the validation set):
```
python train_net.py \
  --config-file configs/muses/swin/dgfusion_swin_tiny_bs8_180k_muses_clre.yaml \
  --eval-only MODEL.IS_TRAIN False MODEL.WEIGHTS <path-to-checkpoint> \
  DATASETS.TEST_PANOPTIC "('muses_panoptic_val',)" \
  MODEL.TEST.PANOPTIC_ON True MODEL.TEST.SEMANTIC_ON True \
  MODEL.TEST.DEPTH_ON False
```
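The trailing `KEY VALUE` arguments in the command above are config overrides; detectron2 merges them into the config with yacs' `merge_from_list`. A minimal sketch of just the pairing and value parsing (the function `parse_opts` is illustrative, not part of the codebase):

```python
from ast import literal_eval

def parse_opts(opts):
    """Pair up command-line overrides like ["KEY", "VALUE", ...] into a dict."""
    assert len(opts) % 2 == 0, "opts must come in KEY VALUE pairs"
    parsed = {}
    for key, value in zip(opts[0::2], opts[1::2]):
        try:
            # Python literals: "False" -> False, "('a',)" -> ('a',), etc.
            parsed[key] = literal_eval(value)
        except (ValueError, SyntaxError):
            parsed[key] = value  # keep plain strings (e.g. file paths) as-is
    return parsed
```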
Predict on the test set to upload to the MUSES benchmark for both semantic and panoptic segmentation:
```
python train_net.py \
  --config-file configs/muses/swin/dgfusion_swin_tiny_bs8_180k_muses_clre.yaml \
  --inference-only MODEL.IS_TRAIN False MODEL.WEIGHTS <path-to-checkpoint> \
  OUTPUT_DIR output/dgfusion_swin_tiny_bs8_200k_deliver_clde \
  DATASETS.TEST_PANOPTIC "('muses_panoptic_test',)" \
  MODEL.TEST.PANOPTIC_ON True MODEL.TEST.SEMANTIC_ON True \
  MODEL.TEST.DEPTH_ON False
```
This will create folders under <OUTPUT_DIR>/inference for the semantic and panoptic predictions (e.g. output/dgfusion_swin_tiny_bs8_200k_deliver_clde/inference/...).
- For the panoptic predictions, you can zip the `labelIds` folder under the `panoptic` folder and upload it to the MUSES benchmark.
- For the semantic predictions, you can zip the `labelTrainIds` folder under the `semantic` folder and upload it to the MUSES benchmark.
For better visualization, you can additionally set `MODEL.TEST.SAVE_PREDICTIONS.CITYSCAPES_COLORS True` to get extra folders with the predictions in the Cityscapes colors.
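Packaging a prediction folder for upload can be done with the standard library; a minimal sketch, assuming a `labelIds`-style folder as produced under `<OUTPUT_DIR>/inference` (the helper name `zip_folder` is illustrative):

```python
import shutil

def zip_folder(folder: str, archive_name: str) -> str:
    """Create <archive_name>.zip containing the contents of `folder`.

    Returns the path of the created archive.
    """
    return shutil.make_archive(archive_name, "zip", root_dir=folder)

# e.g. zip_folder("output/.../inference/panoptic/labelIds", "panoptic_labelIds")
```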
DeLiVER on the test set:
```
python train_net.py \
  --config-file configs/deliver/swin/dgfusion_swin_tiny_bs8_200k_deliver_clde.yaml \
  --eval-only MODEL.IS_TRAIN False MODEL.WEIGHTS <path-to-checkpoint> \
  DATASETS.TEST_SEMANTIC "('deliver_semantic_test',)" \
  MODEL.TEST.DEPTH_ON False
```
Replace `deliver_semantic_test` with `deliver_semantic_val` to evaluate on the validation set.
We provide the following results for the MUSES dataset, with the test scores taken from the official MUSES benchmark:
| Method | Backbone | PQ-val | mIoU-val | PQ-test | mIoU-test | config | Checkpoint |
|---|---|---|---|---|---|---|---|
| DGFusion | Swin-T | 58.88 | 79.72 | 61.03 | 79.49 | config | model |
We provide the following results for the DeLiVER dataset:
| Method | Modalities | Backbone | mIoU-val | mIoU-test | config | Checkpoint |
|---|---|---|---|---|---|---|
| DGFusion | CLDE | Swin-T | 66.51 | 56.71 | config | model |
| DGFusion | CLE | Swin-T | 56.64 | 51.55 | config | model |
- We followed the general setup of CAFuser and trained DGFusion on 4 NVIDIA TITAN RTX GPUs with 24 GB of memory each. Note that the training code is not included in this release.
If you find this project useful in your research, please consider citing:
```
@article{broedermann2026dgfusion,
  author={Br{\"o}dermann, Tim and Sakaridis, Christos and Piccinelli, Luigi and Abbeloos, Wim and Van Gool, Luc},
  journal={IEEE Robotics and Automation Letters},
  title={{DGF}usion: Depth-Guided Sensor Fusion for Robust Semantic Perception},
  year={2026},
  volume={},
  number={},
  pages={1-8},
  doi={10.1109/LRA.2026.3656789}
}
```
This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. To view a copy of this license, visit Legal Code - Attribution-NonCommercial-ShareAlike 4.0 International - Creative Commons.
This project is based on the following open-source projects. We thank their authors for making the source code publicly available.
- CAFuser
- OneFormer
- Mask2Former
- GroupViT
- Neighborhood Attention Transformer
- detectron2
- MUSES SDK
- HRFuser
This work is funded by Toyota Motor Europe via the research project TRACE-Zurich (Toyota Research on Automated Cars Europe).
