The MM5 dataset is a comprehensive multimodal dataset capturing RGB, Depth, Thermal (LWIR), Ultraviolet (UV), and Near-Infrared (NIR) images. It is designed for advanced multimodal research, providing diverse modalities, annotated data, and carefully calibrated and aligned images.
Make sure you download the latest version.
- 2025-04-23 - Initial version.
- 2025-06-23 - Updated:
- Added IAIP (Geometrically aligned intensity data created by using inpainted depth data – can include distortions)
- Fixed labels (removed underrepresented "sliced" classes)
- Fixed missing image issue in some processed data.
- 2025-08-03 - Updated:
- Added RAW IAIP (Geometrically aligned intensity data created by using inpainted depth data – can include distortions)
- Corrected class ID assignment to UV and thermal labels - now all IDs are aligned with the aligned dataset.
- 2026-04-12 - Updated:
- Added geometrically unaligned datasets (640x480px and 800x600px) including annotations.
figshare: https://doi.org/10.6084/m9.figshare.28722164
- Download Link for Raw Data (incl. raw annotations)
- Download Link for Aligned/Cropped Data(incl. aligned annotations)
- Download Link for Label Studio annotations (JSON & COCO exports)
- Download Link for MM5 Calibration Images
- Download Link for Unaligned/Cropped Data 640x480px (incl. annotations)
- Download Link for Unaligned/Cropped Data 800x600px (incl. annotations)
If you use this dataset, please cite our publication and the dataset:
Brenner, Martin; Reyes, Napoleon; Susnjak, Teo; Barczak, Andre (2025). MM5: Multimodal Image Dataset. figshare. Dataset. https://doi.org/10.6084/m9.figshare.28722164
@article{BRENNER2026103516,
title = {MM5: Multimodal image capture and dataset generation for RGB, depth, thermal, UV, and NIR},
journal = {Information Fusion},
volume = {126},
pages = {103516},
year = {2026},
issn = {1566-2535},
doi = {https://doi.org/10.1016/j.inffus.2025.103516},
url = {https://www.sciencedirect.com/science/article/pii/S1566253525005883},
author = {Martin Brenner and Napoleon H. Reyes and Teo Susnjak and Andre L.C. Barczak},
keywords = {Multimodal dataset, Thermal imaging, UV imaging, Preprocessing, Sensor fusion, Dataset annotation}
}
If you use GatedFusion-Net, please cite this publication:
Brenner, Martin and Reyes, Napoleon H. and Susnjak, Teo and Barczak, Andre L. C., GatedFusion-Net: Per-Pixel Modality Selection in a Five-Cue Transformer for RGB-D-I-T-UV Fusion (June 08, 2025). Available at SSRN: https://ssrn.com/abstract=5379135 or http://dx.doi.org/10.2139/ssrn.5379135
MM5: Multimodal Image Capture and Dataset Generation for RGB, Depth, Thermal, UV, and NIR
GatedFusion-Net: Per-Pixel Modality Selection in a Five-Cue Transformer for RGB-D-I-T-UV Fusion
Pre-Logit Decoder Fusion for Five-Modality Segmentation with Unaligned T/UV Auxiliaries (preprint)
This dataset is available under Creative Commons Attribution-NonCommercial 4.0.
The dataset consists of:
- RGB images captured under various lighting conditions.
- Raw and processed depth data (16-bit).
- Thermal images (16-bit raw, 24-bit static-colour encoded, and 8-bit dynamic range).
- Ultraviolet images.
- Near-infrared images (16-bit raw).
- Pixel-wise annotations for segmentation, instance segmentation, and classification both, for the aligned/cropped data as well as for the raw data.
The raw data is organised into separate folders corresponding to each camera modality. In our stereo setup, images from the left and right cameras are distinguished with suffixes _0 and _1, respectively. Each file follows a standardised naming convention including:
[sequence_number]_[settings_ID]_[timestamp]_[modality].png
Example: 1_5_20240716_130310_143_rgb.png
Raw Data Folder Structure:
DEPTH_0,DEPTH_1IR_0,IR_1LWIRMETARGB_0,RGB_1UVANNO_V,ANNO_T,ANNO_U
-
Depth and IR: Stored as 16-bit single-channel images. Available in two forms:
- Raw (
_raw) - Kinect SDK transformed (
_tr)
- Raw (
-
Thermal (LWIR): Available representations include:
- Raw 16-bit images (
_lwir16) - 24-bit fixed colour encoded images (
_lwir) - 8-bit grayscale with automatic gain control (AGC) (
_lwir8dyn)
- Raw 16-bit images (
Encoded LWIR images are provided for convenience and can be reproduced from raw data.
Aligned and cropped data have sequential filenames (starting from 1) ensuring cross-modal consistency. All modalities corresponding to a single capture share identical filenames.
Processed Data Folder Structure:
-
Annotations:
ANNO_CLASS,ANNO_INSTANNO_VIS_CLASS,ANNO_VIS_INST(colour-coded for visualisation)
-
Modalities:
- Depth:
D,D_Focus,D_FocusN,D_Focus960N,D16 - Infrared:
I,I16,IAIP - Metadata:
META - RGB (lighting settings):
RGB1–RGB8 - Thermal:
T8,T16,T24 - Ultraviolet:
U1,U8,U9
- Depth:
RGB and UV folder names correspond to their illumination settings, while Depth, IR, and Thermal folders indicate data encoding type.
MM5_RAW/
├── DEPTH_0/
├── DEPTH_1/
├── IR_0/
├── IR_1/
├── LWIR/
├── META/
├── RGB_0/
├── RGB_1/
└── UV/
MM5_ALIGNED/
├── ANNO_CLASS/
├── ANNO_INST/
├── ANNO_VIS_CLASS/
├── ANNO_VIS_INST/
├── D/
├── D_Focus/
├── D_FocusN/
├── D_Focus960N/
├── D_16
├── I
├── I16
├── IAIP
├── META
├── RGB1/ ... RGB8/
├── T8/
├── T16/
├── T24/
├── U1/
├── U8/
└── U9/
Annotations include class labels and instance(object) labels in pixel-level formats, available both visually (for quick reference) and as raw data for model training. Next to the labels for the aligned data, also the labels for the raw thermal and UV data are included.
To avoid redundant duplication, modalities such as depth, thermal, and IR, which do not vary across different RGB lighting conditions, are stored once. Researchers wishing to train across multiple or all lighting conditions must explicitly pair shared modalities with each RGB dataset. This can be efficiently managed by custom data loader scripts or restructuring the dataset to meet specific research requirements.
Raw data files follow this pattern:
[SequenceNumber]_[SettingID]_[Timestamp]_[Modality].png
Example:
1_5_20240716_130310_143_rgb.png
Processed data files follow a sequential naming convention:
[FrameNumber]_[Modality].png
RGB-D and Thermal Sensor Fusion: A Systematic Literature Review
M. Brenner, N. H. Reyes, T. Susnjak and A. L. C. Barczak, "RGB-D and Thermal Sensor Fusion: A Systematic Literature Review," in IEEE Access, vol. 11, pp. 82410-82442, 2023, doi: 10.1109/ACCESS.2023.3301119.
keywords: {Thermal sensors;Cameras;Three-dimensional displays;Sensor fusion;Robot sensing systems;Feature extraction;Systematics;Multimodal;RGB-D;RGB-DT;RGB-T;sensor fusion;thermal},

