UAVScenes

(ICCV 2025) UAVScenes: A Multi-Modal Dataset for UAVs

[arXiv] [ICCV 2025]

We introduce UAVScenes, a large-scale dataset designed to benchmark various tasks across both 2D and 3D modalities. Our benchmark dataset is built upon the well-calibrated multi-modal UAV dataset MARS-LVIG, originally developed only for simultaneous localization and mapping (SLAM). We enhance this dataset by providing manually labeled semantic annotations for both images and LiDAR point clouds, along with accurate 6-degree-of-freedom (6-DoF) poses. These additions enable a wide range of UAV perception tasks, including detection, segmentation, depth estimation, 6-DoF localization, place recognition, and novel view synthesis (NVS). To the best of our knowledge, this is the first UAV benchmark dataset to offer both image and LiDAR point cloud semantic annotations (120k labeled pairs), with the potential to advance multi-modal UAV perception research.

Download

We provide both the full dataset (interval=1) and a key-frame-only subset (interval=5, about 1/5 of the size).
UAVScenes is hosted on several cloud platforms:

OneDrive
Google Drive
Baidu/百度网盘 (interval=5 only)
HuggingFace (interval=5 only)
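
As a rough illustration of what the interval setting means, the sketch below keeps every fifth item of a frame list. It assumes that interval=5 corresponds to a plain stride over the ordered frames starting at the first one; the function name `select_key_frames` and the frame names are hypothetical and are not taken from the dataset tooling.

```python
def select_key_frames(frames, interval=5, offset=0):
    """Keep every `interval`-th frame, starting at index `offset`.

    Assumption: key frames are a plain stride over the ordered frame list.
    The actual selection rule used to build the interval=5 release may differ.
    """
    if interval < 1:
        raise ValueError("interval must be >= 1")
    return frames[offset::interval]


# Hypothetical frame names; the real files use their own naming scheme.
frames = [f"frame_{i:04d}" for i in range(12)]
key_frames = select_key_frames(frames, interval=5)
print(key_frames)
```

With interval=1 the function returns the full list unchanged, which matches the full release.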

⚠️ If you run into any download problems, please open an issue with screenshots. We will fix them as soon as possible 🙂.

We currently include:

Hikvision camera images with annotations
Livox Avia LiDAR point clouds with annotations
6-DoF poses
Reconstructed 3D point cloud/mesh maps

File Information

interval1_CAM_LIDAR contains camera images, LiDAR point clouds, 6-DoF poses, and calibrations.
interval1_CAM_label contains camera semantic annotations.
interval1_LIDAR_label contains LiDAR semantic annotations.
terra_3dmap_pointcloud_mesh contains 3D mesh/point cloud maps.

cmap.py contains color-ID mapping.
calibration_results.py contains camera-LiDAR calibrations.
sampleinfos_interpolated.json contains camera-3D map calibrations.
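
A color-ID mapping such as the one in cmap.py is typically used to convert between color-coded label images and integer class IDs. The following is a minimal sketch of that conversion under stated assumptions: the table `EXAMPLE_CMAP`, the `IGNORE_ID` value, and both function names are hypothetical placeholders, and the real colors, class IDs, and data layout should be taken from cmap.py itself.

```python
# Hypothetical color table: (R, G, B) -> class ID.
# The real table is defined in cmap.py and will differ from this one.
EXAMPLE_CMAP = {
    (0, 0, 0): 0,
    (128, 64, 128): 1,
    (0, 128, 0): 2,
}
IGNORE_ID = 255  # assumed placeholder for colors missing from the table


def colors_to_ids(label_rgb, cmap=EXAMPLE_CMAP, ignore_id=IGNORE_ID):
    """Convert a 2D grid of (R, G, B) pixels into a 2D grid of class IDs."""
    return [[cmap.get(tuple(px), ignore_id) for px in row] for row in label_rgb]


def ids_to_colors(label_ids, cmap=EXAMPLE_CMAP, unknown=(255, 255, 255)):
    """Convert a 2D grid of class IDs back into (R, G, B) pixels."""
    inverse = {class_id: color for color, class_id in cmap.items()}
    return [[inverse.get(i, unknown) for i in row] for row in label_ids]


# A tiny 2x2 "label image"; the last pixel uses a color not in the table.
label_rgb = [
    [(0, 0, 0), (128, 64, 128)],
    [(0, 128, 0), (1, 2, 3)],
]
label_ids = colors_to_ids(label_rgb)
print(label_ids)
```

For full-resolution images, the same lookup would normally be vectorized with an array library rather than written as nested lists.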

terra_ply/ contains the raw mesh map output from Terra, which is split into multiple mesh blocks.
cloud_merged.ply contains the raw point cloud map output from Terra.
Mesh.ply is built by merging all mesh blocks from terra_ply/ into a single mesh.
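
Merging mesh blocks into one mesh amounts to concatenating the vertex lists and shifting each block's face indices by the number of vertices already merged. The sketch below shows that idea on in-memory data only. It is not the script used to produce Mesh.ply: the function name `merge_mesh_blocks` and the two toy blocks are hypothetical, and reading or writing PLY files is left out.

```python
def merge_mesh_blocks(blocks):
    """Merge a sequence of (vertices, faces) blocks into a single mesh.

    Faces index into their own block's vertex list, so each block's faces
    are shifted by the number of vertices merged before that block.
    """
    merged_vertices = []
    merged_faces = []
    for vertices, faces in blocks:
        offset = len(merged_vertices)
        merged_vertices.extend(vertices)
        merged_faces.extend(tuple(i + offset for i in face) for face in faces)
    return merged_vertices, merged_faces


# Two toy blocks with one triangle each (hypothetical data).
block_a = ([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)], [(0, 1, 2)])
block_b = ([(1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)], [(0, 1, 2)])

vertices, faces = merge_mesh_blocks([block_a, block_b])
print(len(vertices), faces)
```

This version does not deduplicate vertices shared along block boundaries, so the merged mesh keeps one copy per block.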

Dataset Overview

UAVScenes is built on MARS-LVIG. We thank its authors for their excellent work.
We use X-AnyLabeling for 2D annotation, CloudCompare for 3D annotation, and DJI Terra (大疆智图) for 3D reconstruction, which we found to be much more accurate than COLMAP.
More sensor and scene information can be found at MARS-LVIG.

UAVScenes consists of 4 large scenes (AMtown, AMvalley, HKairport, and HKisland). Each scene consists of multiple runs (e.g., 01, 02, and 03).

Baseline Code

In preparation. Please stay tuned. You are also welcome to use your own custom train/test split for all tasks.
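
One simple way to build a custom split is to hold out whole runs, so that frames from the same flight never appear in both sets. The sketch below is only an illustration under stated assumptions: it assumes every scene has runs 01, 02, and 03, and that a run is named by joining the scene name and the run number (for example `AMtown01`). Neither assumption is confirmed here, so check the downloaded folders before relying on it. The function name `split_by_run` is hypothetical.

```python
SCENES = ["AMtown", "AMvalley", "HKairport", "HKisland"]
RUNS = ["01", "02", "03"]  # assumption: the same three runs exist for every scene


def split_by_run(scenes=SCENES, runs=RUNS, test_runs=("03",)):
    """Assign whole runs to train or test, holding out `test_runs` in each scene."""
    train, test = [], []
    for scene in scenes:
        for run in runs:
            name = f"{scene}{run}"  # assumed naming: scene name + run number
            if run in test_runs:
                test.append(name)
            else:
                train.append(name)
    return train, test


train, test = split_by_run()
print("train:", train)
print("test:", test)
```

Holding out an entire scene instead (for example, all HKisland runs) would give a stricter test of generalization to unseen environments.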

Citation

@article{wang2025uavscenes,
  title={UAVScenes: A Multi-Modal Dataset for UAVs},
  author={Wang, Sijie and Li, Siqi and Zhang, Yawei and Yu, Shangshu and Yuan, Shenghai and She, Rui and Guo, Quanjiang and Zheng, JinXuan and Howe, Ong Kang and Chandra, Leonrich and others},
  journal={arXiv preprint arXiv:2507.22412},
  year={2025}
}

License

This work is available under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License and is meant for academic use only.
