sijieaaa/UAVScenes

UAVScenes

(ICCV 2025) UAVScenes: A Multi-Modal Dataset for UAVs

[arXiv] [ICCV 2025]

We introduce UAVScenes, a large-scale dataset designed to benchmark various tasks across both 2D and 3D modalities. Our benchmark dataset is built upon the well-calibrated multi-modal UAV dataset MARS-LVIG, originally developed only for simultaneous localization and mapping (SLAM). We enhance this dataset by providing manually labeled semantic annotations for both images and LiDAR point clouds, along with accurate 6-degree-of-freedom (6-DoF) poses. These additions enable a wide range of UAV perception tasks, including detection, segmentation, depth estimation, 6-DoF localization, place recognition, and novel view synthesis (NVS). To the best of our knowledge, this is the first UAV benchmark dataset to offer both image and LiDAR point cloud semantic annotations (120k labeled pairs), with the potential to advance multi-modal UAV perception research.


Download

We provide both the full dataset (interval=1) and a key-frame-only subset (interval=5, one fifth of the size).
UAVScenes has been uploaded to various cloud platforms.
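
The relation between the two releases can be pictured as keeping every fifth frame of a run. The sketch below is illustrative only: `select_keyframes` and the frame names are hypothetical, and it assumes the key-frame subset starts at the first frame of each run, which the README does not state.

```python
def select_keyframes(frames, interval=5):
    """Keep every `interval`-th frame, starting from the first one."""
    if interval < 1:
        raise ValueError("interval must be >= 1")
    return frames[::interval]


# Hypothetical frame names; the real files use their own naming scheme.
frames = [f"frame_{i:04d}" for i in range(12)]
keyframes = select_keyframes(frames, interval=5)
print(keyframes)
```

With `interval=1` the function returns the full list, which corresponds to the full release.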

⚠️ If you run into any download problems, please open an issue with screenshots. We will fix them as soon as possible 🙂.

We currently include:

  • Hikvision camera images with annotations
  • Livox Avia LiDAR point clouds with annotations
  • 6-DoF poses
  • Reconstructed 3D point cloud/mesh maps

File Information

interval1_CAM_LIDAR contains camera images, LiDAR point clouds, 6-DoF poses, and calibrations.
interval1_CAM_label contains camera semantic annotations.
interval1_LIDAR_label contains LiDAR semantic annotations.
terra_3dmap_pointcloud_mesh contains 3D mesh/point cloud maps.

cmap.py contains color-ID mapping.
calibration_results.py contains camera-LiDAR calibrations.
sampleinfos_interpolated.json contains camera-3D map calibrations.
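
As a rough illustration of how a color-ID mapping is typically applied, the sketch below converts a color-coded label image into integer class IDs. The `CMAP` table, the `colors_to_ids` helper, and the `unknown_id` value are all assumptions made for this example; the actual colors, IDs, and data layout are defined in cmap.py and may differ.

```python
# Placeholder color table: these RGB values and IDs are invented for
# illustration and are NOT the entries of the dataset's cmap.py.
CMAP = {
    (0, 0, 0): 0,
    (128, 64, 128): 1,
    (0, 255, 0): 2,
}


def colors_to_ids(color_image, cmap, unknown_id=255):
    """Convert rows of (R, G, B) pixels into rows of integer class IDs.

    Pixels whose color is missing from `cmap` receive `unknown_id`.
    """
    return [
        [cmap.get(tuple(pixel), unknown_id) for pixel in row]
        for row in color_image
    ]


# A tiny 2x2 "label image"; the last pixel is not in the table.
tiny_label = [
    [(0, 0, 0), (128, 64, 128)],
    [(0, 255, 0), (1, 2, 3)],
]
id_image = colors_to_ids(tiny_label, CMAP)
print(id_image)
```

For full-resolution images, a vectorized array implementation would be much faster; the pure-Python version is used here only to keep the example dependency-free.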

terra_ply/ contains the raw mesh map output from Terra, which consists of multiple mesh blocks.
cloud_merged.ply contains the raw point cloud map output from Terra.
Mesh.ply is built by merging all mesh blocks from terra_ply/.
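
The README does not describe how the blocks were merged into Mesh.ply. One common way to merge triangle meshes is to concatenate the vertex lists and shift each block's face indices by the number of vertices already merged. The sketch below shows that idea under the assumption that all blocks share one coordinate frame; `merge_mesh_blocks` and the sample blocks are hypothetical.

```python
def merge_mesh_blocks(blocks):
    """Merge (vertices, faces) blocks into a single mesh.

    Faces index into their own block's vertex list, so each block's
    face indices are shifted by the number of vertices merged so far.
    """
    merged_vertices, merged_faces = [], []
    for vertices, faces in blocks:
        offset = len(merged_vertices)
        merged_vertices.extend(vertices)
        merged_faces.extend(
            tuple(index + offset for index in face) for face in faces
        )
    return merged_vertices, merged_faces


# Two hypothetical blocks, each holding a single triangle.
block_a = ([(0, 0, 0), (1, 0, 0), (0, 1, 0)], [(0, 1, 2)])
block_b = ([(1, 1, 0), (2, 1, 0), (1, 2, 0)], [(0, 1, 2)])
vertices, faces = merge_mesh_blocks([block_a, block_b])
print(len(vertices), faces)
```

This version does not deduplicate vertices shared along block borders, so seams between blocks remain topologically separate.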

Dataset Overview


  • UAVScenes consists of 4 large scenes (AMtown, AMvalley, HKairport, and HKisland). Each scene consists of multiple runs (e.g., 01, 02, and 03).


Baseline Code

Baseline code is in preparation. Please stay tuned. You are also welcome to use your own custom train/test split for all tasks.
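
One possible custom split holds out a run from every scene for testing. The sketch below is only an example of such a split, not an official protocol: the run IDs are the ones the overview gives as examples, and the `split_by_run` helper and the `<scene><run>` sequence naming are assumptions.

```python
SCENES = ["AMtown", "AMvalley", "HKairport", "HKisland"]
RUNS = ["01", "02", "03"]  # example run IDs; the full list is assumed


def split_by_run(scenes, runs, test_runs):
    """Assign each scene/run sequence to train or test by its run ID."""
    train, test = [], []
    for scene in scenes:
        for run in runs:
            name = f"{scene}{run}"  # hypothetical sequence naming
            (test if run in test_runs else train).append(name)
    return train, test


train, test = split_by_run(SCENES, RUNS, test_runs={"03"})
print("train:", train)
print("test:", test)
```

Splitting by run rather than by individual frame keeps near-duplicate consecutive frames from landing on both sides of the split.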

Citation

@article{wang2025uavscenes,
  title={UAVScenes: A Multi-Modal Dataset for UAVs},
  author={Wang, Sijie and Li, Siqi and Zhang, Yawei and Yu, Shangshu and Yuan, Shenghai and She, Rui and Guo, Quanjiang and Zheng, JinXuan and Howe, Ong Kang and Chandra, Leonrich and others},
  journal={arXiv preprint arXiv:2507.22412},
  year={2025}
}

License

This work is available under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License and is meant for academic use only.
