EgoCross

A comprehensive benchmark across Surgery, Industry, Extreme Sports, and Animal Perspective. EgoCross comprises 798 clips and 957 QA pairs, supporting both CloseQA and OpenQA formats for fine‑grained evaluation.

Cross‑Domain Egocentric Video QA

About the Dataset

EgoCross is a cross-domain benchmark designed to evaluate how well multimodal large language models (MLLMs) generalize to egocentric video question answering (VQA). Unlike prior daily-life datasets, EgoCross focuses on diverse and challenging domains (surgery, industrial assembly, extreme sports, and animal perspective) to assess model robustness under varying visual and semantic conditions.

The benchmark covers 15 sub-tasks grouped into four capability families: Identification, Localization, Prediction, and Counting. Each video clip is paired with closed-ended (CloseQA) and open-ended (OpenQA) questions that require fine-grained temporal and spatial understanding as well as reasoning.

In total, EgoCross contains 798 video clips and 957 QA pairs, curated through a semi-automatic pipeline combining LLM-based question generation and human verification. It provides a unified platform for measuring cross-domain generalization, highlighting the gap between everyday understanding and complex real-world egocentric perception.
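As an illustration of how cross-domain results on the CloseQA questions might be summarized, the sketch below groups predictions by domain or capability family and reports accuracy for each group. The record layout (`CloseQAItem` and its fields), the helper `accuracy_by`, and the toy items are all hypothetical and are not taken from the released files or from any official evaluation script.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class CloseQAItem:
    # Hypothetical record layout; the released files may use different field names.
    domain: str      # e.g. "Surgery", "Industry", "Extreme Sports", "Animal Perspective"
    family: str      # e.g. "Identification", "Localization", "Prediction", "Counting"
    answer: str      # ground-truth option label, e.g. "B"
    prediction: str  # option label returned by the model


def accuracy_by(items, key):
    """Return {group: accuracy}, grouping items by the attribute named `key`."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for item in items:
        group = getattr(item, key)
        total[group] += 1
        # Compare option labels case-insensitively, ignoring surrounding whitespace.
        if item.prediction.strip().upper() == item.answer.strip().upper():
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}


# Toy items for illustration only; these are not real benchmark entries.
items = [
    CloseQAItem("Surgery", "Identification", "A", "A"),
    CloseQAItem("Surgery", "Counting", "C", "B"),
    CloseQAItem("Industry", "Prediction", "D", "d"),
    CloseQAItem("Animal Perspective", "Localization", "B", "B"),
]

per_domain = accuracy_by(items, "domain")
per_family = accuracy_by(items, "family")
print(per_domain)
print(per_family)
```

Reporting accuracy per domain rather than as a single pooled number keeps any gap between domains visible. OpenQA answers are free-form and would need a different scoring scheme, which this sketch does not cover.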

Representative Domains

Surgery

Fine‑grained tool recognition, phase understanding, and hand‑specific interactions.

Industry

Component identification, procedural reasoning, and tool‑usage logic.

Extreme Sports

High‑speed egocentric motion, navigation cues, and temporal anticipation.

Animal Perspective

Species cues, alternative movement patterns, and behavioral understanding.

Download

The EgoCross dataset is now available on Hugging Face. Access the complete benchmark with all domains and QA pairs.

Download from Hugging Face 🤗

Complete dataset with 798 clips and 957 QA pairs across 4 domains.

Resources

1st Cross-Domain EgoCross Challenge @ EgoVis, CVPR 2026

[News] 🔥 We are launching the 1st Cross-Domain EgoCross Challenge under the third EgoVis Workshop (CVPR 2026).

🏁 Source-Limited Track 🏁 Open-Source Track

Challenge Organizers

Yuqian Fu
Tianwen Qian
Yanjun Li
Yu Li
Kunyu Peng
Xu Zheng
Yongqin Xian
Alessio Tonioni
Yanwei Fu
Xiaoling Wang
Danda Paudel
Federico Tombari
Luc Van Gool

Contact

For questions and collaboration, please reach out to our team members.

Yanjun Li
[email protected]
Yuqian Fu
[email protected]
Tianwen Qian
[email protected]