I am a final-year Ph.D. student in Computer Science at the GenAI Center of Excellence, KAUST, advised by Prof. Bernard Ghanem. Before that, I obtained my master's degree from Shanghai Jiao Tong University and my bachelor's degree from Xi’an Jiaotong University.

My recent work focuses on advancing large video-language models, including VideoAuto-R1, an adaptive auto-thinking model, and BOLT, a training-free frame-selection approach for long-form video understanding. I have also proposed a series of efficient, scalable temporal action detection (TAD) methods (AdaTAD, ETAD, CausalTAD) and developed OpenTAD, the largest open-source framework for temporal action detection.

My research interests include:

Adaptive Video Reasoning: RL-based auto-thinking and agentic video reasoning.
Efficient Video-Language Models: low-cost adaptation and inference-time efficiency for long videos.
End-to-End Temporal Grounding: precise timestamp localization and action detection at scale.

📢 I am actively seeking full-time research positions. Feel free to reach out if you have any opportunities: shuming.liu[at]kaust.edu.sa

🔥 News

2026.02: 🔥 VideoAuto-R1 has been accepted by CVPR 2026!
2026.01: We release VideoAuto-R1, an adaptive auto-thinking model for video understanding.
2025.06: I start my internship at Meta as a Research Scientist Intern.
2025.05: I am awarded the Dean’s List Award of KAUST for 2025.
2025.04: OpenTAD is accepted by CVPRW 2025.
2025.02: BOLT is accepted by CVPR 2025.
2024.07: ColorMAE is accepted by ECCV 2024.
2024.06: We win 4 championships in the CVPR 2024 Challenges: Action Recognition, Action Detection, Audio-Based Interaction Detection, and Moment Queries!
2024.06: I am awarded the Dean’s List Award of KAUST for 2024.
2024.05: We release OpenTAD, currently the largest temporal action detection (TAD) codebase, supporting 10+ SOTA TAD methods.
2024.02: AdaTAD and Dr2Net are accepted by CVPR 2024.
2024.01: DenoiseLoc is accepted by ICLR 2024.
2023.02: Re2TAL is accepted by CVPR 2023, and ETAD is accepted by CVPRW 2023.

📝 Publications

VideoAuto-R1: Video Auto Reasoning via Thinking Once, Answering Twice

Shuming Liu, Bernard Ghanem, Vikas Chandra, Yunyang Xiong, et al.

CVPR 2026, [Project Page] [Code]

BOLT: Boost Large Vision-Language Model Without Training for Long-form Video Understanding

Shuming Liu, Chen Zhao, Tianqi Xu, Bernard Ghanem

CVPR 2025, [Code]

OpenTAD: A Unified Framework and Comprehensive Study of Temporal Action Detection

Shuming Liu, Chen Zhao, Bernard Ghanem, et al.

CVPRW 2025, [Code]

Harnessing Temporal Causality for Advanced Temporal Action Detection

Shuming Liu, Lin Sui, Chen-Lin Zhang, Fangzhou Mu, Chen Zhao, Bernard Ghanem

Technical Report, [Code]

End-to-End Temporal Action Detection with 1B Parameters Across 1000 Frames

Shuming Liu, Chen-Lin Zhang, Chen Zhao, Bernard Ghanem

CVPR 2024, [Code]

ETAD: Training Action Detection End to End on a Laptop

Shuming Liu, Mengmeng Xu, Chen Zhao, Xu Zhao, Bernard Ghanem

CVPRW 2023, [Code]

Mixture of States: Routing Token-Level Dynamics for Multimodal Generation

Haozhe Liu, Ding Liu, Mingchen Zhuge, Zijian Zhou, Tian Xie, Sen He, Yukang Yang, Shuming Liu, Yuren Cong, Jiadong Guo, Hongyu Xu, Ke Xu, Kam-Woh Ng, Juan C Pérez, Tao Xiang, Wei Liu, Shikun Liu, Jürgen Schmidhuber

CVPR 2026

TimeLoc: A Unified End-to-End Framework for Precise Timestamp Localization in Long Videos

Chen-Lin Zhang*, Lin Sui*, Shuming Liu*, Fangzhou Mu*, Zhangcheng Wang, Bernard Ghanem

arXiv 2025

ColorMAE: Exploring data-independent masking strategies in Masked AutoEncoders

Carlos Hinojosa, Shuming Liu, Bernard Ghanem

ECCV 2024, [Code]

Dr2Net: Dynamic Reversible Dual-Residual Networks for Memory-Efficient Finetuning

Chen Zhao, Shuming Liu, Karttikeya Mangalam, Guocheng Qian, Fatimah Zohra, Abdulmohsen Alghannam, Jitendra Malik, Bernard Ghanem

CVPR 2024, [Code]

Look, Listen, and Attack: Backdoor Attacks Against Video Action Recognition

Hasan Hammoud, Shuming Liu, Mohammed Alkhrashi, Fahad AlBalawi, Bernard Ghanem

CVPRW 2024

Boundary-Denoising for Video Activity Localization

Mengmeng Xu, Mattia Soldan, Jialin Gao, Shuming Liu, Juan-Manuel Perez-Rua, Bernard Ghanem

ICLR 2024, [Code]

Re2TAL: Rewiring Pretrained Video Backbones for Reversible Temporal Action Localization

Chen Zhao, Shuming Liu, Karttikeya Mangalam, Bernard Ghanem

CVPR 2023, [Code]

Transferable Knowledge Based Multi-Granularity Fusion Network for Weakly Supervised Temporal Action Detection

Haisheng Su, Xu Zhao, Tianwei Lin, Shuming Liu, Zhilan Hu

TMM 2020

📖 Educations

2021.09 - present, Ph.D., King Abdullah University of Science and Technology (KAUST), Saudi Arabia.
2018.09 - 2021.04, Master's, Shanghai Jiao Tong University (SJTU), China.
2014.09 - 2018.06, Bachelor's, Xi’an Jiaotong University (XJTU), China.

🎖 Honors and Awards

2025.05 Dean’s List Award of KAUST (top 20%)
2024.06 Dean’s List Award of KAUST (top 20%)
2021.03 Outstanding Graduate of SJTU
2019.12 Scholarship of SJTU (top 5%)
2018.06 Outstanding Undergraduate of XJTU
2017.12 Scholarship of XJTU (top 5%)

💻 Service

Conference Reviewer: CVPR, ICCV, ECCV, ICLR, ICML, NeurIPS, AAAI, WACV, BMVC

Journal Reviewer: TPAMI, IJCV, TIP, TMM, Neurocomputing

Teaching Assistant: Introduction to Computer Vision (KAUST), Computer Vision (SJTU)