👋 About Me

I am currently pursuing my Ph.D. at the Institute of Computing Technology, Chinese Academy of Sciences, advised by Prof. Zhaoqi Wang. Concurrently, I am a Research Intern at AMAP, Alibaba, where I work closely with Xiangxiang Chu. I am deeply grateful for the opportunity to collaborate with exceptional researchers, including Prof. Shuo Li, Prof. Yujun Cai, and Prof. Yiwei Wang. Their mentorship and insights have profoundly shaped my academic journey.

My research interests include Vision-Language Models (VLMs), Large Language Models (LLMs), Embodied Agents, Multimodal AI, and 3D Vision. I have published 20+ papers at top international AI conferences, including NeurIPS, ICLR, ICML, CVPR, ICCV, and AAAI.

📚 Research Interests

  • Foundation Models & Pre-training 🔥🔥
    • Vision-Language Models (VLMs) / Vision-Language-Action (VLA) / Spatial Intelligence
  • Model Enhancement & Post-training 🔥🔥
    • Reasoning & Alignment / Tool-Augmented RL / NLP-Enhanced Training
  • Model Interpretation 🔥🔥
    • Mechanistic Interpretability / Factuality, Truthfulness, and Social Good
  • Real-World Applications 🔥🔥
    • Embodied Agents / AI for Science / Biomedical Engineering

🔥 Main News

  • 2026.02: 🎉🎉 Our work ADE-CoT has been accepted by CVPR 2026.
  • 2026.01: 🎉🎉 Our work Video-STAR has been accepted by ICLR 2026.
  • 2026.01: 🎉🎉 Our work AutoDrive-R² has been accepted by ICLR 2026.
  • 2025.11: 🎉🎉 We propose ADE-CoT, which is now available on arXiv!
  • 2025.11: 🎉🎉 We propose Reasoning-VLA, which is now available on arXiv!
  • 2025.10: 🎉🎉 Our work DVP-MVS++ has been accepted by TCSVT 2025.
  • 2025.10: 🎉🎉 We propose Video-STAR, which is now available on arXiv!
  • 2025.08: 🎉🎉 Our work AutoDrive-R² was reported by AutoDrive Heart (自动驾驶之心).
  • 2025.08: 🎉🎉 We propose AutoDrive-R², which is now available on arXiv!
  • 2025.06: 🎉🎉 We propose DVP-MVS++, which is now available on arXiv!
  • 2025.05: 🎉🎉 Our work SED-MVS has been accepted by TCSVT 2025.
  • 2024.12: 🎉🎉 We propose SED-MVS, which is now available on arXiv!
  • 2024.12: 🎉🎉 Our work DVP-MVS has been accepted by AAAI 2025.
  • 2024.12: 🎉🎉 Our work MSP-MVS has been accepted by AAAI 2025.
  • 2024.08: 🎉🎉 We propose DVP-MVS, which is now available on arXiv!
  • 2024.08: 🎉🎉 We propose MSP-MVS, which is now available on arXiv!
  • 2024.05: 🎉🎉 Our work TSAR-MVS has been accepted by PR 2024.
  • 2024.01: 🎉🎉 We propose TSAR-MVS, which is now available on arXiv!
  • 2023.12: 🎉🎉 Our work SD-MVS has been accepted by AAAI 2024.
  • 2023.09: 🎉🎉 We propose SD-MVS, which is now available on arXiv!

📝 Main Publications

Multimodal LLMs Post-Training

ICLR

Video-STAR: Reinforcing Zero-shot Video Understanding with Tools

Think with Videos / Tool-Using Agent / Multi-turn Agentic RL

Yuan Z., Qu X., Qian C., Chen R., Tang J., Sun L., Chu X., Zhang D., Wang Y., Cai Y., Li S.

[Paper]

ICLR

AutoDrive-R²: Incentivizing Reasoning and Self-Reflection Capacity for VLA Model in Autonomous Driving

Multimodal Reasoning / Autonomous Driving / Open-World Applications

Featured by AutoDrive Heart (自动驾驶之心)

Yuan Z., Tang J., Luo J., Chen R., Qian C., Sun L., Cai Y., Zhang D., Li S.

[Paper]

Preprint

Reasoning-VLA: A Fast and General Vision-Language-Action Reasoning Model for Autonomous Driving

Multimodal Reasoning / Autonomous Driving / Open-World Applications

Zhang D.*, Yuan Z.*, Chen Z., Liao C., Chen Y., Shen F., Zhou Q., Chua T.

[Paper]

Generative Foundation Model

Preprint

From Scale to Speed: Adaptive Test-Time Scaling for Image Editing

Generation Model / Image Editing / Text-to-Image Generation

Qu X.*, Yuan Z.*, Tang J., Chen R., Tang D., Yu M., Sun L., Bai Y., Chu X., Gou G., Xiong G., Cai Y.

Preprint

Recovering Degradations with Generative Model: A Consistency-aware Distillation Network for Infrared and Visible Image Fusion

Generation Model / Image Editing / Text-to-Image Generation

Yu H.*, Yuan Z.*, Bai Y., Li J., Liu J., Li S., Sun L., Chu X.

3D Vision

TCSVT

DVP-MVS++: Synergize Depth-Normal-Edge and Harmonized Visibility Prior for Multi-View Stereo

Yuan Z., Zhang D., Li Z., Qian C., Chen J., Chen Y., Chen K., Mao T., Li Z., Jiang H., Wang Z.

IEEE Transactions on Circuits and Systems for Video Technology (IEEE TCSVT), 2025.

[Paper]

TCSVT

SED-MVS: Segmentation-Driven and Edge-Aligned Deformation Multi-View Stereo with Depth Restoration and Occlusion Constraint

Yuan Z., Yang Z., Cai Y., Wu K., Liu M., Zhang D., Jiang H., Li Z., Wang Z.

IEEE Transactions on Circuits and Systems for Video Technology (IEEE TCSVT), 2025.

[Paper]

AAAI

DVP-MVS: Synergize Depth-Edge and Visibility Prior for Multi-View Stereo

Yuan Z., Luo J., Shen F., Li Z., Liu C., Mao T., Wang Z.

AAAI Conference on Artificial Intelligence (AAAI), 2025.

[Paper] [Code]

AAAI

MSP-MVS: Multi-granularity Segmentation Prior Guided Multi-View Stereo

Yuan Z., Liu C., Shen F., Li Z., Luo J., Mao T., Wang Z.

AAAI Conference on Artificial Intelligence (AAAI), 2025.

[Paper] [Code]

AAAI

SD-MVS: Segmentation-Driven Deformation Multi-View Stereo with Spherical Refinement and EM Optimization

Yuan Z., Cao J., Li Z., Jiang H., Wang Z.

AAAI Conference on Artificial Intelligence (AAAI), 2024.

[Paper] [Code]

PR

TSAR-MVS: Textureless-Aware Segmentation and Correlative Refinement Guided Multi-View Stereo

Yuan Z., Cao J., Wang Z., Li Z.

Pattern Recognition (PR), 2024.

[Paper] [Code]

📖 All Publications

🏆 Service

  • Conference Reviewers: NeurIPS, ICML, ICLR, CVPR, ICCV, ECCV, AAAI
  • Journal Reviewers: IJCV, TIP, TMM, TNNLS, TCSVT, PR
