ZhiYuan(Aaron) Feng   |   冯志远

I am a Ph.D. candidate in Computer Science at the Institute for Advanced Study, Tsinghua University, where I am fortunate to be advised by Prof. Baining Guo. I received my Bachelor's degree in Computer Science from the Qian Xuesen Honors College at Xi'an Jiaotong University.

My primary research interests lie in embodied AI, spatial reasoning, and multimodal large language models (MLLMs), with an emphasis on vision-language(-action) systems for robotic perception, understanding, and decision-making. Along the way, I have also explored related topics such as reinforcement learning (RL), large language models (LLMs), and retrieval-augmented generation (RAG). I am especially interested in bridging lab research with real-world scenarios to build systems that can genuinely support and improve everyday life. More broadly, I am an AGI believer, and I look forward to witnessing the arrival of ASI.

Selected Papers
From Human Videos to Robot Manipulation: A Survey on Scalable Vision-Language-Action Learning with Human-Centric Data
Zhiyuan Feng, Qixiu Li, Huizhi Liang, Rushuai Yang, Yichao Shen, Zhiying Du, Zhaowei Zhang, Yu Deng, Li Zhao, Hao Zhao, Zongqing Lu, Oier Mees, Marc Pollefeys, Jiaolong Yang, Baining Guo
Paper
TechRxiv 2026 🔥🔥🔥

We survey how to turn abundant human videos into actionable supervision for VLA models, organizing methods into four categories and outlining key challenges for reliable real-world transfer.

Discover, Learn, and Reinforce: Scaling Vision-Language-Action Pretraining with Diverse RL-Generated Trajectories
Rushuai Yang, Zhiyuan Feng, Tianxiang Zhang, Kaixin Wang, Chuheng Zhang, Li Zhao, Xiu Su, Yi Chen, Jiang Bian
Paper
arXiv 2025

We propose DLR, an information-theoretic multi-pattern RL framework that generates diverse, high-success manipulation trajectories for scalable VLA pretraining, improving transfer and showing better data-scaling than standard single-pattern RL.

Beyond Human Demonstrations: Diffusion-Based Reinforcement Learning to Generate Data for VLA Training
Rushuai Yang, Hangxing Wei, Ran Zhang, Zhiyuan Feng, Xiaoyu Chen, Tong Li, Chuheng Zhang, Li Zhao, Jiang Bian, Xiu Su, Yi Chen
Paper
arXiv 2025

We propose a diffusion policy optimization method that uses RL to autonomously generate smooth, low-variance long-horizon manipulation trajectories, enabling VLA training that outperforms models trained on human or Gaussian RL demonstrations on LIBERO.

Scalable Vision-Language-Action Model Pretraining for Robotic Manipulation with Real-Life Human Activity Videos
Qixiu Li, Yu Deng, Yaobo Liang, Lin Luo, Lei Zhou, Chengtang Yao, Lingqi Zeng, Zhiyuan Feng, Huizhi Liang, Sicheng Xu, Yizhong Zhang, Xi Chen, Hao Chen, Lily Sun, Dong Chen, Jiaolong Yang, Baining Guo
Paper / Webpage / Code
ICRA 2026 🔥🔥🔥

We propose VITRA, an approach for pretraining Vision-Language-Action (VLA) models for robotic manipulation using large-scale, unscripted, real-world videos of human hand activities.

Seeing Across Views: Benchmarking Spatial Reasoning of Vision-Language Models in Robotic Scenes
Zhiyuan Feng, Zhaolu Kang, Qijie Wang, Zhiying Du, Jiongrui Yan, Shubin Shi, Chengbo Yuan, Huizhi Liang, Yu Deng, Qixiu Li, Rushuai Yang, Arctanx An, Leqi Zheng, Weijie Wang, Shawn Chen, Sicheng Xu, Yaobo Liang, Jiaolong Yang, Baining Guo
Paper / Webpage / Code / Data
ICLR 2026 🔥🔥🔥

We propose MV-RoboBench, a benchmark for evaluating the multi-view spatial reasoning capabilities of VLMs in robotic manipulation.

HSSBench: Benchmarking Humanities and Social Sciences Ability for Multimodal Large Language Models
Zhaolu Kang, Junhao Gong, Jiaxu Yan, Wanke Xia, Yian Wang, Ziwen Wang, Huaxuan Ding, Zhuo Cheng, Wenhao Cao, Zhiyuan Feng, Siqi He, Shannan Yan, Junzhe Chen, Xiaomin He, Chaoya Jiang, Wei Ye, Kaidong Yu, Xuelong Li
Paper / Code
ICLR 2026

HSSBench is a multilingual benchmark (covering the six official UN languages) of 13,000+ expert-curated samples designed to evaluate MLLMs' cross-disciplinary, concept-to-vision reasoning in the Humanities and Social Sciences, revealing clear gaps even in state-of-the-art models.

MIRA: Medical Time Series Foundation Model for Real-World Health Data
Hao Li, Bowen Deng, Chang Xu, Zhiyuan Feng, Viktor Schlegel, Yu-Hao Huang, Yizheng Sun, Jingyuan Sun, Kailai Yang, Yiyao Yu, Jiang Bian
Paper
NeurIPS 2025

We introduce MIRA, a medical time-series foundation model pretrained on 454B+ time points that handles irregular sampling and missingness to deliver stronger zero-shot and fine-tuned forecasting across datasets and tasks.

TransDiff: Diffusion-Based Method for Manipulating Transparent Objects Using a Single RGB-D Image
Haoxiao Wang, Kaichen Zhou, Binrui Gu, Zhiyuan Feng, Weijie Wang, Peilin Sun, Yicheng Xiao, Jianhua Zhang, Hao Dong
Paper / Webpage
ICRA 2025

We propose TransDiff, a diffusion-based single-view RGB-D depth completion method for accurate grasping of transparent objects.

Deep Evidential Learning in Diffusion Convolutional Recurrent Neural Network
Zhiyuan Feng, Kai Qi, Bin Shi, Hao Mei, Qinghua Zheng, Hua Wei
Paper
CIKM 2024 Workshop

We integrate evidential deep learning into DCRNN to provide sampling-free uncertainty quantification for spatiotemporal traffic forecasting, achieving improved predictive intervals measured by MIS.


Education Experience
Tsinghua University, Beijing 2024 - Present
Ph.D. in Computer Science
Institute for Advanced Study
Advisor: Baining Guo (IEEE & ACM Fellow)
Xi'an Jiaotong University, Xi'an 2020 - 2024
B.E. in Computer Science
Qian Xuesen Honors College
Advisors: Bin Shi, Hua Wei (ASU), Qinghua Zheng (Academician of CAE)

Research Experience
Microsoft Research Asia Jul 2024 - Present
Role: Research Intern, working on Embodied AI.
Advisors: Yu Deng and Jiaolong Yang.
Microsoft Research Asia Dec 2023 - Jun 2024
Role: Research Intern, working on LLM for reasoning.
Advisor: Zheng Zhang.

Honors & Services

  • IEEE Student Member
  • Reviewer, IJCV (International Journal of Computer Vision)
  • Reviewer, ECCV (European Conference on Computer Vision) 2026
  • Reviewer, CVPR (Computer Vision and Pattern Recognition) 2026
  • Reviewer, ICLR (International Conference on Learning Representations) 2026
  • Reviewer, ICRA (IEEE International Conference on Robotics and Automation) 2026
  • Microsoft “Stars of Tomorrow” Internship 2024
  • Outstanding Bachelor Graduate 2024
  • China Mobile Outstanding Scholarship 2023
  • National Scholarship 2022
  • National Scholarship 2021
  • The 12th Asia and Pacific Informatics Olympiad (Gold Medal) 2018
  • National Olympiad in Informatics in Provinces (First Prize) 2017, 2018

  • This template is a modification of Jiayi Ni's website and Jon Barron's website.