I am a Research Scientist at ByteDance working on GenAI. Previously, I earned my Ph.D. in Computer Science from Johns Hopkins University, advised by Bloomberg Distinguished Professor Alan L. Yuille. I have broad research experience across various areas of computer vision and artificial intelligence, including but not limited to video generation [1, 2, 3], 3D vision [4, 5, 6], robust vision [7, 8], differentiable rendering [9], and medical image diagnosis [10, 11].
I am currently focused on the large-scale post-training and fine-tuning of next-generation video models, including Seedance 2.0/1.0 and Wan 2.1/2.2. My work bridges the gap between foundational research and scalable implementation through four core pillars:
Controllable Synthesis: Developing high-quality, temporally consistent video generation models with a focus on precise user control.
World Modeling via Long Video Generation: Exploring agentic storytelling and continuous long-video generation to move toward robust, large-scale world modeling.
Architectural Optimization: Balancing the trade-offs between computational efficiency and generative quality by advancing foundational model architectures.
Autonomous Data Ecosystems: Scaling training datasets and labeling systems by leveraging agentic AI to automate and optimize the end-to-end data processing pipeline.
[Publication list: venue labels only — NeurIPS ×2, ICLR ×4, CVPR ×2, ECCV ×3, WACV ×3, ICCV, AAAI, IJCV, MICCAI, and 9 arXiv preprints.]