Xiaochuang Han
(You can call me Han, which is easier to pronounce and remember)
[CV] [Google Scholar]
Bio
I am a Research Scientist at Meta FAIR. My research is centered on multimodal generation, with specific interests in omni-multimodal generative architectures, long and interactive text-video generation, and synthetic data generation. I have developed frameworks such as TV2TV for interleaved language and video generation, LMFusion for adapting language models to multimodal tasks, and JPEG-LM for codec-based image generation. My previous work also includes diffusion language models, inference-time model collaboration, and training data attribution.
I earned my Ph.D. in Computer Science and Engineering from the University of Washington, where I was advised by Yulia Tsvetkov. Before UW, I received my M.S. from Carnegie Mellon University. I completed my undergraduate studies at Georgia Tech, where I was advised by Jacob Eisenstein. My research has been supported by the OpenAI Superalignment Fellowship (2024) and the Meta AI Mentorship Program (2023, 2022).
Selected Publications