Zongze Wu

I'm a research scientist/engineer at Adobe Research in San Francisco.

At Adobe, I work with the Firefly team on Structure Reference.

I received my PhD from the Hebrew University of Jerusalem in 2022, advised by Prof. Dani Lischinski and by Eli Shechtman of Adobe Research. I received my bachelor's degree from Tongji University in 2016.

Email  /  CV  /  Scholar  /  Twitter

Research

My research focuses on generative modeling, spanning diffusion models, GANs, and autoregressive models. I am particularly interested in training efficiency, simpler training frameworks, and efficient architecture design — recent work includes one-step generation via mean flows, representation alignment for diffusion, and separable causal inference in video diffusion.

Causality in Video Diffusers is Separable from Denoising
Xingjian Bai, Guande He, Zhengqi Li, Eli Shechtman, Xun Huang, Zongze Wu
arXiv 2026
[project page] [arXiv]

We show that causal reasoning in video diffusion models is separable from denoising, and introduce Separable Causal Diffusion (SCD), which significantly improves throughput and reduces per-frame latency.

What Matters for Representation Alignment: Global Information or Spatial Structure?
Jaskirat Singh, Xingjian Leng, Zongze Wu, Liang Zheng, Richard Zhang, Eli Shechtman, Saining Xie
arXiv 2025
[project page] [arXiv]

Spatial structure, rather than global semantic information, drives the generation-quality gains from representation alignment in diffusion models.

Improved Mean Flows: On the Challenges of Fastforward Generative Models
Zhengyang Geng, Yiyang Lu, Zongze Wu, Eli Shechtman, J. Zico Kolter, Kaiming He
arXiv 2025
[project page] [arXiv]

Improved MeanFlow (iMF) achieves 1.72 FID with a single function evaluation on ImageNet 256×256, closing the gap to multi-step methods without any distillation.

VLM-Guided Adaptive Negative Prompting for Creative Generation
Shelly Golan, Yotam Nitzan, Zongze Wu, Or Patashnik
arXiv 2025
[project page] [arXiv]

A training-free, inference-time method that uses VLMs to adaptively steer diffusion away from conventional concepts, encouraging novel and creative image generation.

TurboFill: Adapting Few-step Text-to-image Model for Fast Image Inpainting
Liangbin Xie, Daniil Pakhomov, Zhonghao Wang, Zongze Wu, Ziyan Chen, Yuqian Zhou, Haitian Zheng, Zhifei Zhang, Zhe Lin, Jiantao Zhou, Chao Dong
CVPR 2025
[project page] [arXiv]

A fast image inpainting method that achieves high-quality results in just 4 diffusion steps, outperforming multi-step methods that require over 50 steps.

SliderSpace: Decomposing the Visual Capabilities of Diffusion Models
Rohit Gandikota, Zongze Wu, Richard Zhang, David Bau, Eli Shechtman, Nicholas Kolkin
ICCV 2025
[project page] [arXiv]

SliderSpace automatically decomposes a diffusion model's visual capabilities into controllable, interpretable directions, using low-rank adaptors for concept decomposition and style exploration.

TurboEdit: Instant text-based image editing
Zongze Wu, Nicholas Kolkin, Jonathan Brandt, Richard Zhang, Eli Shechtman
ECCV 2024
[project page] [arXiv] [Video]

Users can upload an image and edit it with natural language; each edit takes only half a second.

Lazy diffusion transformer for interactive image editing
Yotam Nitzan, Zongze Wu, Richard Zhang, Eli Shechtman, Daniel Cohen-Or, Taesung Park, Michaël Gharbi
ECCV 2024
[project page] [arXiv]

Instead of generating the entire image, we generate only the masked region, enabling fast interactive inpainting.

Third time's the charm? Image and video editing with StyleGAN3
Yuval Alaluf, Or Patashnik, Zongze Wu, Asif Zamir, Eli Shechtman, Dani Lischinski, Daniel Cohen-Or
AIM Workshop, ECCV 2022
[project page] [arXiv]

We show that StyleGAN3 can be trained on unaligned images, and that its W/W+ latent spaces are more entangled than StyleGAN2's.

StyleAlign: Analysis and applications of aligned StyleGAN models
Zongze Wu, Yotam Nitzan, Eli Shechtman, Dani Lischinski
ICLR 2022 (Oral Presentation)
[project page] [arXiv]

A fine-tuned child model's latent spaces remain semantically aligned with its parent's, so it inherits the parent's rich semantics.

StyleCLIP: Text-driven manipulation of StyleGAN imagery
Or Patashnik*, Zongze Wu*, Eli Shechtman, Daniel Cohen-Or, Dani Lischinski
ICCV 2021 (Oral Presentation)
[project page] [arXiv] [ICCV Video] [Demo Video]

Text-driven image editing by mapping directions in CLIP space to StyleGAN's latent space.

StyleSpace analysis: Disentangled controls for StyleGAN image generation
Zongze Wu, Dani Lischinski, Eli Shechtman
CVPR 2021 (Oral Presentation)
[project page] [arXiv] [Video]

The space of channel-wise style parameters is significantly more disentangled than the other intermediate latent spaces in StyleGAN.

Fine-grained foreground retrieval via teacher-student learning
Zongze Wu, Dani Lischinski, Eli Shechtman
WACV 2021
[arXiv] [Video]

Retrieves foreground images that are semantically compatible with a given background.


webpage template from Jon Barron