Zongze Wu

I'm a research scientist/engineer at Adobe Research in San Francisco.

At Adobe, I work with the Firefly team on Structure Reference.

I received my PhD from the Hebrew University of Jerusalem in 2022, advised by Prof. Dani Lischinski and by Eli Shechtman of Adobe Research. I received my bachelor's degree from Tongji University in 2016.

Email  /  CV  /  Scholar  /  Twitter

Research

My research focuses on generative modeling, spanning diffusion models, GANs, and autoregressive models. I am particularly interested in training efficiency, simpler training frameworks, and efficient architecture design — recent work includes one-step generation via mean flows, representation alignment for diffusion, and separable causal inference in video diffusion.

Causality in Video Diffusers is Separable from Denoising
Xingjian Bai, Guande He, Zhengqi Li, Eli Shechtman, Xun Huang, Zongze Wu
arXiv 2026
[project page] [arXiv]

We show that causal reasoning in video diffusion models is separable from denoising, and introduce Separable Causal Diffusion (SCD), which significantly improves throughput and reduces per-frame latency.

What Matters for Representation Alignment: Global Information or Spatial Structure?
Jaskirat Singh, Xingjian Leng, Zongze Wu, Liang Zheng, Richard Zhang, Eli Shechtman, Saining Xie
arXiv 2025
[project page] [arXiv]

Spatial structure, rather than global semantic information, drives the generation-quality gains from representation alignment in diffusion models.

Improved Mean Flows: On the Challenges of Fastforward Generative Models
Zhengyang Geng, Yiyang Lu, Zongze Wu, Eli Shechtman, J. Zico Kolter, Kaiming He
arXiv 2025
[project page] [arXiv]

Improved MeanFlow (iMF) achieves 1.72 FID with a single function evaluation on ImageNet 256×256, closing the gap to multi-step methods without any distillation.

VLM-Guided Adaptive Negative Prompting for Creative Generation
Shelly Golan, Yotam Nitzan, Zongze Wu, Or Patashnik
arXiv 2025
[project page] [arXiv]

A training-free, inference-time method that uses VLMs to adaptively steer diffusion away from conventional concepts, encouraging novel and creative image generation.

TurboFill: Adapting Few-step Text-to-image Model for Fast Image Inpainting
Liangbin Xie, Daniil Pakhomov, Zhonghao Wang, Zongze Wu, Ziyan Chen, Yuqian Zhou, Haitian Zheng, Zhifei Zhang, Zhe Lin, Jiantao Zhou, Chao Dong
CVPR 2025
[project page] [arXiv]

A fast image inpainting method that achieves high-quality results in just 4 diffusion steps, outperforming multi-step methods that require over 50 steps.

SliderSpace: Decomposing the Visual Capabilities of Diffusion Models
Rohit Gandikota, Zongze Wu, Richard Zhang, David Bau, Eli Shechtman, Nicholas Kolkin
ICCV 2025
[project page] [arXiv]

SliderSpace automatically decomposes a diffusion model's visual capabilities into controllable, interpretable directions, using low-rank adaptors for concept decomposition and style exploration.

TurboEdit: Instant text-based image editing
Zongze Wu, Nicholas Kolkin, Jonathan Brandt, Richard Zhang, Eli Shechtman
ECCV 2024
[project page] [arXiv] [Video]

Users can upload an image and edit it with natural language; each edit takes only half a second.

Lazy diffusion transformer for interactive image editing
Yotam Nitzan, Zongze Wu, Richard Zhang, Eli Shechtman, Daniel Cohen-Or, Taesung Park, Michaël Gharbi
ECCV 2024
[project page] [arXiv]

Instead of generating the entire image, we generate only the masked region, enabling fast interactive inpainting.

Third time's the charm? Image and video editing with StyleGAN3
Yuval Alaluf, Or Patashnik, Zongze Wu, Asif Zamir, Eli Shechtman, Dani Lischinski, Daniel Cohen-Or
AIM Workshop, ECCV 2022
[project page] [arXiv]

We show that StyleGAN3 can be trained on unaligned images, and that its W/W+ latent spaces are more entangled than StyleGAN2's.

StyleAlign: Analysis and applications of aligned StyleGAN models
Zongze Wu, Yotam Nitzan, Eli Shechtman, Dani Lischinski
ICLR 2022 (Oral Presentation)
[project page] [arXiv]

A fine-tuned child model's latent spaces remain semantically aligned with its parent's, so it inherits the parent's rich semantics.

StyleCLIP: Text-driven manipulation of StyleGAN imagery
Or Patashnik*, Zongze Wu*, Eli Shechtman, Daniel Cohen-Or, Dani Lischinski
ICCV 2021 (Oral Presentation)
[project page] [arXiv] [ICCV Video] [Demo Video]

Text-driven image editing by mapping directions in CLIP space to StyleGAN's latent space.

StyleSpace analysis: Disentangled controls for StyleGAN image generation
Zongze Wu, Dani Lischinski, Eli Shechtman
CVPR 2021 (Oral Presentation)
[project page] [arXiv] [Video]

The space of channel-wise style parameters is significantly more disentangled than the other intermediate latent spaces in StyleGAN.

Fine-grained foreground retrieval via teacher-student learning
Zongze Wu, Dani Lischinski, Eli Shechtman
WACV 2021
[arXiv] [Video]

Retrieves foreground images that are semantically compatible with a given background.


webpage template from Jon Barron