Zongze Wu
I'm a research scientist/engineer at Adobe Research in San Francisco.
At Adobe, I work with the Firefly team on Structure Reference.
I received my PhD from the Hebrew University of Jerusalem in 2022, under the supervision of Prof. Dani Lischinski and Eli Shechtman from Adobe Research.
I received my bachelor's degree from Tongji University in 2016.
Email /
CV /
Scholar /
Twitter
Causality in Video Diffusers is Separable from Denoising
Xingjian Bai,
Guande He,
Zhengqi Li,
Eli Shechtman,
Xun Huang,
Zongze Wu
arXiv 2026
[project page]
[arXiv]
We show that causal reasoning in video diffusion models is separable from denoising, and introduce Separable Causal Diffusion (SCD), which significantly improves throughput and reduces per-frame latency.
What Matters for Representation Alignment: Global Information or Spatial Structure?
Jaskirat Singh,
Xingjian Leng,
Zongze Wu,
Liang Zheng,
Richard Zhang,
Eli Shechtman,
Saining Xie
arXiv 2025
[project page]
[arXiv]
Spatial structure, rather than global semantic information, drives the generation-quality gains of representation alignment in diffusion models.
Improved Mean Flows: On the Challenges of Fastforward Generative Models
Zhengyang Geng,
Yiyang Lu,
Zongze Wu,
Eli Shechtman,
J. Zico Kolter,
Kaiming He
arXiv 2025
[project page]
[arXiv]
Improved MeanFlow (iMF) achieves 1.72 FID with a single function evaluation on ImageNet 256×256, closing the gap with multi-step methods without any distillation.
VLM-Guided Adaptive Negative Prompting for Creative Generation
Shelly Golan,
Yotam Nitzan,
Zongze Wu,
Or Patashnik
arXiv 2025
[project page]
[arXiv]
A training-free, inference-time method that uses VLMs to adaptively steer diffusion away from conventional concepts, encouraging novel and creative image generation.
TurboFill: Adapting Few-step Text-to-image Model for Fast Image Inpainting
Liangbin Xie,
Daniil Pakhomov,
Zhonghao Wang,
Zongze Wu,
Ziyan Chen,
Yuqian Zhou,
Haitian Zheng,
Zhifei Zhang,
Zhe Lin,
Jiantao Zhou,
Chao Dong
CVPR 2025
[project page]
[arXiv]
A fast image inpainting method that achieves high-quality results in just 4 diffusion steps, outperforming multi-step methods that require over 50 steps.
SliderSpace: Decomposing the Visual Capabilities of Diffusion Models
Rohit Gandikota,
Zongze Wu,
Richard Zhang,
David Bau,
Eli Shechtman,
Nicholas Kolkin
ICCV 2025
[project page]
[arXiv]
Automatically decomposes a diffusion model's visual capabilities into controllable, interpretable directions using low-rank adaptors for concept decomposition and style exploration.
TurboEdit: Instant text-based image editing
Zongze Wu,
Nicholas Kolkin,
Jonathan Brandt,
Richard Zhang,
Eli Shechtman
ECCV 2024
[project page]
[arXiv]
[Video]
Users can upload an image and edit it with natural language; each edit takes only half a second.
Lazy diffusion transformer for interactive image editing
Yotam Nitzan,
Zongze Wu,
Richard Zhang,
Eli Shechtman,
Daniel Cohen-Or,
Taesung Park,
Michaël Gharbi
ECCV 2024
[project page]
[arXiv]
Instead of generating the entire image, we generate only the masked region, enabling fast interactive inpainting.
Third time's the charm? Image and video editing with StyleGAN3
Yuval Alaluf,
Or Patashnik,
Zongze Wu,
Asif Zamir,
Eli Shechtman,
Dani Lischinski,
Daniel Cohen-Or
AIM ECCVW 2022
[project page]
[arXiv]
We show that StyleGAN3 can be trained on unaligned images, and that its W/W+ latent spaces are more entangled than StyleGAN2's.
StyleAlign: Analysis and applications of aligned StyleGAN models
Zongze Wu,
Yotam Nitzan,
Eli Shechtman,
Dani Lischinski
ICLR 2022 (Oral Presentation)
[project page]
[arXiv]
The child model's latent spaces are semantically aligned with its parent's, inheriting the parent's rich semantics.
StyleCLIP: Text-driven manipulation of StyleGAN imagery
Or Patashnik*,
Zongze Wu*,
Eli Shechtman,
Daniel Cohen-Or,
Dani Lischinski
ICCV 2021 (Oral Presentation)
[project page]
[arXiv]
[ICCV Video]
[Demo Video]
Text-based image editing by mapping the CLIP space to the StyleGAN latent space.
StyleSpace analysis: Disentangled controls for StyleGAN image generation
Zongze Wu,
Dani Lischinski,
Eli Shechtman
CVPR 2021 (Oral Presentation)
[project page]
[arXiv]
[Video]
The space of channel-wise style parameters is significantly more disentangled than the other intermediate latent spaces in StyleGAN.
Fine-grained foreground retrieval via teacher-student learning
Zongze Wu,
Dani Lischinski,
Eli Shechtman
WACV 2021
[arXiv]
[Video]
Retrieves foreground images that are semantically compatible with a given background.