AAAI 2026 Oral
Yan Zheng1, Zhenxiao Liang1, Xiaoyan Cong2, Yi Yang3, Lanqing Guo1, Yuehao Wang1, Peihao Wang1, Zhangyang Wang1
1University of Texas at Austin, 2Brown University, 3The University of Edinburgh
We explore the oscillatory behavior observed in inversion methods applied to large-scale flow models, including text-to-image and text-to-video models. By employing an augmented fixed-point-inspired iterative approach to invert real-world images, we observe that the solution does not converge; instead, it oscillates between distinct clusters. Through experiments on synthetic data as well as text-to-image and text-to-video models, we demonstrate that these oscillating clusters exhibit notable semantic coherence. We offer theoretical insights showing that this behavior arises from oscillatory dynamics in flow models. Building on this understanding, we introduce a simple and fast distribution transfer technique that facilitates training-free image and video editing and enhancement. Furthermore, we provide quantitative results demonstrating the effectiveness of our method on tasks such as image enhancement, editing, and reconstruction. Notably, our approach enables the transformation of image-only enhancers and editors into lightweight, video-capable tools—without additional training—highlighting its practical versatility and impact.
Key idea: Fixed-point iteration in flow models causes oscillation between semantic clusters rather than convergence. We exploit this behavior through Group Inversion — simultaneously inverting a group of images to push outputs toward the high-quality data manifold.
Core algorithm (Oscillation Inversion):
z^{(k+1)}_{t_0} = y - (sigma_0 - sigma_{t_0}) * v_theta(z^{(k)}_{t_0}, sigma_{t_0})
Group Inversion:
z^{(k+1)}_{t_0} = y_{(k mod m)} - (sigma_0 - sigma_{t_0}) * v_theta(z^{(k)}_{t_0}, sigma_{t_0})
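The update above can be sketched on a toy 1D problem. Here `v_theta` is a hypothetical stand-in for the learned velocity field (a tanh whose sign flips across z = 0, mimicking two semantic clusters), not the FLUX model: when |(sigma_0 - sigma_{t_0}) * dv/dz| exceeds 1 at the fixed point, the map is not a contraction, and the iterates settle into a period-2 cycle instead of converging.

```python
import numpy as np

def v_theta(z, c=5.0):
    # Hypothetical 1D velocity field standing in for the learned flow model:
    # steep sign change across z = 0 mimics two semantic clusters.
    return np.tanh(c * z)

def oscillation_inversion(y, z0, delta_sigma=1.0, steps=50):
    """Fixed-point iteration z^{(k+1)} = y - (sigma_0 - sigma_{t_0}) * v_theta(z^{(k)})."""
    traj = [z0]
    z = z0
    for _ in range(steps):
        z = y - delta_sigma * v_theta(z)
        traj.append(z)
    return np.array(traj)

traj = oscillation_inversion(y=0.0, z0=0.3)
# The iterates do not converge: they alternate between two "clusters"
# near +1 and -1 (a stable period-2 cycle of the fixed-point map).
print(traj[-4:])
```

Group Inversion follows the same template, except the anchor `y` cycles through the m targets (`y_{(k mod m)}`), which is what pushes the oscillating iterates toward the shared data manifold.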
Oscillation-Inversion/
├── src/ # Core implementation
│ ├── flux_utils.py # Oscillation Inversion with FLUX (single target)
│ └── flux_utils_multi.py # Group Inversion (multi-target)
│
├── diffusers_local/ # Modified diffusers pipelines
│ ├── pipelines/flux/ # Custom FLUX pipeline
│ └── models/transformers/ # Custom transformer modules
│
├── scripts/
│ ├── run_oscillation_inversion.py # Single-image oscillation inversion
│ ├── run_group_inversion.py # Group inversion with multiple targets
│ ├── run_depth_align.py # Depth-aligned inversion
│ ├── image_enhancement/ # Image enhancement experiments (Sec. 6.1)
│ │ ├── run_blur.py # Deblurring
│ │ ├── run_noise.py # Denoising
│ │ ├── run_downsample.py # Super-resolution (4x)
│ │ ├── run_compress.py # Compression artifact removal
│ │ ├── batch_*.sh # Batch processing scripts
│ │ └── metric_*.py # PSNR/LPIPS/FID evaluation
│ └── video_enhancement/ # Video enhancement experiments (Sec. 6.2)
│ ├── run_video.py # Video inversion
│ ├── run_video_makeup.py # Video makeup transfer
│ └── batch_*.sh # Batch processing scripts
│
├── notebooks/
│ ├── theory_toy_example.ipynb # Toy Gaussian theory visualization (Sec. 5)
│ ├── oscillation_analysis.ipynb # Fixed-point oscillation analysis
│ └── image_editing_demo.ipynb # Image editing/recoloring demo
│
├── demo/ # Demo images
│ ├── glassgirl.png # Sample input image
│ ├── women/ # Face enhancement demo
│ ├── makeup/ # Makeup transfer demo
│ └── texture/ # Texture synthesis demo
│
├── configs/ # Configuration files
│ ├── config.py # Configuration dataclass
│ └── depth_align.yaml # Depth alignment config
│
└── docs/ # Project webpage
├── index.html
└── data/ # GIFs, PDFs for webpage
git clone https://github.com/VITA-Group/Oscillation-Inversion.git
cd Oscillation-Inversion
pip install -r requirements.txt
- Python >= 3.10
- PyTorch >= 2.0 with CUDA support
- NVIDIA GPU with >= 24GB VRAM (A6000 recommended)
python scripts/run_oscillation_inversion.py
This runs fixed-point iteration on a source-target image pair using FLUX.1-schnell, demonstrating the oscillation phenomenon between semantic clusters.
python scripts/run_group_inversion.py
This runs the augmented group inversion with multiple target images, enabling distribution transfer for image enhancement.
# Process blurred CelebA images
cd scripts/image_enhancement
bash batch_blur.sh
# Compute metrics (PSNR, LPIPS)
python metric_blur.py
Available degradation types: blur, noise, downsample, compress
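As a reference for what the `metric_*.py` scripts report, a minimal PSNR sketch in NumPy (LPIPS requires the `lpips` package and a pretrained network, so it is omitted; the function name here is illustrative, not the repo's API):

```python
import numpy as np

def psnr(reference, restored, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two images on a [0, max_val] scale."""
    mse = np.mean((reference.astype(np.float64) - restored.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

clean = np.full((64, 64), 128.0)
noisy = clean + 10.0  # uniform error of 10 gray levels -> MSE = 100
print(round(psnr(clean, noisy), 2))  # -> 28.13
```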
cd scripts/video_enhancement
bash batch_video.sh
Open notebooks/theory_toy_example.ipynb in Jupyter to reproduce the toy Gaussian mixture experiment demonstrating oscillation dynamics in rectified flow.
This codebase uses the following pretrained models (automatically downloaded from HuggingFace):
| Model | Usage |
|---|---|
| FLUX.1-schnell | Primary T2I model (4-step distilled) |
| FLUX.1-dev | Alternative T2I model |
| HunyuanVideo | T2V model for video enhancement |
Image enhancement results (Sec. 6.1):
| Method | Denoise PSNR | Denoise LPIPS | Deblur PSNR | Deblur LPIPS | 4xSR PSNR | 4xSR LPIPS |
|---|---|---|---|---|---|---|
| BlindDPS | - | - | 23.56 | 0.257 | 21.82 | 0.345 |
| Picsart | 28.21 | 0.15 | 30.23 | 0.15 | 29.68 | 0.12 |
| Ours | 25.50 | 0.13 | 26.90 | 0.12 | 25.44 | 0.17 |
Video enhancement results (Sec. 6.2), temporal consistency metrics:
| Method | T-LPIPS | CLIP_TSC |
|---|---|---|
| Baseline | 0.0324 | 0.9823 |
| Ours | 0.0285 | 0.9847 |
@inproceedings{zheng2026oscillation,
title={Oscillation Inversion: Training-Free Image and Video Enhancement Through Oscillated Latents in Large Flow Models},
author={Zheng, Yan and Liang, Zhenxiao and Cong, Xiaoyan and Yang, Yi and Guo, Lanqing and Wang, Yuehao and Wang, Peihao and Wang, Zhangyang},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
year={2026}
}
This project builds upon FLUX, HunyuanVideo, and diffusers. We thank the authors for their excellent work.
