# Oscillation Inversion

**Training-Free Image and Video Enhancement Through Oscillated Latents in Large Flow Models**

**AAAI 2026 Oral**

Yan Zheng¹, Zhenxiao Liang¹, Xiaoyan Cong², Yi Yang³, Lanqing Guo¹, Yuehao Wang¹, Peihao Wang¹, Zhangyang Wang¹

¹University of Texas at Austin, ²Brown University, ³The University of Edinburgh

[Project Page] [Paper]

## Abstract

We explore the oscillatory behavior observed in inversion methods applied to large-scale flow models, including text-to-image and text-to-video models. By employing an augmented fixed-point-inspired iterative approach to invert real-world images, we observe that the solution does not converge but instead oscillates between distinct clusters. Through experiments on synthetic data as well as text-to-image and text-to-video generation, we demonstrate that these oscillating clusters exhibit notable semantic coherence. We offer theoretical insights, showing that this behavior arises from oscillatory dynamics in flow models. Building on this understanding, we introduce a simple and fast distribution transfer technique that facilitates training-free image and video editing/enhancement. Furthermore, we provide quantitative results demonstrating the effectiveness of our method on tasks such as image enhancement, editing, and reconstruction. Notably, our approach enables the transformation of image-only enhancers and editors into lightweight, video-capable tools without additional training, highlighting its practical versatility and impact.

## Method Overview

**Key idea:** Fixed-point iteration in flow models causes oscillation between semantic clusters rather than convergence. We exploit this behavior through **Group Inversion**: simultaneously inverting a group of images to push outputs toward the high-quality data manifold.

**Core algorithm (Oscillation Inversion):**

```
z^{(k+1)}_{t_0} = y - (sigma_0 - sigma_{t_0}) * v_theta(z^{(k)}_{t_0}, sigma_{t_0})
```

**Group Inversion:**

```
z^{(k+1)}_{t_0} = y_{(k mod m)} - (sigma_0 - sigma_{t_0}) * v_theta(z^{(k)}_{t_0}, sigma_{t_0})
```
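Both update rules can be sketched in a few lines of NumPy. Here `v_theta` is a hypothetical stand-in for the pretrained velocity network (the real method calls the FLUX model); names and defaults are illustrative:

```python
import numpy as np

def oscillation_inversion(y, v_theta, sigma_0, sigma_t0, num_iters=20):
    """Fixed-point iteration z^{(k+1)} = y - (sigma_0 - sigma_t0) * v_theta(z^{(k)}, sigma_t0).

    `y` is the target latent being inverted; `v_theta(z, sigma)` stands in
    for the flow model's velocity network. The full trajectory is returned
    so that oscillating clusters can be inspected.
    """
    z = y.copy()
    traj = [z]
    for _ in range(num_iters):
        z = y - (sigma_0 - sigma_t0) * v_theta(z, sigma_t0)
        traj.append(z)
    return traj

def group_inversion(ys, v_theta, sigma_0, sigma_t0, num_iters=20):
    """Group variant: cycle through the m targets as y_{(k mod m)}."""
    m = len(ys)
    z = ys[0].copy()
    traj = [z]
    for k in range(num_iters):
        z = ys[k % m] - (sigma_0 - sigma_t0) * v_theta(z, sigma_t0)
        traj.append(z)
    return traj
```

With a toy linear velocity such as `v_theta = lambda z, s: z`, the single-target iteration contracts to the fixed point of `z = y - (sigma_0 - sigma_t0) * z`; with the real nonlinear network the paper's point is precisely that it need not converge.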

## Repository Structure

```
Oscillation-Inversion/
├── src/                              # Core implementation
│   ├── flux_utils.py                 # Oscillation Inversion with FLUX (single target)
│   └── flux_utils_multi.py           # Group Inversion (multi-target)
│
├── diffusers_local/                  # Modified diffusers pipelines
│   ├── pipelines/flux/               # Custom FLUX pipeline
│   └── models/transformers/          # Custom transformer modules
│
├── scripts/
│   ├── run_oscillation_inversion.py  # Single-image oscillation inversion
│   ├── run_group_inversion.py        # Group inversion with multiple targets
│   ├── run_depth_align.py            # Depth-aligned inversion
│   ├── image_enhancement/            # Image enhancement experiments (Sec. 6.1)
│   │   ├── run_blur.py               # Deblurring
│   │   ├── run_noise.py              # Denoising
│   │   ├── run_downsample.py         # Super-resolution (4x)
│   │   ├── run_compress.py           # Compression artifact removal
│   │   ├── batch_*.sh                # Batch processing scripts
│   │   └── metric_*.py               # PSNR/LPIPS/FID evaluation
│   └── video_enhancement/            # Video enhancement experiments (Sec. 6.2)
│       ├── run_video.py              # Video inversion
│       ├── run_video_makeup.py       # Video makeup transfer
│       └── batch_*.sh                # Batch processing scripts
│
├── notebooks/
│   ├── theory_toy_example.ipynb      # Toy Gaussian theory visualization (Sec. 5)
│   ├── oscillation_analysis.ipynb    # Fixed-point oscillation analysis
│   └── image_editing_demo.ipynb      # Image editing/recoloring demo
│
├── demo/                             # Demo images
│   ├── glassgirl.png                 # Sample input image
│   ├── women/                        # Face enhancement demo
│   ├── makeup/                       # Makeup transfer demo
│   └── texture/                      # Texture synthesis demo
│
├── configs/                          # Configuration files
│   ├── config.py                     # Configuration dataclass
│   └── depth_align.yaml              # Depth alignment config
│
└── docs/                             # Project webpage
    ├── index.html
    └── data/                         # GIFs, PDFs for webpage
```

## Installation

```bash
git clone https://github.com/VITA-Group/Oscillation-Inversion.git
cd Oscillation-Inversion
pip install -r requirements.txt
```

### Requirements

- Python >= 3.10
- PyTorch >= 2.0 with CUDA support
- NVIDIA GPU with >= 24GB VRAM (A6000 recommended)

## Quick Start

### 1. Oscillation Inversion (Single Image)

```bash
python scripts/run_oscillation_inversion.py
```

This runs fixed-point iteration on a source-target image pair using FLUX.1-schnell, demonstrating the oscillation phenomenon between semantic clusters.

### 2. Group Inversion (Multi-Target Enhancement)

```bash
python scripts/run_group_inversion.py
```

This runs the augmented group inversion with multiple target images, enabling distribution transfer for image enhancement.

### 3. Image Enhancement on CelebA (Section 6.1)

```bash
# Process blurred CelebA images
cd scripts/image_enhancement
bash batch_blur.sh

# Compute metrics (PSNR, LPIPS)
python metric_blur.py
```
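The repository's `metric_*.py` scripts report PSNR and LPIPS; as a reference for what the PSNR column measures, here is a minimal NumPy implementation (LPIPS additionally requires a pretrained network, e.g. via the `lpips` package):

```python
import numpy as np

def psnr(ref, test, data_range=255.0):
    """Peak signal-to-noise ratio in dB between two same-shape images."""
    ref = np.asarray(ref, dtype=np.float64)
    test = np.asarray(test, dtype=np.float64)
    mse = np.mean((ref - test) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10((data_range ** 2) / mse)
```

For example, two 8-bit images differing everywhere by 10 gray levels score about 28.13 dB.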

Available degradation types: `blur`, `noise`, `downsample`, `compress`
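The four degradation types correspond to `run_blur.py`, `run_noise.py`, `run_downsample.py`, and `run_compress.py`. The repo's exact degradation parameters are not documented here; the following NumPy sketch (grayscale images, illustrative defaults) only conveys what each operation does:

```python
import numpy as np

def add_blur(img, k=5):
    """Separable box blur (a crude stand-in for Gaussian blur)."""
    kernel = np.ones(k) / k
    out = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="same"), 0, img)
    return np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="same"), 1, out)

def add_noise(img, sigma=25.0, seed=0):
    """Additive Gaussian noise, clipped to the 8-bit range."""
    rng = np.random.default_rng(seed)
    return np.clip(img + rng.normal(0.0, sigma, img.shape), 0, 255)

def downsample(img, factor=4):
    """Naive strided 4x downsampling (the scripts target 4x SR)."""
    return img[::factor, ::factor]

def quantize(img, levels=8):
    """Coarse quantization as a rough proxy for compression artifacts."""
    step = 256 // levels
    return (img // step) * step
```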

### 4. Video Enhancement (Section 6.2)

```bash
cd scripts/video_enhancement
bash batch_video.sh
```

### 5. Theory Visualization (Section 5)

Open `notebooks/theory_toy_example.ipynb` in Jupyter to reproduce the toy Gaussian mixture experiment demonstrating oscillation dynamics in rectified flow.
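The notebook's exact construction is not reproduced here, but the core phenomenon can be shown in one dimension: with an idealized two-mode velocity field (modes at ±1, chosen for illustration), the fixed-point update fails to converge and settles into a period-2 orbit:

```python
import numpy as np

def nearest_mode_velocity(z, mus=(-1.0, 1.0)):
    """Idealized velocity for a two-mode mixture:
    each point is pulled along the direction from its nearest mode."""
    mu = mus[0] if z < 0 else mus[1]
    return z - mu

def iterate(z0, y=0.0, scale=2.0, num_iters=10):
    """Run z^{(k+1)} = y - scale * v(z^{(k)}) and return the trajectory."""
    traj = [z0]
    z = z0
    for _ in range(num_iters):
        z = y - scale * nearest_mode_velocity(z)
        traj.append(z)
    return traj

traj = iterate(0.5)
# after a few steps the iterates alternate between +2 and -2:
# a period-2 oscillation between the two basins, not convergence
```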

## Models

This codebase uses the following pretrained models (automatically downloaded from HuggingFace):

| Model | Usage |
|---|---|
| FLUX.1-schnell | Primary T2I model (4-step distilled) |
| FLUX.1-dev | Alternative T2I model |
| HunyuanVideo | T2V model for video enhancement |

## Results

### Image Enhancement (CelebA, Table 1)

| Method | Denoise PSNR | Denoise LPIPS | Deblur PSNR | Deblur LPIPS | 4xSR PSNR | 4xSR LPIPS |
|---|---|---|---|---|---|---|
| BlindDPS | - | - | 23.56 | 0.257 | 21.82 | 0.345 |
| Piscart | 28.21 | 0.15 | 30.23 | 0.15 | 29.68 | 0.12 |
| Ours | 25.50 | 0.13 | 26.90 | 0.12 | 25.44 | 0.17 |

### Video Enhancement (VFHQ, Table 2)

| Method | T-LPIPS | CLIP_TSC |
|---|---|---|
| Baseline | 0.0324 | 0.9823 |
| Ours | 0.0285 | 0.9847 |

## Citation

```bibtex
@inproceedings{zheng2026oscillation,
  title={Oscillation Inversion: Training-Free Image and Video Enhancement Through Oscillated Latents in Large Flow Models},
  author={Zheng, Yan and Liang, Zhenxiao and Cong, Xiaoyan and Yang, Yi and Guo, Lanqing and Wang, Yuehao and Wang, Peihao and Wang, Zhangyang},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  year={2026}
}
```

## Acknowledgements

This project builds upon FLUX, HunyuanVideo, and diffusers. We thank the authors for their excellent work.
