John Page1* · Xuesong Niu1* · Kai Wu1 † · Kun Gai1
1 Kolors Team, Kuaishou Technology
*Equal Contribution †Project Leads
✨ Key Highlights of Send-VAE
- Bridging the Representational Gap: As shown above, unlike previous direct alignment-based methods, our non-linear mapper effectively bridges the gap between the VAE’s local structures and the VFM’s dense semantics.
- Semantic Disentanglement: Send-VAE facilitates the seamless injection of contextual knowledge while actively preserving the VAE’s native structured semantics, encouraging emergent disentanglement without requiring explicit regularization constraints.
- Remarkable Acceleration: When integrated with flow-based transformers (SiTs), Send-VAE significantly accelerates the convergence of diffusion models, clearly outperforming both REPA and the enhanced REPA-E baselines.
What characteristics make a VAE latent space more “friendly” for generation? To answer this question, we conduct experiments with three recently proposed evaluation methods for VAE latent spaces and report the results above. We reveal a surprising finding: the richness and accessibility of structured, fine-grained semantics are a more fundamental prerequisite for VAE latents than high-level semantic alignment. Building on this insight, we introduce Send-VAE, which leverages the rich representations of VFMs through a carefully designed non-linear mapping architecture.
To set up our environment, please run:

```shell
git clone https://github.com/REPA-E/REPA-E.git
cd REPA-E
conda env create -f environment.yml -y
conda activate repa-e
```

Download and extract the training split of the ImageNet-1K dataset. Once it's ready, run the following command to preprocess the dataset:

```shell
python preprocessing.py --imagenet-path /PATH/TO/IMAGENET_TRAIN
```

Replace `/PATH/TO/IMAGENET_TRAIN` with the actual path to the extracted training images.
Use the following script to train Send-VAE:

```shell
./train_sendvae.sh
```

You can adjust the following options:

- `--output-dir`: Directory to save checkpoints and logs
- `--exp-name`: Experiment name (a subfolder will be created under `output-dir`)
- `--vae`: Choose between `[f8d4, f16d32]`
- `--vae-ckpt`: Path to a provided or custom VAE checkpoint
- `--disc-pretrained-ckpt`: Path to a provided or custom VAE discriminator checkpoint
- `--enc-type`: Choose from `[dinov2-vit-b, dinov2-vit-l, dinov2-vit-g, clip-vit-L, jepa-vit-h]`
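As a concrete starting point, an invocation with the `f8d4` VAE and a DINOv2-B encoder might look like the sketch below; the output directory, experiment name, and checkpoint path are illustrative placeholders, not values from the repository.

```shell
# Illustrative example -- adjust paths and flag values to your setup.
./train_sendvae.sh \
  --output-dir exps \
  --exp-name sendvae-f8d4-dinov2b \
  --vae f8d4 \
  --vae-ckpt /PATH/TO/VAE_CKPT \
  --enc-type dinov2-vit-b
```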
Cache latents first for fast training:

```shell
./cache_latent.sh
```

Configure the VAE checkpoint path based on your setup. Then, train the latent diffusion models using:

```shell
./train_ldm.sh
```

You can adjust the following options:

- `--output-dir`: Directory to save checkpoints and logs
- `--exp-name`: Experiment name (a subfolder will be created under `output-dir`)
- `--vae`: Choose between `[f8d4, f16d32]`
- `--vae-ckpt`: Path to a provided or custom VAE checkpoint
- `--models`: Choose from `[SiT-B/2, SiT-L/2, SiT-XL/2, SiT-B/1, SiT-L/1, SiT-XL/1]`. The number indicates the patch size; select a model compatible with your VAE architecture.
- `--enc-type`: Choose from `[dinov2-vit-b, dinov2-vit-l, dinov2-vit-g, clip-vit-L, jepa-vit-h]`
- `--encoder-depth`: Any integer from 1 up to the full depth of the selected encoder
- `--proj-coeff`: REPA-E projection coefficient for SiT alignment (float > 0)
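For example, training a SiT-XL/2 on `f8d4` latents with DINOv2-B alignment might be launched as follows; the experiment name, checkpoint path, alignment depth, and projection coefficient are illustrative placeholders, not recommended settings.

```shell
# Illustrative example -- adjust values to match your VAE and hardware.
./train_ldm.sh \
  --output-dir exps \
  --exp-name sit-xl2-f8d4 \
  --vae f8d4 \
  --vae-ckpt /PATH/TO/VAE_CKPT \
  --models SiT-XL/2 \
  --enc-type dinov2-vit-b \
  --encoder-depth 8 \
  --proj-coeff 0.5
```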
Generate samples and save them as `.npz` files using the following script. Set the `--exp-path` and `--train-steps` options based on your setup.

```shell
./sample.sh
```
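The generated `.npz` file can be inspected with NumPy. The snippet below assumes the samples are stored as a single array under NumPy's default key `arr_0`; check your file's keys if the sampling script uses a different convention. The dummy file and its shape are purely for illustration.

```python
import numpy as np

# Write a dummy .npz in the assumed layout (one uint8 image array under
# the default key "arr_0"); replace this with your actual sample file.
np.savez("samples.npz", np.zeros((4, 32, 32, 3), dtype=np.uint8))

data = np.load("samples.npz")
print(list(data.keys()))             # arrays stored in the file
samples = data["arr_0"]
print(samples.shape, samples.dtype)  # (4, 32, 32, 3) uint8
```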
