
SOLACE: Improving Text-to-Image Generation with Intrinsic Self-Confidence Rewards

CVPR arXiv License: MIT

Official implementation of SOLACE (Self-cOnfidence reward for aLigning text-to-imAge models via ConfidencE optimization).

Abstract

SOLACE introduces intrinsic self-confidence rewards for improving text-to-image generation through reinforcement learning. Unlike prior methods that rely on external reward models (e.g., PickScore, ImageReward), SOLACE leverages the diffusion model's own denoising confidence as a training signal — requiring no additional models at training time. This approach can be used standalone or combined with external rewards for hybrid training.
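As a rough intuition only (an illustrative sketch, not the repo's actual reward), a denoising-confidence signal can be read as "higher reward when the model's own noise prediction is more accurate", with no external scorer involved. All names below (`self_confidence_reward`) are hypothetical:

```python
import numpy as np

def self_confidence_reward(pred_noise: np.ndarray, true_noise: np.ndarray) -> np.ndarray:
    """Toy confidence-style reward: higher when the model's own denoising
    prediction is closer to the injected noise (no external reward model).
    Illustrative only -- not SOLACE's actual formulation."""
    # Per-sample MSE over all non-batch dimensions.
    err = ((pred_noise - true_noise) ** 2).reshape(pred_noise.shape[0], -1).mean(axis=1)
    # Negate so lower denoising error maps to higher reward.
    return -err

rng = np.random.default_rng(0)
noise = rng.standard_normal((2, 4, 8, 8))
good_pred = noise + 0.01 * rng.standard_normal(noise.shape)  # confident denoiser
bad_pred = rng.standard_normal(noise.shape)                  # uninformed denoiser
rewards_good = self_confidence_reward(good_pred, noise)
rewards_bad = self_confidence_reward(bad_pred, noise)
```

The point of the sketch is only that the training signal comes from the model's own predictions, so no PickScore/ImageReward-style scorer needs to be loaded at training time.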

Supported Models

| Model                | Type  | Script                |
| -------------------- | ----- | --------------------- |
| SD3.5-Medium / Large | Image | train_sd3_self.py     |
| Flux.1-dev           | Image | train_flux_self.py    |
| SDXL                 | Image | train_sdxl_self.py    |
| WAN 2.1              | Video | train_wan2_1_self.py  |

Training variants:

  • train_sd3_self.py — Self-confidence reward only
  • train_sd3_self_ext.py — Hybrid: self-confidence + external reward
  • train_sd3_self_positive.py — Positive-only self-confidence

Installation

git clone https://github.com/wookiekim/SOLACE.git
cd SOLACE
pip install -e .

Note: flash-attn is recommended but not auto-installed. Install separately:

pip install flash-attn --no-build-isolation
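A quick way to confirm the optional dependency is importable after installation (an illustrative check; the training scripts may handle this differently):

```python
# Probe for the optional flash-attn package without failing hard.
try:
    import flash_attn  # installed separately via `pip install flash-attn --no-build-isolation`
    HAS_FLASH_ATTN = True
except ImportError:
    HAS_FLASH_ATTN = False

print(f"flash-attn available: {HAS_FLASH_ATTN}")
```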

Training

SD3.5-Medium (8 GPUs)

bash scripts/single_node/grpo_self.sh
# or manually:
accelerate launch --config_file scripts/accelerate_configs/multi_gpu.yaml \
    --num_processes=8 --main_process_port 29501 \
    scripts/train_sd3_self.py --config config/solace.py:general_ocr_sd3_8gpu

SD3.5-Medium Hybrid (self + external reward)

accelerate launch --config_file scripts/accelerate_configs/multi_gpu.yaml \
    --num_processes=8 --main_process_port 29501 \
    scripts/train_sd3_self_ext.py --config config/solace.py:general_ocr_sd3_8gpu

Flux.1-dev (8 GPUs)

bash scripts/single_node/grpo_flux_self.sh

SDXL (8 GPUs)

bash scripts/single_node/grpo_self_sdxl.sh
# or specify GPU count:
bash scripts/single_node/grpo_self_sdxl.sh 4 sdxl_self_4gpu

WAN 2.1 Video (8 GPUs)

bash scripts/single_node/grpo_wan_self.sh

Reward Models

When using hybrid training (train_sd3_self_ext.py) or external reward evaluation, configure the reward function in the config:

config.reward_fn = {
    "ocr": 1.0,        # OCR accuracy reward
    # "pickscore": 1.0, # PickScore reward
    # "geneval": 1.0,   # GenEval reward
}
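Conceptually, a `reward_fn` dict like this acts as a set of weights over individual reward functions. The helper below is a hypothetical illustration of how such a weighted sum might be computed, not a function from this codebase:

```python
def combine_rewards(reward_fn_weights: dict, reward_values: dict) -> float:
    """Weighted sum of per-image rewards, mirroring a `reward_fn` dict of
    {name: weight}. Hypothetical helper for illustration only."""
    total = 0.0
    for name, weight in reward_fn_weights.items():
        total += weight * reward_values[name]
    return total

# Example: OCR reward at full weight, PickScore at half weight (assumed values).
weights = {"ocr": 1.0, "pickscore": 0.5}
values = {"ocr": 0.8, "pickscore": 0.6}
total_reward = combine_rewards(weights, values)  # 0.8*1.0 + 0.6*0.5 = 1.1
```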

External reward models are loaded automatically. The OCR reward uses EasyOCR, while PickScore and ImageReward require their respective packages.

Dataset

Training prompts are provided in dataset/. Each subdirectory contains train.txt (training prompts) and test.txt (evaluation prompts). Specify the dataset path in the config:

config.dataset = os.path.join(os.getcwd(), "dataset/ocr")

Configuration

All configs are in config/solace.py. Key parameters:

| Parameter                          | Description                                  |
| ---------------------------------- | -------------------------------------------- |
| config.train.beta                  | KL divergence loss weight                    |
| config.sample.num_image_per_prompt | Group size for GRPO                          |
| config.sample.global_std           | Use global std for advantage normalization   |
| config.train.ema                   | Enable EMA for reference model               |
| config.train.sds.k                 | Number of antithetic probes (SDXL)           |
| config.sample.noise_level          | Noise injection level (Flux)                 |
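To illustrate how `num_image_per_prompt` and `global_std` interact: GRPO normalizes each prompt's group of rewards into advantages, optionally using a global rather than per-group standard deviation. The sketch below is a minimal illustration of that normalization, not the repo's implementation:

```python
import numpy as np

def grpo_advantages(rewards: np.ndarray, global_std: bool = False,
                    eps: float = 1e-8) -> np.ndarray:
    """Group-relative advantages for GRPO.

    rewards: shape (num_prompts, group_size), where group_size plays the role
    of config.sample.num_image_per_prompt. With global_std=True, normalize by
    the std over all samples instead of the per-group std (cf.
    config.sample.global_std). Illustrative sketch only.
    """
    # Center each group at its own mean reward.
    mean = rewards.mean(axis=1, keepdims=True)
    if global_std:
        std = rewards.std() + eps                        # one std for all groups
    else:
        std = rewards.std(axis=1, keepdims=True) + eps   # per-group std
    return (rewards - mean) / std
```

A global std can stabilize training when some prompt groups have near-identical rewards, since a tiny per-group std would otherwise blow up the advantages.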

Citation

@inproceedings{kim2026solace,
    title={Improving Text-to-Image Generation with Intrinsic Self-Confidence Rewards},
    author={Kim, Wookyoung and others},
    booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    year={2026}
}

Acknowledgments

This codebase is built upon Flow-GRPO by Jie Liu et al. We thank the authors for their excellent open-source framework for applying GRPO to flow-matching diffusion models.

License

This project is licensed under the MIT License — see LICENSE for details.
