# SOLACE

Official implementation of SOLACE (Self-cOnfidence reward for aLigning text-to-imAge models via ConfidencE optimization).
SOLACE introduces intrinsic self-confidence rewards for improving text-to-image generation through reinforcement learning. Unlike prior methods that rely on external reward models (e.g., PickScore, ImageReward), SOLACE leverages the diffusion model's own denoising confidence as a training signal — requiring no additional models at training time. This approach can be used standalone or combined with external rewards for hybrid training.
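To make the idea of an intrinsic confidence signal concrete, here is a toy sketch: score a denoising trajectory by the mean log-probability the model assigned to its own steps, so confident trajectories rank higher without any external reward model. The function name and scoring are illustrative assumptions, not the actual SOLACE objective.

```python
import math

def self_confidence_reward(step_probs):
    """Toy intrinsic reward: mean log-probability the model assigned to
    its own denoising steps. Illustrative only; SOLACE's actual
    confidence measure lives in the training scripts."""
    return sum(math.log(p) for p in step_probs) / len(step_probs)

# A trajectory the model denoised confidently scores higher than a
# hesitant one, so samples can be ranked with no external reward model.
confident = self_confidence_reward([0.9, 0.8, 0.85])
hesitant = self_confidence_reward([0.4, 0.3, 0.5])
assert confident > hesitant
```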
## Supported Models

| Model | Type | Script |
|---|---|---|
| SD3.5-Medium / Large | Image | `train_sd3_self.py` |
| Flux.1-dev | Image | `train_flux_self.py` |
| SDXL | Image | `train_sdxl_self.py` |
| WAN 2.1 | Video | `train_wan2_1_self.py` |
Training variants:

- `train_sd3_self.py` — Self-confidence reward only
- `train_sd3_self_ext.py` — Hybrid: self-confidence + external reward
- `train_sd3_self_positive.py` — Positive-only self-confidence
## Installation

```bash
git clone https://github.com/wookiekim/SOLACE.git
cd SOLACE
pip install -e .
```

Note: `flash-attn` is recommended but not installed automatically. Install it separately:
```bash
pip install flash-attn --no-build-isolation
```

## Training

### SD3.5 (self-confidence only)

```bash
bash scripts/single_node/grpo_self.sh
# or manually:
accelerate launch --config_file scripts/accelerate_configs/multi_gpu.yaml \
    --num_processes=8 --main_process_port 29501 \
    scripts/train_sd3_self.py --config config/solace.py:general_ocr_sd3_8gpu
```

### SD3.5 (hybrid: self-confidence + external reward)

```bash
accelerate launch --config_file scripts/accelerate_configs/multi_gpu.yaml \
    --num_processes=8 --main_process_port 29501 \
    scripts/train_sd3_self_ext.py --config config/solace.py:general_ocr_sd3_8gpu
```

### Flux.1-dev

```bash
bash scripts/single_node/grpo_flux_self.sh
```

### SDXL

```bash
bash scripts/single_node/grpo_self_sdxl.sh
# or specify GPU count:
bash scripts/single_node/grpo_self_sdxl.sh 4 sdxl_self_4gpu
```

### WAN 2.1

```bash
bash scripts/single_node/grpo_wan_self.sh
```

## Reward Functions

When using hybrid training (`train_sd3_self_ext.py`) or external reward evaluation, configure the reward function in the config:
```python
config.reward_fn = {
    "ocr": 1.0,          # OCR accuracy reward
    # "pickscore": 1.0,  # PickScore reward
    # "geneval": 1.0,    # GenEval reward
}
```

External reward models are loaded automatically. The OCR reward uses EasyOCR, while PickScore and ImageReward require their respective packages.
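A weighting dict like the one above is typically consumed as a weighted sum over per-reward scores. The helper below is a minimal sketch of that pattern; the function name and signature are ours, not the repo's actual API.

```python
def combine_rewards(scores, reward_fn):
    """Weighted sum of per-reward scores, mirroring the config.reward_fn
    weighting above. Hypothetical helper, not the repo's actual API."""
    return sum(reward_fn[name] * scores[name] for name in reward_fn)

# Hypothetical scores for one generated image:
reward_fn = {"ocr": 1.0, "pickscore": 0.5}
scores = {"ocr": 0.8, "pickscore": 0.6}
total = combine_rewards(scores, reward_fn)  # 1.0*0.8 + 0.5*0.6
```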
## Datasets

Training prompts are provided in `dataset/`. Each subdirectory contains `train.txt` (training prompts) and `test.txt` (evaluation prompts). Specify the dataset path in the config:

```python
config.dataset = os.path.join(os.getcwd(), "dataset/ocr")
```

## Configuration

All configs are in `config/solace.py`. Key parameters:
| Parameter | Description |
|---|---|
| `config.train.beta` | KL divergence loss weight |
| `config.sample.num_image_per_prompt` | Group size for GRPO |
| `config.sample.global_std` | Use global std for advantage normalization |
| `config.train.ema` | Enable EMA for reference model |
| `config.train.sds.k` | Number of antithetic probes (SDXL) |
| `config.sample.noise_level` | Noise injection level (Flux) |
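To make the group-size and `global_std` knobs concrete: GRPO-style advantages normalize each sample's reward against its prompt group, dividing by either the per-group std or a shared global std. This is a minimal sketch under that assumption, not the repo's implementation.

```python
import statistics

def grpo_advantages(group_rewards, global_std=None):
    """Reward minus group mean, divided by the per-group std or, when
    something like config.sample.global_std is enabled, a shared global
    std. Sketch only; the real logic lives in the training scripts."""
    mean = statistics.fmean(group_rewards)
    std = global_std if global_std is not None else statistics.pstdev(group_rewards)
    return [(r - mean) / (std + 1e-8) for r in group_rewards]

# One group of num_image_per_prompt = 4 samples for the same prompt:
adv = grpo_advantages([0.2, 0.4, 0.6, 0.8])
# The best sample gets a positive advantage, the worst a negative one.
assert adv[-1] > 0 > adv[0]
```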
## Citation

```bibtex
@inproceedings{kim2026solace,
  title={Improving Text-to-Image Generation with Intrinsic Self-Confidence Rewards},
  author={Kim, Wookyoung and others},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2026}
}
```

## Acknowledgements

This codebase is built upon Flow-GRPO by Jie Liu et al. We thank the authors for their beautiful open-source framework for applying GRPO to flow-matching diffusion models.
## License

This project is licensed under the MIT License. See `LICENSE` for details.