# SOLACE

Official implementation of SOLACE (Self-cOnfidence reward for aLigning text-to-imAge models via ConfidencE optimization).
SOLACE introduces intrinsic self-confidence rewards for improving text-to-image generation through reinforcement learning. Unlike prior methods that rely on external reward models (e.g., PickScore, ImageReward), SOLACE leverages the diffusion model's own denoising confidence as a training signal — requiring no additional models at training time. This approach can be used standalone or combined with external rewards for hybrid training.
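To make the idea of an intrinsic confidence signal concrete, here is a toy sketch: score a denoising trajectory by the mean log-probability the model assigned to its own steps, so confident trajectories rank higher without any external reward model. The function name and scoring are illustrative assumptions, not the actual SOLACE objective.

```python
import math

def self_confidence_reward(step_probs):
    """Toy intrinsic reward: mean log-probability the model assigned to
    its own denoising steps. Illustrative only; SOLACE's actual
    confidence measure lives in the training scripts."""
    return sum(math.log(p) for p in step_probs) / len(step_probs)

# A trajectory the model denoised confidently scores higher than a
# hesitant one, so samples can be ranked with no external reward model.
confident = self_confidence_reward([0.9, 0.8, 0.85])
hesitant = self_confidence_reward([0.4, 0.3, 0.5])
assert confident > hesitant
```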
## Supported Models

| Model | Type | Script |
|---|---|---|
| SD3.5-Medium / Large | Image | `train_sd3_self.py` |
| Flux.1-dev | Image | `train_flux_self.py` |
| SDXL | Image | `train_sdxl_self.py` |
| WAN 2.1 | Video | `train_wan2_1_self.py` |
Training variants:

- `train_sd3_self.py` — Self-confidence reward only
- `train_sd3_self_ext.py` — Hybrid: self-confidence + external reward
- `train_sd3_self_positive.py` — Positive-only self-confidence
## Installation

```bash
git clone https://github.com/wookiekim/SOLACE.git
cd SOLACE
pip install -e .
```

Note: `flash-attn` is recommended but not installed automatically. Install it separately:
```bash
pip install flash-attn --no-build-isolation
```

## Training

### SD3.5 (self-confidence only)

```bash
bash scripts/single_node/grpo_self.sh
# or manually:
accelerate launch --config_file scripts/accelerate_configs/multi_gpu.yaml \
    --num_processes=8 --main_process_port 29501 \
    scripts/train_sd3_self.py --config config/solace.py:general_ocr_sd3_8gpu
```

### SD3.5 (hybrid: self-confidence + external reward)

```bash
accelerate launch --config_file scripts/accelerate_configs/multi_gpu.yaml \
    --num_processes=8 --main_process_port 29501 \
    scripts/train_sd3_self_ext.py --config config/solace.py:general_ocr_sd3_8gpu
```

### Flux.1-dev

```bash
bash scripts/single_node/grpo_flux_self.sh
```

### SDXL

```bash
bash scripts/single_node/grpo_self_sdxl.sh
# or specify GPU count:
bash scripts/single_node/grpo_self_sdxl.sh 4 sdxl_self_4gpu
```

### WAN 2.1

```bash
bash scripts/single_node/grpo_wan_self.sh
```

## Reward Functions

When using hybrid training (`train_sd3_self_ext.py`) or external reward evaluation, configure the reward function in the config:
```python
config.reward_fn = {
    "ocr": 1.0,          # OCR accuracy reward
    # "pickscore": 1.0,  # PickScore reward
    # "geneval": 1.0,    # GenEval reward
}
```

External reward models are loaded automatically. The OCR reward uses EasyOCR, while PickScore and ImageReward require their respective packages.
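A weighting dict like the one above is typically consumed as a weighted sum over per-reward scores. The helper below is a minimal sketch of that pattern; the function name and signature are ours, not the repo's actual API.

```python
def combine_rewards(scores, reward_fn):
    """Weighted sum of per-reward scores, mirroring the config.reward_fn
    weighting above. Hypothetical helper, not the repo's actual API."""
    return sum(reward_fn[name] * scores[name] for name in reward_fn)

# Hypothetical scores for one generated image:
reward_fn = {"ocr": 1.0, "pickscore": 0.5}
scores = {"ocr": 0.8, "pickscore": 0.6}
total = combine_rewards(scores, reward_fn)  # 1.0*0.8 + 0.5*0.6
```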
## Datasets

Training prompts are provided in `dataset/`. Each subdirectory contains `train.txt` (training prompts) and `test.txt` (evaluation prompts). Specify the dataset path in the config:

```python
config.dataset = os.path.join(os.getcwd(), "dataset/ocr")
```

## Configuration

All configs are in `config/solace.py`. Key parameters:
| Parameter | Description |
|---|---|
| `config.train.beta` | KL divergence loss weight |
| `config.sample.num_image_per_prompt` | Group size for GRPO |
| `config.sample.global_std` | Use global std for advantage normalization |
| `config.train.ema` | Enable EMA for reference model |
| `config.train.sds.k` | Number of antithetic probes (SDXL) |
| `config.sample.noise_level` | Noise injection level (Flux) |
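To make the group-size and `global_std` knobs concrete: GRPO-style advantages normalize each sample's reward against its prompt group, dividing by either the per-group std or a shared global std. This is a minimal sketch under that assumption, not the repo's implementation.

```python
import statistics

def grpo_advantages(group_rewards, global_std=None):
    """Reward minus group mean, divided by the per-group std or, when
    something like config.sample.global_std is enabled, a shared global
    std. Sketch only; the real logic lives in the training scripts."""
    mean = statistics.fmean(group_rewards)
    std = global_std if global_std is not None else statistics.pstdev(group_rewards)
    return [(r - mean) / (std + 1e-8) for r in group_rewards]

# One group of num_image_per_prompt = 4 samples for the same prompt:
adv = grpo_advantages([0.2, 0.4, 0.6, 0.8])
# The best sample gets a positive advantage, the worst a negative one.
assert adv[-1] > 0 > adv[0]
```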
## Citation

```bibtex
@inproceedings{kim2026solace,
  title={Improving Text-to-Image Generation with Intrinsic Self-Confidence Rewards},
  author={Kim, Wookyoung and others},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2026}
}
```

## Acknowledgements

This codebase is built upon Flow-GRPO by Jie Liu et al. We thank the authors for their beautiful open-source framework for applying GRPO to flow-matching diffusion models.
## License

This project is licensed under the MIT License. See `LICENSE` for details.