Official Implementation | ICCV 2025
This repository provides the official implementation of ICT and HP reward models from our paper "Enhancing Reward Models for High-quality Image Generation: Beyond Text-Image Alignment".
Current reward models exhibit an inherent bias: they inappropriately assign low scores to images with rich details and high aesthetic value, creating a significant discrepancy with actual human aesthetic preferences. Our dual-component framework addresses these limitations through:
- **ICT (Image-Contained-Text) Reward Model**: a novel training objective that quantifies how well images contain textual information, without penalizing visual richness
- **HP (High-Preference) Reward Model**: pure image-modality assessment of aesthetic quality and human preference
We provide pre-trained reward models and the Pick-High dataset to facilitate research and enable reproducible results.
The Image-Contained-Text (ICT) Reward Model employs a novel contrastive learning objective with hierarchical prompt structures to assess text-image alignment quality. By learning from both basic and refined prompt-image pairs, ICT mitigates the bias against visually rich content that plagues existing alignment metrics.
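As a rough illustration of the idea (not the paper's exact objective), an ICT-style score can be thought of as asking how much of a prompt's information an image embedding "contains", so that detail beyond the basic prompt is not punished. The embeddings and the `ict_style_score` helper below are hypothetical stand-ins for CLIP-style features:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def ict_style_score(img_emb, basic_emb, refined_emb):
    """Hypothetical sketch: score the image against both the basic and the
    refined prompt embedding and keep the better match, so an image that
    fits the detail-rich refined prompt is not penalized for its extra
    detail. The actual ICT objective in the paper is more involved."""
    return max(cosine(img_emb, basic_emb), cosine(img_emb, refined_emb))

rng = np.random.default_rng(0)
img, basic, refined = rng.normal(size=(3, 8))  # toy 8-dim embeddings
print(round(ict_style_score(img, basic, refined), 4))
```

The `max` over the two prompt granularities is the illustrative point: an image is judged by its best-matching description rather than by its distance from the shortest one.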
The High-Preference (HP) Reward Model leverages a fine-tuned CLIP backbone coupled with a specialized Multi-Layer Perceptron to predict human aesthetic preferences from pure visual modality. Trained with margin ranking loss on preference triplets, HP provides orthogonal quality assessment that captures aesthetic nuances beyond semantic alignment.
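Since HP is trained with a margin ranking loss on preference triplets, the loss term for one preferred/non-preferred pair can be sketched as follows (a minimal illustration; the margin value and toy scores are assumptions, not the paper's training code):

```python
import numpy as np

def margin_ranking_loss(score_pref, score_nonpref, margin=1.0):
    """Standard margin ranking loss (cf. torch.nn.MarginRankingLoss):
    zero once the preferred score beats the non-preferred one by at
    least `margin`, with a linear penalty otherwise."""
    return float(np.maximum(0.0, margin - (score_pref - score_nonpref)))

# Toy HP scores for one preference pair (values are made up)
print(margin_ranking_loss(2.5, 1.0))  # well separated -> 0.0
print(margin_ranking_loss(1.2, 1.0))  # gap of 0.2 < margin -> 0.8
```

Because the loss only cares about score gaps, not absolute values, the MLP head is free to calibrate its output range however training finds convenient.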
Pick-High is a high-quality dataset of 360,000 images generated by SD3.5-Large from refined prompts produced via Claude-3.5-Sonnet chain-of-thought reasoning. Combined with Pick-a-Pic, it forms image triplets with comprehensive preference annotations for training and evaluating reward models.
- Python 3.8+
- PyTorch 1.12+
- CUDA-compatible GPU (8GB+ VRAM recommended)
- 16GB+ RAM for training
git clone https://github.com/BarretBa/ICTHP.git
cd ICTHP
pip install -r requirements.txt

Download ICT and HP model weights:
# Download ICT model weights
git clone https://huggingface.co/8y/ICT ./ICTHP_models/ICT
# Download HP model weights
git clone https://huggingface.co/8y/HP ./ICTHP_models/HP

Evaluate the models on sample images:
# Execute evaluation pipeline with demo image sets
./eval.sh

Follow these steps to train ICT and HP reward models from scratch:
# Method 1: Using huggingface-cli (Recommended)
pip install "huggingface_hub[cli]"
huggingface-cli download 8y/Pick-High-Dataset --repo-type dataset
# Method 2: Using Git LFS
git lfs install
git clone https://huggingface.co/datasets/8y/Pick-High-Dataset
# Method 3: Using datasets library
pip install datasets
python -c "from datasets import load_dataset; load_dataset('8y/Pick-High-Dataset')"

Update trainer/conf/experiment/train_ict.yaml and trainer/conf/experiment/train_hp.yaml:
dataset:
  dataset_name: ./Pick-High-Dataset/Pick-High/
  easy_folder: ./Pick-High-Dataset/pick_easy_img
  refine_folder: ./Pick-High-Dataset/pick_refine_img

Train the ICT (Image-Contained-Text) model:
# Multi-GPU training
./train_ict.sh

Train the HP (High-Preference) model:
# Multi-GPU training
./train_hp.sh

To explore the dataset structure, use our provided data loading script:
# Run the data loading example
cd trainer/datasets
python pick-high.py

Dataset Structure:
Pick-High-Dataset/
├── Pick-High/
│   ├── train.pkl          # Training annotations
│   ├── val.pkl            # Validation annotations
│   └── test.pkl           # Test annotations
├── pick_easy_img/         # Basic quality images
│   └── train/val/test/
└── pick_refine_img/       # High-quality refined images
    └── train/val/test/
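To peek at the annotation files without running the full pipeline, the .pkl files can be opened with pickle. The record layout below is a made-up stand-in (the real schema is whatever trainer/datasets/pick-high.py reads), so treat the keys as hypothetical:

```python
import os
import pickle
import tempfile

# Hypothetical triplet record; the real fields in train.pkl may differ.
toy_annotations = [{
    "prompt": "a cat sitting on a windowsill",
    "easy_image": "train/000001.jpg",    # relative to pick_easy_img/
    "refine_image": "train/000001.jpg",  # relative to pick_refine_img/
}]

# Round-trip through a temp file, mimicking Pick-High/train.pkl
path = os.path.join(tempfile.mkdtemp(), "train.pkl")
with open(path, "wb") as f:
    pickle.dump(toy_annotations, f)
with open(path, "rb") as f:
    annotations = pickle.load(f)

print(len(annotations), sorted(annotations[0]))
```

Swapping `path` for the real `Pick-High-Dataset/Pick-High/train.pkl` and inspecting the loaded object is a quick way to confirm the actual field names before training.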
If you find this work helpful for your research or use our ICT (Image-Contained-Text) reward model, HP (High-Preference) reward model, or Pick-High dataset, we would appreciate it if you could cite our paper:
@misc{ba2025enhancingrewardmodelshighquality,
  title={Enhancing Reward Models for High-quality Image Generation: Beyond Text-Image Alignment},
  author={Ying Ba and Tianyu Zhang and Yalong Bai and Wenyi Mo and Tao Liang and Bing Su and Ji-Rong Wen},
  year={2025},
  eprint={2507.19002},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2507.19002},
}

This project is licensed under the MIT License - see the LICENSE file for details.
- 🤗 ICT Model: 8y/ICT - Text-Image Alignment Model
- 🤗 HP Model: 8y/HP - Aesthetic Quality Model
- 🤗 Pick-High Dataset: 8y/Pick-High-Dataset - High-Quality Dataset
- 📄 Paper: Enhancing Reward Models for High-quality Image Generation: Beyond Text-Image Alignment
- 🔗 Base Project: PickScore - Pick-a-Pic Dataset and PickScore Model
- Pick-a-Pic Dataset: Foundation for our Pick-High dataset
- OpenAI CLIP: Base architecture for our reward models
- Hugging Face: Transformers and Accelerate libraries
- Community: Contributors and researchers in the field
Star ⭐ this repository to stay updated with the latest developments!