
Enhancing Reward Models for High-quality Image Generation: Beyond Text-Image Alignment

ICTHP: ICT & HP Reward Models for High-quality Image Generation


Official Implementation | ICCV 2025

🚀 Overview

This repository provides the official implementation of ICT and HP reward models from our paper "Enhancing Reward Models for High-quality Image Generation: Beyond Text-Image Alignment".

Current reward models exhibit an inherent bias: they inappropriately assign low scores to images with rich detail and high aesthetic value, creating a significant discrepancy with actual human aesthetic preferences. Our dual-component framework addresses these limitations through:

⭐ ICT (Image-Contained-Text) Reward Model: A novel training objective that quantifies how well images contain their prompt's textual information, without penalizing visual richness

⭐ HP (High-Preference) Reward Model: A pure image-modality assessment of aesthetic quality and human preference
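
The framework scores an image along both axes. As a purely hypothetical illustration (the paper defines how the two component scores are actually combined), multiplying them rewards images that do well on both alignment and aesthetics:

```python
# Hypothetical illustration only: the paper defines the real combination rule.
# A simple product rewards images that score well on BOTH text containment
# (ICT) and human preference (HP).
def combined_reward(ict_score: float, hp_score: float) -> float:
    """Assumes both component scores lie in [0, 1]."""
    return ict_score * hp_score
```

Under this toy rule, a perfectly aligned but unattractive image (1.0 × 0.2 = 0.2) ranks below a balanced one (0.8 × 0.8 = 0.64).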

🔗 Models & Dataset

We provide pre-trained reward models and the Pick-High dataset to facilitate research and enable reproducible results.

⭐ ICT Reward Model

The Image-Contained-Text (ICT) Reward Model employs a novel contrastive learning objective with hierarchical prompt structures to assess text-image alignment quality. By learning from both basic and refined prompt-image pairs, ICT mitigates the bias against visually rich content that plagues existing alignment metrics.

⭐ ICT Model: https://huggingface.co/8y/ICT
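
ICT builds on CLIP-style image–text features. As a stdlib-only sketch (random in spirit, not the actual ICT objective, whose architecture and loss live in the paper and repo), a CLIP-style similarity score can be written as a cosine similarity rescaled to [0, 1]:

```python
import math

def clip_style_score(image_emb, text_emb):
    """Cosine similarity of two feature vectors, rescaled from [-1, 1] to [0, 1]."""
    dot = sum(a * b for a, b in zip(image_emb, text_emb))
    norm_i = math.sqrt(sum(a * a for a in image_emb))
    norm_t = math.sqrt(sum(b * b for b in text_emb))
    sim = dot / (norm_i * norm_t)
    return (sim + 1.0) / 2.0

print(clip_style_score([1.0, 0.0], [1.0, 0.0]))   # 1.0 (identical direction)
print(clip_style_score([1.0, 0.0], [-1.0, 0.0]))  # 0.0 (opposite direction)
```

The ICT training objective then teaches the model to keep this score high for visually rich images whose content still contains the prompt, rather than treating extra detail as misalignment.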

⭐ HP Reward Model

The High-Preference (HP) Reward Model leverages a fine-tuned CLIP backbone coupled with a specialized Multi-Layer Perceptron to predict human aesthetic preferences from pure visual modality. Trained with margin ranking loss on preference triplets, HP provides orthogonal quality assessment that captures aesthetic nuances beyond semantic alignment.

⭐ HP Model: https://huggingface.co/8y/HP
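
The margin ranking loss mentioned above fits in a few lines. This stdlib sketch mirrors PyTorch's `nn.MarginRankingLoss` with target +1 (the actual HP training code, including the CLIP backbone and MLP head, lives in the repository):

```python
def margin_ranking_loss(preferred, other, margin=1.0):
    """Mean of max(0, margin - (s_pref - s_other)) over score pairs:
    zero once every preferred score beats its counterpart by at least margin."""
    losses = [max(0.0, margin - (p - o)) for p, o in zip(preferred, other)]
    return sum(losses) / len(losses)

# Preferred images already ranked well above the others -> zero loss.
print(margin_ranking_loss([2.0, 3.0], [0.5, 0.5]))  # 0.0
```

During training, the scores come from the MLP head on CLIP image features; the loss pushes the preferred image of each triplet above the less-preferred one by the margin.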

🎨 Pick-High Dataset

Pick-High is a high-quality dataset of 360,000 images generated by SD3.5-Large using prompts refined through Claude-3.5-Sonnet chain-of-thought reasoning. Combined with Pick-a-Pic, these images form triplets with comprehensive preference annotations for training and evaluating reward models.

🎨 Pick-High Dataset: https://huggingface.co/datasets/8y/Pick-High-Dataset

🛠️ Installation

Requirements

  • Python 3.8+
  • PyTorch 1.12+
  • CUDA-compatible GPU (8GB+ VRAM recommended)
  • 16GB+ RAM for training

Quick Installation

git clone https://github.com/BarretBa/ICTHP.git
cd ICTHP
pip install -r requirements.txt

🚀 Quick Start

Model Inference and Evaluation

Download ICT and HP model weights:

# Download ICT model weights
git clone https://huggingface.co/8y/ICT ./ICTHP_models/ICT

# Download HP model weights  
git clone https://huggingface.co/8y/HP ./ICTHP_models/HP

Evaluate the models on sample images:

# Execute evaluation pipeline with demo image sets
./eval.sh
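
Before running the evaluation script, it can help to confirm the checkpoints actually landed where it expects them. This small helper is not part of the repository; the directory names come from the clone commands above, and the extension list is a general guess at common weight formats:

```python
from pathlib import Path
from typing import List

# Common PyTorch / Hugging Face weight-file extensions (assumption, not
# taken from the repo).
WEIGHT_EXTS = {".pt", ".pth", ".bin", ".safetensors"}

def list_weight_files(model_dir: str) -> List[Path]:
    """Return weight-like files under a downloaded model directory."""
    root = Path(model_dir)
    if not root.is_dir():
        return []
    return sorted(p for p in root.rglob("*") if p.suffix in WEIGHT_EXTS)

for name in ("ICT", "HP"):
    print(name, [p.name for p in list_weight_files(f"./ICTHP_models/{name}")])
```

An empty list for either model means the corresponding `git clone` step did not complete.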

Training from Scratch

Follow these steps to train ICT and HP reward models from scratch:

Step 1: Download Pick-High Dataset

# Method 1: Using huggingface-cli (Recommended)
pip install "huggingface_hub[cli]"
huggingface-cli download 8y/Pick-High-Dataset --repo-type dataset

# Method 2: Using Git LFS
git lfs install
git clone https://huggingface.co/datasets/8y/Pick-High-Dataset

# Method 3: Using datasets library
pip install datasets
python -c "from datasets import load_dataset; load_dataset('8y/Pick-High-Dataset')"

Step 2: Configure Dataset Paths

Update trainer/conf/experiment/train_ict.yaml and trainer/conf/experiment/train_hp.yaml:

dataset:
  dataset_name: ./Pick-High-Dataset/Pick-High/
  easy_folder: ./Pick-High-Dataset/pick_easy_img
  refine_folder: ./Pick-High-Dataset/pick_refine_img
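
To catch path typos before a long training launch, a few lines of plain Python (not part of the repo; the three paths are copied from the YAML above) can verify that each configured entry exists on disk:

```python
from pathlib import Path

def missing_paths(paths):
    """Return the config keys whose paths do not exist on disk."""
    return [key for key, value in paths.items() if not Path(value).exists()]

# Paths copied from trainer/conf/experiment/train_ict.yaml / train_hp.yaml.
dataset_paths = {
    "dataset_name": "./Pick-High-Dataset/Pick-High/",
    "easy_folder": "./Pick-High-Dataset/pick_easy_img",
    "refine_folder": "./Pick-High-Dataset/pick_refine_img",
}
print(missing_paths(dataset_paths))  # [] once the dataset is fully downloaded
```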

Step 3: Train ICT Model

Train the ICT (Image-Contained-Text) model:

# Multi-GPU training
./train_ict.sh

Step 4: Train HP Model

Train the HP (High-Preference) model:

# Multi-GPU training
./train_hp.sh

Load Dataset Locally (Optional)

To explore the dataset structure, use our provided data loading script:

# Run the data loading example
cd trainer/datasets
python pick-high.py

Dataset Structure:

Pick-High-Dataset/
├── Pick-High/
│   ├── train.pkl          # Training annotations
│   ├── val.pkl            # Validation annotations
│   └── test.pkl           # Test annotations
├── pick_easy_img/         # Basic quality images
│   └── train/val/test/
└── pick_refine_img/       # High-quality refined images
    └── train/val/test/
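
The annotation splits are ordinary pickle files, so the standard library can open them. The record schema inside is defined by the repository's loader (pick-high.py above), so this sketch stays schema-agnostic:

```python
import pickle

def load_split(annotation_path: str):
    """Load one annotation split (train/val/test) from its pickle file.
    The structure of the returned records is defined by the repo's loader."""
    with open(annotation_path, "rb") as f:
        return pickle.load(f)

# e.g. annotations = load_split("Pick-High-Dataset/Pick-High/train.pkl")
```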

πŸ“ Citation

If you find this work helpful for your research, or use our ICT (Image-Contained-Text) reward model, HP (High-Preference) reward model, or Pick-High dataset, please cite our paper:

@misc{ba2025enhancingrewardmodelshighquality,
      title={Enhancing Reward Models for High-quality Image Generation: Beyond Text-Image Alignment}, 
      author={Ying Ba and Tianyu Zhang and Yalong Bai and Wenyi Mo and Tao Liang and Bing Su and Ji-Rong Wen},
      year={2025},
      eprint={2507.19002},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2507.19002}, 
}

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🔗 Resources

⭐ Acknowledgments

  • Pick-a-Pic Dataset: Foundation for our Pick-High dataset
  • OpenAI CLIP: Base architecture for our reward models
  • Hugging Face: Transformers and Accelerate libraries
  • Community: Contributors and researchers in the field

Star ⭐ this repository to stay updated with the latest developments!
