# alignrl Examples

Minimal scripts demonstrating each module in the alignrl package.

| Script | Description |
| --- | --- |
| `quickstart_sft.py` | Fine-tune Qwen2.5-3B on instruction data with QLoRA |
| `quickstart_grpo.py` | Train a math reasoning model with GRPO and verifiable rewards |
| `quickstart_dpo.py` | Align a model with human preferences using DPO |
| `evaluate_stages.py` | Evaluate and compare model performance across training stages |
| `serve_model.py` | Serve a trained model for interactive inference |
| `custom_rewards.py` | Define custom reward functions for GRPO training |
| `launch_demo.py` | Launch the Gradio comparison demo |
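
To give a rough idea of what a custom reward function for GRPO with verifiable rewards can look like, here is a minimal sketch. It is not taken from `custom_rewards.py`. The signature follows the convention used by TRL's `GRPOTrainer` (a list of completions in, one float per completion out, with dataset columns passed as keyword arguments); whether alignrl expects the same signature is an assumption. The names `extract_final_answer` and `correctness_reward`, and the `answer` column, are hypothetical.

```python
import re
from typing import List, Optional

# Matches integers and decimals, with an optional leading minus sign.
_NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def extract_final_answer(text: str) -> Optional[str]:
    """Hypothetical helper: return the last number in the text, or None."""
    matches = _NUMBER.findall(text)
    return matches[-1] if matches else None


def correctness_reward(
    completions: List[str], answer: List[str], **kwargs
) -> List[float]:
    """Return 1.0 when the last number in a completion equals the reference.

    Assumes plain-string completions and numeric reference answers
    supplied through a dataset column named `answer` (an assumption).
    """
    rewards = []
    for completion, expected in zip(completions, answer):
        predicted = extract_final_answer(completion)
        is_correct = predicted is not None and float(predicted) == float(expected)
        rewards.append(1.0 if is_correct else 0.0)
    return rewards


if __name__ == "__main__":
    scores = correctness_reward(
        ["The answer is 42.", "I think it's 7"],
        answer=["42", "8"],
    )
    print(scores)  # [1.0, 0.0]
```

A binary, automatically checkable score like this is what makes a reward "verifiable": it needs no learned reward model, only a reference answer to compare against.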

Install the package first:

    pip install -e ".[dev]"

All examples set `report_to="none"` and use small dataset subsets, so they run quickly without a Weights & Biases (W&B) account or large downloads.