Minimal scripts demonstrating each module in the `alignrl` package.
| Script | Description |
|---|---|
| `quickstart_sft.py` | Fine-tune Qwen2.5-3B on instruction data with QLoRA |
| `quickstart_grpo.py` | Train a math reasoning model with GRPO and verifiable rewards |
| `quickstart_dpo.py` | Align a model with human preferences using DPO |
| `evaluate_stages.py` | Evaluate and compare model performance across training stages |
| `serve_model.py` | Serve a trained model for interactive inference |
| `custom_rewards.py` | Define custom reward functions for GRPO training |
| `launch_demo.py` | Launch the Gradio comparison demo |
Install the package first:

`pip install -e ".[dev]"`

All examples use `report_to="none"` and small dataset subsets so they run quickly without a W&B account or large downloads.
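As a taste of what `custom_rewards.py` covers, here is a minimal sketch of a verifiable math reward for GRPO. It follows the reward-function convention used by TRL's `GRPOTrainer` (a batch of completions plus dataset columns in, one float per completion out); the function name and scoring rule are illustrative, not part of `alignrl`'s API:

```python
import re

# Hypothetical reward function in the shape GRPOTrainer expects:
# it receives the batch of completions (plus any dataset columns as
# keyword arguments) and returns one scalar reward per completion.
def exact_answer_reward(completions, answer, **kwargs):
    """Return 1.0 when the last number in a completion matches the
    reference answer, else 0.0 -- a simple verifiable math reward."""
    rewards = []
    for completion, ref in zip(completions, answer):
        numbers = re.findall(r"-?\d+\.?\d*", completion)
        rewards.append(1.0 if numbers and numbers[-1] == str(ref) else 0.0)
    return rewards

print(exact_answer_reward(["The result is 42", "I think 7"], [42, 42]))
# → [1.0, 0.0]
```

Because the reward checks the answer programmatically rather than with a learned model, it cannot be gamed by fluent but wrong completions, which is the core idea behind GRPO with verifiable rewards.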