- Our OPRM paper (Learning Ordinal Probabilistic Reward from Preferences) has been accepted to ICLR'26!
- OPRM is coming! We have released the paper, code, and models!
Coming soon!
If you find this work helpful, please cite us.
```bibtex
@article{chen2026learning,
  title={Learning Ordinal Probabilistic Reward from Preferences},
  author={Chen, Longze and Wang, Lu and Shan, Renke and Gong, Ze and Luo, Run and Li, Jiaming and Luo, Jing and Wang, Qiyao and Yang, Min},
  journal={arXiv preprint arXiv:2602.12660},
  year={2026},
  url={https://arxiv.org/abs/2602.12660}
}
```

Our implementation is based on a recent version of LlamaFactory.