Pinned
-
antgroup/AEnt (Public): An implementation of the regularization method "AEnt" introduced in "On Entropy Control in LLM-RL Algorithms".
Python 2
-
inclusionAI/AReaL (Public): Lightning-Fast RL for LLM Reasoning and Agents. Made Simple & Flexible.
-
heshandevaka/XRIGHT (Public): The official PyTorch implementation of the ALRIGHT and MAXRIGHT algorithms for efficient trade-offs in LLM post-training.
-
penalized-bilevel-gradient-descent (Public): An implementation of the penalty-based bilevel gradient descent (PBGD) algorithm and the iterative differentiation (ITD/RHG) methods.
