hanshen95

Pinned

  1. antgroup/AEnt Public

    An implementation of the regularization method "AEnt" introduced in "On Entropy Control in LLM-RL Algorithms".

    Python, 2 stars

  2. inclusionAI/AReaL Public

    Lightning-Fast RL for LLM Reasoning and Agents. Made Simple & Flexible.

    Python, 4.8k stars, 416 forks

  3. SEAL Public

    An implementation of SEAL: Safety-Enhanced Aligned LLM fine-tuning via bilevel data selection.

    Python, 24 stars, 4 forks

  4. heshandevaka/XRIGHT Public

    The official PyTorch implementation of the ALRIGHT and MAXRIGHT algorithms for efficient trade-offs in LLM post-training.

    Python, 13 stars, 1 fork

  5. penalized-bilevel-gradient-descent Public

    An implementation of the penalty-based bilevel gradient descent (PBGD) algorithm and the iterative differentiation (ITD/RHG) methods.

    Python, 19 stars, 4 forks

  6. AiPOD Public

    PyTorch implementation of AiPOD and E-AiPOD.

    Python, 5 stars, 2 forks