Shen, Han (hanshen95)

Research Engineer@Ant Group, Ph.D. from RPI.

antgroup/AEnt

An implementation of the regularization method "AEnt", introduced in "On Entropy Control in LLM-RL Algorithms".

Python · 2 stars

inclusionAI/AReaL

Lightning-Fast RL for LLM Reasoning and Agents. Made Simple & Flexible.

Python · 4.8k stars · 416 forks

SEAL

An implementation of SEAL: Safety-Enhanced Aligned LLM fine-tuning via bilevel data selection.

Python · 24 stars · 4 forks

heshandevaka/XRIGHT

The official PyTorch implementation of the ALRIGHT and MAXRIGHT algorithms for efficient trade-offs in LLM post-training.

Python · 13 stars · 1 fork

penalized-bilevel-gradient-descent

An implementation of the penalty-based bilevel gradient descent (PBGD) algorithm and the iterative differentiation (ITD/RHG) methods.

Python · 19 stars · 4 forks

AiPOD

PyTorch implementation of AiPOD and E-AiPOD.

Python · 5 stars · 2 forks