
Organizations

@Efficient-Large-Model


Pinned

  1. NVlabs/Long-RL (Public)

     Long-RL: Scaling RL to Long Sequences (NeurIPS 2025)

     Python · 711 stars · 27 forks

  2. NVlabs/QeRL (Public)

     [ICLR 2026] QeRL enables RL for 32B LLMs on a single H100 GPU.

     Python · 500 stars · 51 forks

  3. BiLLM (Public)

     [ICML 2024] BiLLM: Pushing the Limit of Post-Training Quantization for LLMs

     Python · 229 stars · 19 forks

  4. Mixture-Compressor-MoE (Public)

     [ICLR 2025, IEEE TPAMI 2026] Mixture Compressor & MC#

     Python · 72 stars · 6 forks

  5. SliM-LLM (Public)

     [ICML 2025] SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models

     Python · 59 stars · 5 forks

  6. WeianMao/triattention (Public)

     TriAttention: efficient long reasoning with trigonometric KV cache compression. Enables OpenClaw local deployment on memory-constrained GPUs.

     Python · 624 stars · 51 forks