hugebenevolence/LLaMA-OSS

Curating Multi-Mode CoT for Efficient Math Reasoning with GPT-OSS


This repository contains the official implementation of LLaMA-OSS, our knowledge distillation framework that curates multi-mode chain-of-thought (CoT) reasoning from GPT-OSS for efficient mathematical question answering. Our approach addresses the challenge of noisy and overly verbose supervision in dataset-based distillation with a two-step curation pipeline that emphasizes quality over quantity.

This framework is built with a focus on modularity, performance, and ease of use, making it suitable for both research and practical applications.

Features

  • Multi-Mode CoT Generation: Leverages GPT-OSS's low/medium/high inference modes for controllable reasoning generation
  • Two-Step Curation Pipeline:
    • Final-answer verification to filter incorrect reasoning traces
    • Length distribution-based filtering with median-length selection to eliminate verbosity
  • SFT + GRPO Training: Complete pipeline from supervised fine-tuning to policy optimization
  • Comprehensive Evaluation: Automated evaluation on GSM8K and MATH500 benchmarks
  • LLaMA-Factory Integration: Built on LLaMA-Factory for efficient training workflows
  • MS-SWIFT Support: Compatible with ModelScope-SWIFT framework
  • Modular Design: Easy to extend for other reasoning tasks or teacher models
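The two-step curation above can be sketched in a few lines. This is an illustrative sketch only, assuming a simple trace format (`cot` text plus `final_answer`); the function name and record fields are hypothetical, not the repository's API:

```python
def curate_traces(traces, gold_answer):
    """Keep correct traces, then select the median-length one."""
    # Step 1: final-answer verification filters out incorrect reasoning
    verified = [t for t in traces
                if t["final_answer"].strip() == gold_answer.strip()]
    if not verified:
        return None
    # Step 2: median-length selection avoids overly verbose CoT
    verified.sort(key=lambda t: len(t["cot"]))
    return verified[len(verified) // 2]

traces = [
    {"cot": "15+27=42.", "final_answer": "42"},
    {"cot": "Add 15 and 27: 15 + 27 = 42.", "final_answer": "42"},
    {"cot": "First, note that 15 = 10 + 5 and 27 = 20 + 7. "
            "Then 10 + 20 = 30 and 5 + 7 = 12, so the total is 42.",
     "final_answer": "42"},
    {"cot": "15 + 27 = 43", "final_answer": "43"},  # wrong, filtered out
]
best = curate_traces(traces, "42")
```

With three verified traces of increasing length, the medium-length one is kept.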

Experimental Results

All experiments use Llama 3.2 3B as the student model, distilled from GPT-OSS teacher models.

GSM8K Results

| Model | Training | Dataset | GSM8K 0-shot | GSM8K 5-shot |
|---|---|---|---|---|
| Llama3.2 | - | 𝒟orig | 0.7043 | 0.7043 |
| Llama3.2 | - | 𝒟* | 0.7043 | 0.7104 |
| Llama3.2 | SFT | 𝒟* | 0.6876 | 0.5762 |
| Llama3.2 | SFT | 𝒟*low | 0.7111 | 0.7142 |
| Llama3.2 | SFT | 𝒟*med | 0.7051 | 0.7074 |
| Llama3.2 | SFT | 𝒟*high | 0.7013 | 0.7051 |
| Llama3.2-𝒟orig | GRPO | 𝒟 | 0.7771 | 0.6603 |
| Llama3.2-𝒟* | GRPO | 𝒟 | 0.7847 | 0.6156 |
| Llama3.2-𝒟*low | GRPO | 𝒟 | 0.6308 | 0.5861 |
| Llama3.2-𝒟*low | GRPO | 𝒟 | 0.8006 | 0.7195 |
| Llama3.2-𝒟*med | GRPO | 𝒟 | 0.7771 | 0.6323 |
| Llama3.2-𝒟*high | GRPO | 𝒟 | 0.7559 | 0.7225 |

MATH500 Results

| Model | Training | Dataset | MATH500 0-shot | MATH500 4-shot |
|---|---|---|---|---|
| Llama3.2 | - | 𝒟orig | 0.3960 | 0.4340 |
| Llama3.2 | - | 𝒟* | 0.4060 | 0.4240 |
| Llama3.2 | SFT | 𝒟* | 0.3400 | 0.2420 |
| Llama3.2 | SFT | 𝒟*low | 0.4100 | 0.4400 |
| Llama3.2 | SFT | 𝒟*med | 0.4000 | 0.4160 |
| Llama3.2 | SFT | 𝒟*high | 0.4140 | 0.3920 |
| Llama3.2-𝒟orig | GRPO | 𝒟 | 0.4540 | 0.4380 |
| Llama3.2-𝒟* | GRPO | 𝒟 | 0.4520 | 0.4560 |
| Llama3.2-𝒟*low | GRPO | 𝒟 | 0.4400 | 0.4220 |
| Llama3.2-𝒟*low | GRPO | 𝒟 | 0.4760 | 0.4520 |
| Llama3.2-𝒟*med | GRPO | 𝒟 | 0.4480 | 0.4600 |
| Llama3.2-𝒟*high | GRPO | 𝒟 | 0.4740 | 0.4600 |

Getting Started

To get started with the framework, please follow the detailed setup and usage guides.

Setup and Installation

Our comprehensive setup guide provides detailed instructions for environment preparation, dependency installation, and model/dataset acquisition. It covers system requirements, virtual environment setup, and verification steps to ensure a smooth start.

➡️ View Full Setup Guide

Usage

The usage guide explains how to run inference, perform batch processing, and evaluate models on benchmark datasets. It includes command-line examples for various scenarios.

➡️ View Full Usage Guide

Framework Guides

For advanced users and researchers, we provide in-depth guides on configuring the framework and running evaluation protocols.

Configuration

The configuration system is designed for flexibility. You can easily modify data paths, model parameters, and processing settings. This guide details the structure of the configuration files and how to customize them.

➡️ View Full Configuration Guide
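As a rough illustration of the kind of settings the configuration covers, here is a minimal sketch using a Python dataclass. The field names and defaults below are assumptions based on the settings mentioned above, not the repository's actual schema:

```python
from dataclasses import dataclass

@dataclass
class CurationConfig:
    # Data paths (hypothetical defaults)
    data_path: str = "data/gsm8k_train.jsonl"
    output_dir: str = "outputs"
    # GPT-OSS reasoning modes to generate CoT in
    modes: tuple = ("low", "medium", "high")
    # Length-distribution filtering band and median selection
    length_percentiles: tuple = (25, 75)
    select_median: bool = True

# Override only what differs from the defaults
cfg = CurationConfig(data_path="data/math500_train.jsonl")
```

A dataclass keeps defaults in one place while letting individual runs override only the fields they need.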

Evaluation

This guide provides instructions on how to run the evaluation scripts, interpret the results, and perform comparative analysis between different models and configurations.

➡️ View Full Evaluation Guide
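To give a sense of the scoring involved, the sketch below shows exact-match accuracy of the kind typically used on GSM8K-style data, where the final answer is the last number in the output. The answer-extraction regex and record format are assumptions for illustration, not the repository's evaluation code:

```python
import re

def extract_answer(text):
    """Take the last number in the text as the final answer."""
    nums = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return nums[-1] if nums else None

def exact_match_accuracy(predictions, references):
    correct = sum(
        extract_answer(p) == extract_answer(r)
        for p, r in zip(predictions, references)
    )
    return correct / len(references)

preds = ["She sells the rest for $2 each, so she makes 9 * 2 = 18 dollars.",
         "The answer is 41."]
refs = ["#### 18", "#### 42"]
acc = exact_match_accuracy(preds, refs)  # one of two correct
```

Comparative analysis between models then reduces to running the same scorer over each model's prediction file.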

Programmatic Usage

The modular design of the framework allows for easy integration into your own Python projects. You can import and use the components directly for custom workflows.

from src.curation import CurationPipeline, AnswerVerifier, LengthFilter
from src.generator import GPTOSSGenerator

# 1. Setup the GPT-OSS generator for multi-mode CoT
generator = GPTOSSGenerator(
    model_name='openai/gpt-oss-20b',  # a GPT-OSS teacher model
    modes=['low', 'medium', 'high']
)

# 2. Generate CoT traces from your math dataset
math_problems = [
    {"question": "What is 15 + 27?", "answer": "42"},
    {"question": "Janet's ducks lay 16 eggs per day...", "answer": "18"}
]

cot_data = generator.generate_multi_mode_cot(
    problems=math_problems,
    output_path='raw_cot_data.jsonl'
)

# 3. Apply the two-step curation pipeline
curation = CurationPipeline(
    answer_verifier=AnswerVerifier(),
    length_filter=LengthFilter(percentile_range=(25, 75))
)

# Step 1: Answer verification
verified_data = curation.verify_answers(cot_data)

# Step 2: Length-based filtering with median selection
curated_data = curation.filter_by_length(verified_data, select_median=True)

# 4. Save mode-specific curated datasets
curation.save_by_mode(
    curated_data,
    output_dir='outputs',
    filenames={
        'low': 'cot_low.jsonl',
        'medium': 'cot_med.jsonl',
        'high': 'cot_high.jsonl'
    }
)

print(f"Curated {len(curated_data)} high-quality reasoning traces")
print("Files saved: cot_low.jsonl, cot_med.jsonl, cot_high.jsonl")

Citation

If you use this framework or find our work helpful, please consider citing:

@misc{llama-oss-2025,
  author       = {Hai-Au Trinh and Tue-Anh Vu and Dai-Nhan Tran and Uyen Khoi-Minh Huynh and Anh-Khoi Nguyen},
  title        = {Curating Multi-Mode CoT for Efficient Math Reasoning with GPT-OSS},
  year         = {2025},
  howpublished = {\url{https://github.com/Koii2k3/LLaMA-OSS}},
}

Acknowledgements

This project is built upon the excellent work of several open-source projects and research contributions. We would like to extend our gratitude to:

  • The teams behind LLaMA-Factory and MS-SWIFT for their efficient training frameworks.
  • The Meta Llama team for the foundation models.
  • The Hugging Face team for the transformers and accelerate libraries.

Special thanks to the research community for advancing efficient LLM training techniques.


Note: This is an active research project. Contributions, issues, and feature requests are welcome! Please check our contributing guidelines before submitting PRs.

About

[ICISN-26] Curating Multi-Mode CoT for Efficient Math Reasoning with GPT-OSS
