
Fairy2i-W2

🔗 Links

Hugging Face · Paper · ModelScope


πŸ“ Abstract

Large language models (LLMs) have revolutionized artificial intelligence, yet their massive memory and computational demands necessitate aggressive quantization, increasingly pushing representations toward the theoretical limit of a single bit. While complex-valued LLMs, such as iFairy, offer better low-bit representations than their real-valued counterparts, they require training from scratch, preventing the utilization of the vast ecosystem of pre-trained real-valued foundation models.

Here we present Fairy2i, a universal framework that transforms pre-trained real-valued layers into an equivalent widely-linear complex form, enabling extremely low-bit quantization while reusing existing checkpoints. By proving a lossless mathematical equivalence between real and widely-linear maps, we convert standard Transformers into the complex domain and employ a phase-aware quantization scheme with a highly efficient codebook of fourth roots of unity $\{\pm 1, \pm i\}$. Furthermore, we introduce a recursive residual quantization mechanism that iteratively minimizes quantization error, allowing inference to proceed via efficient multiplication-free accumulation.

We demonstrate that Fairy2i-W2 restores the performance of LLaMA-2 7B at an effective 2-bit precision to levels nearly comparable with full-precision baselines, significantly outperforming state-of-the-art real-valued binary and ternary quantization methods.

This work bridges the gap between the representational efficiency of complex-valued arithmetic and the practical utility of pre-trained models, paving a new way for efficient inference on commodity hardware.

🔬 Method

Fairy2i-W2 consists of three key components:

🔄 Widely-Linear Transformation

We transform pre-trained real-valued linear layers into an equivalent widely-linear complex form without altering the model's behavior. Each real linear layer $R$ (a real matrix of size $2n \times 2m$) is reparameterized into two complex matrices $U$ and $W$ (each of size $n \times m$) such that:

$$y = Ux + W\bar{x}$$

where $\bar{x}$ denotes the complex conjugate of $x$. This transformation is lossless and unique, preserving the original forward computation before quantization.
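The splitting can be checked numerically. The sketch below assumes the real layer acts on stacked inputs $[a; b]$ with complex view $x = a + ib$ and block structure $R = \begin{bmatrix} A & B \\ C & D \end{bmatrix}$; this is one common convention, and the paper's exact layout may differ. Under it, the unique pair is $U = \tfrac{A+D}{2} + i\tfrac{C-B}{2}$, $W = \tfrac{A-D}{2} + i\tfrac{C+B}{2}$:

```python
import numpy as np

def real_to_widely_linear(R, n, m):
    """Split a real (2n x 2m) matrix R = [[A, B], [C, D]] into the unique
    complex pair (U, W) such that y = U x + W conj(x)."""
    A, B = R[:n, :m], R[:n, m:]
    C, D = R[n:, :m], R[n:, m:]
    U = (A + D) / 2 + 1j * (C - B) / 2
    W = (A - D) / 2 + 1j * (C + B) / 2
    return U, W

# Verify losslessness on random data
rng = np.random.default_rng(0)
n, m = 4, 3
R = rng.standard_normal((2 * n, 2 * m))
xr = rng.standard_normal(2 * m)      # stacked real input [a; b]
x = xr[:m] + 1j * xr[m:]             # complex view a + ib
U, W = real_to_widely_linear(R, n, m)
y = U @ x + W @ np.conj(x)           # widely-linear forward
assert np.allclose(np.concatenate([y.real, y.imag]), R @ xr)
```

The assertion confirms that the widely-linear form reproduces the original real forward computation exactly, before any quantization is applied.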

⚡ Phase-Aware Complex Quantization

We quantize complex weights using a phase-based scheme with the codebook $\{\pm 1, \pm i\}$ (fourth roots of unity). For each complex weight, we project it to the nearest codeword by angle and apply axis-wise scaling factors. During QAT training, we maintain full-precision master weights and use quantized copies in the forward pass with straight-through estimator (STE) gradients.
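A minimal sketch of the phase projection, using a single scalar scale as a stand-in for the paper's axis-wise scales and omitting the QAT/STE machinery:

```python
import numpy as np

def phase_quantize(W):
    """Project each complex weight onto the nearest fourth root of unity
    {+1, +i, -1, -i} by phase; the scale here is a simple scalar stand-in."""
    k = np.round(np.angle(W) / (np.pi / 2)) % 4      # nearest quadrant axis, 0..3
    codewords = np.exp(1j * k * (np.pi / 2))         # one 2-bit code per weight
    scale = np.mean(np.abs(W))
    return scale * codewords, k.astype(int)

W = np.array([0.9 + 0.1j, -0.2 + 0.8j, -0.7 - 0.1j, 0.1 - 0.5j])
Wq, codes = phase_quantize(W)
# codes index the codebook [1, i, -1, -i], so each weight stores only 2 bits
assert list(codes) == [0, 1, 2, 3]
```

Because the codewords are $\pm 1$ and $\pm i$, multiplying by them reduces to sign flips and real/imaginary swaps, which is what enables multiplication-free accumulation at inference time.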

πŸ” Recursive Residual Quantization

To further reduce quantization error, we recursively quantize the residual error. Each complex weight is represented as a sum of low-bit terms:

$$W \approx \sum_{t=0}^{T-1} W_q^{(t)}$$

where each term $W_q^{(t)}$ is obtained with the same phase-aware mechanism, applied to the residual left by the previous terms. For Fairy2i-W2 we use $T=2$ recursive stages, achieving an effective 2 bits per real parameter.
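To illustrate why the residual stages help, the following sketch (same scalar-scale simplification as above; a hypothetical illustration, not the repo's implementation) shows the relative approximation error shrinking as $T$ grows:

```python
import numpy as np

def phase_quantize(W):
    """One phase-aware step: nearest fourth root of unity, scalar scale."""
    k = np.round(np.angle(W) / (np.pi / 2)) % 4
    return np.mean(np.abs(W)) * np.exp(1j * k * (np.pi / 2))

def residual_quantize(W, T=2):
    """W ~ sum of T phase-quantized terms; each stage quantizes the
    residual error left by the previous stages."""
    terms, residual = [], W.astype(complex)
    for _ in range(T):
        q = phase_quantize(residual)
        terms.append(q)
        residual = residual - q
    return terms

rng = np.random.default_rng(0)
W = rng.standard_normal(256) + 1j * rng.standard_normal(256)
errs = [np.linalg.norm(W - sum(residual_quantize(W, T))) / np.linalg.norm(W)
        for T in (1, 2, 3)]
# each extra stage strictly reduces the approximation error
assert errs[1] < errs[0] < 1.0
```

Since every codeword lies within 45° of the residual it approximates, each stage provably shrinks the residual norm, so stacking stages trades a small bit-budget increase for a steadily better approximation.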

Evaluation

📈 Main Results on LLaMA-2 7B

| Method | Bits | C4 PPL↓ | ARC-e | ARC-c | HellaSwag | PIQA | Winogrande | Avg. |
|---|---|---|---|---|---|---|---|---|
| LLaMA-2 (FP16) | 16 | 6.63 | 75.59 | 43.17 | 57.06 | 77.91 | 69.85 | 64.72 |
| Fairy2i-W2 | 2 | 7.85 | 72.73 | 39.76 | 53.33 | 76.17 | 68.03 | 62.00 |
| AQLM | 2 | 8.54 | 63.68 | 32.76 | 49.55 | 74.76 | 65.67 | 57.28 |
| QuIP# | 2 | 11.01 | 55.56 | 28.84 | 42.94 | 71.38 | 62.43 | 52.23 |
| Real-Ternary (QAT) | 1.58 | 11.06 | 55.93 | 24.15 | 38.43 | 69.80 | 55.17 | 48.70 |
| Fairy2i-W1 | 1 | 11.03 | 56.56 | 24.82 | 38.19 | 70.08 | 53.67 | 48.66 |
| Real-Binary (QAT) | 1 | 11.75 | 53.32 | 22.70 | 35.57 | 66.81 | 52.64 | 46.21 |
| GPTQ | 3 | 10.61 | 58.46 | 31.06 | 45.21 | 71.49 | 59.19 | 53.08 |

Key Results:

  • Fairy2i-W2 (2-bit) achieves a perplexity of 7.85, closing the gap to FP16 (6.63) while outperforming all 2-bit PTQ methods
  • Fairy2i-W2 achieves 62.00% average accuracy on zero-shot tasks, highly competitive with FP16 (64.72%)
  • Fairy2i-W1 (1-bit) outperforms real-valued binary and ternary baselines at the same or lower bit budgets

🚀 Quick Start

Fairy2i-W2 is based on the LLaMA-2 7B architecture, with only the linear layers replaced by complex-valued QAT layers. The model structure is otherwise identical to LLaMA-2.

📦 Installation

pip install torch transformers safetensors huggingface_hub accelerate datasets lm-eval

🔄 Loading the Model

The model can be loaded using the model_module package. Here's a basic example:

from transformers import AutoModelForCausalLM, AutoTokenizer
from model_module.qat_modules import replace_modules_for_qat, convert_to_inference_mode
import torch

# Load base model
model_path = "meta-llama/Llama-2-7b-hf"  # or your local path
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    attn_implementation="flash_attention_2",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Replace linear layers with QAT modules
replace_modules_for_qat(model, "complex_phase_v2", skip_lm_head=False)

# Convert to inference mode for faster inference
convert_to_inference_mode(model)

# The model is ready to use!
prompt = "Hello, how are you?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=50,
        do_sample=True,
        temperature=0.7
    )

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)

📊 Data Processing

The training data is processed from RedPajama-Data-1T using two sequential steps:

Step 1: Sample 100B tokens from RedPajama-Data-1T

Use dataset/sample.py to sample 100B tokens from the RedPajama-Data-1T dataset:

cd dataset
python sample.py

This script:

  • Loads the RedPajama-Data-1T dataset from Hugging Face
  • Samples approximately 100B tokens using 10 parallel processes
  • Saves the sampled data to new_dataset_100B_redpajama_final_dataset{0-9} directories

Step 2: Process into 2048-token aligned blocks

Use dataset/padding_and_cut.py to chunk the sampled data into 2048-token aligned blocks:

cd dataset
python padding_and_cut.py

This script:

  • Loads the sampled datasets from Step 1
  • Processes data into 2048-token aligned blocks using group_and_chunk function
  • Saves the processed data to dataset_100B_redpajama_2048_aligned/ directory

Note: Make sure to update the input paths in padding_and_cut.py to point to your sampled dataset directories.
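The chunking step can be pictured as follows. This is a hypothetical simplification of group_and_chunk; the repo's version in dataset/padding_and_cut.py also handles padding and EOS placement:

```python
def group_and_chunk(tokenized_seqs, block_size=2048):
    """Concatenate tokenized sequences into one stream and cut it into
    fixed-size blocks, dropping the trailing remainder so every block
    is exactly block_size tokens."""
    flat = [tok for seq in tokenized_seqs for tok in seq]
    n_blocks = len(flat) // block_size
    return [flat[i * block_size:(i + 1) * block_size] for i in range(n_blocks)]

# Toy example with block_size=3: token 7 is dropped as the remainder
blocks = group_and_chunk([[1, 2, 3], [4, 5, 6, 7]], block_size=3)
assert blocks == [[1, 2, 3], [4, 5, 6]]
```

Fixed-size, document-boundary-agnostic blocks are what let training run without dynamic padding, which is why the collator below can assume every example is already aligned.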

Custom DataCollator

The training uses a custom MyDataCollatorForLanguageModeling class defined in train/mydatacollator.py. This collator is specifically designed to work with the 2048-token aligned data blocks.

To use the custom DataCollator:

Copy train/mydatacollator.py into the transformers.data.data_collator module of your installed transformers package (this works across transformers versions). The custom collator handles:

  • Proper label masking for aligned 2048-token blocks
  • EOS token position handling for causal language modeling
  • Compatibility with the pre-processed aligned dataset format

Once copied, the training script imports the collator via:

from transformers.data.data_collator import MyDataCollatorForLanguageModeling
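For orientation, a minimal collator for pre-aligned blocks might look like the sketch below. This is a hypothetical stand-in, not MyDataCollatorForLanguageModeling itself, which adds the EOS-position and masking logic listed above:

```python
import torch

class AlignedBlockCollator:
    """Batch pre-aligned fixed-length token blocks for causal LM training.
    Labels are a copy of the inputs; the model shifts them internally."""
    def __call__(self, examples):
        input_ids = torch.tensor([ex["input_ids"] for ex in examples])
        return {"input_ids": input_ids, "labels": input_ids.clone()}

batch = AlignedBlockCollator()([{"input_ids": [1, 2, 3]},
                                {"input_ids": [4, 5, 6]}])
# batch["input_ids"] has shape (2, 3); labels mirror the inputs
```

Because every block is exactly 2048 tokens, no per-batch padding or attention-mask bookkeeping is needed, which keeps the collator trivial.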

πŸ‹οΈ Training

To train a model with QAT, use the training script:

cd train
bash train.sh

Note: For Fairy2i-W2, the training uses fixed parameters:

  • --quant_method complex_phase_v2 (one residual step on top of the base quantization, i.e. $T=2$ stages)
  • --skip_lm_head False (lm_head will be replaced)

The training script supports the following arguments:

  • --quant_method: QAT quantization method (choices: bitnet, complex_phase_v1, complex_phase_v2, complex_phase_v3, complex_phase_v4)
  • --skip_lm_head: Whether to skip replacement of lm_head layer (default: False)

✅ Evaluation

📉 Perplexity Evaluation

Evaluate perplexity on Wikitext-2 and C4 datasets:

cd eval
bash eval_ppl.sh

🎯 Task Evaluation

Evaluate on downstream tasks using lm-eval:

cd eval
bash eval_task.sh

ℹ️ Model Details

  • Base Model: LLaMA-2 7B
  • Quantization Method: Complex-Phase V2 (2-step recursive residual quantization)
  • Effective Bit Width: 2 bits per real parameter
  • Codebook: $\{\pm 1, \pm i\}$ (fourth roots of unity)
  • Training: QAT (Quantization-Aware Training) on 30B tokens (30% of the 100B-token sample) from the RedPajama dataset

πŸ“ Repository Structure

fairy2i-w2-repo-github/
├── README.md
├── model_module/
│   ├── __init__.py
│   ├── qat_modules.py          # QAT linear layer implementations
│   └── quantization.py         # Quantization functions (PhaseQuant, BitNet, etc.)
├── dataset/
│   ├── sample.py               # Sample 100B tokens from RedPajama-Data-1T
│   └── padding_and_cut.py      # Process data into 2048-token aligned blocks
├── train/
│   ├── train.py                # Training script
│   ├── train.sh                # Training launch script
│   ├── mydatacollator.py       # Custom DataCollator for aligned data
│   └── complexnet_config.yaml  # Accelerate configuration
└── eval/
    ├── eval_ppl.py             # Perplexity evaluation script
    ├── eval_ppl.sh             # Perplexity evaluation launcher
    ├── eval_task.py            # Task evaluation script
    ├── eval_task.sh            # Task evaluation launcher
    └── eval_utils.py           # Evaluation utilities

📚 Citation

If you use Fairy2i-W2 in your research, please cite:

@article{wang2025fairy2i,
  title={Fairy2i: Training Complex LLMs from Real LLMs with All Parameters in $\{\pm 1, \pm i\}$},
  author={Wang, Feiyu and Tan, Xinyu and Huang, Bokai and Zhang, Yihao and Wang, Guoan and Cong, Peizhuang and Yang, Tong},
  journal={arXiv preprint},
  year={2025}
}

βš–οΈ License

This model follows the same license as LLaMA-2. Please refer to the original LLaMA-2 license for details.

📧 Contact

For questions or issues, please contact: [email protected]
