- 📝 Abstract
- 🔬 Method
- 📊 Evaluation
- 🚀 Quick Start
- 📁 Repository Structure
- 📖 Citation
- ⚖️ License
- 📧 Contact
Large language models (LLMs) have revolutionized artificial intelligence, yet their massive memory and computational demands necessitate aggressive quantization, increasingly pushing representations toward the theoretical limit of a single bit. While complex-valued LLMs, such as iFairy, are better suited to low-bit representation than their real-valued counterparts, they require training from scratch, preventing the utilization of the vast ecosystem of pre-trained real-valued foundation models.
Here we present Fairy2i, a universal framework that transforms pre-trained real-valued layers into an equivalent widely-linear complex form, enabling extremely low-bit quantization while reusing existing checkpoints. By proving a lossless mathematical equivalence between real and widely-linear maps, we convert standard Transformers into the complex domain and employ a phase-aware quantization scheme with a highly efficient codebook of the fourth roots of unity.
We demonstrate that Fairy2i-W2 restores the performance of LLaMA-2 7B at an effective 2-bit precision to levels nearly comparable with full-precision baselines, significantly outperforming state-of-the-art real-valued binary and ternary quantization methods.
This work bridges the gap between the representational efficiency of complex-valued arithmetic and the practical utility of pre-trained models, paving a new way for efficient inference on commodity hardware.
Fairy2i-W2 consists of three key components:
We transform pre-trained real-valued linear layers into an equivalent widely-linear complex form without altering the model's behavior. Each real linear layer $y = Wx$, with $W \in \mathbb{R}^{2m \times 2n}$ acting on the stacked real and imaginary parts $x = [x_r;\, x_i]$, is rewritten as the widely-linear complex map

$$y = W_1 z + W_2 \bar{z},$$

where $z = x_r + i\,x_i$ and, writing $W$ in blocks $\begin{pmatrix} A & B \\ C & D \end{pmatrix}$,

$$W_1 = \tfrac{1}{2}\big[(A + D) + i(C - B)\big], \qquad W_2 = \tfrac{1}{2}\big[(A - D) + i(C + B)\big].$$

The conversion is exactly lossless: the complex map reproduces the real layer's output for every input.
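This real-to-complex conversion can be checked numerically. The NumPy sketch below (illustrative only, not the repository's implementation) builds widely-linear parameters from a random real layer and verifies that the two outputs coincide:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 4

# Real weight acting on stacked real/imaginary parts: [u; v] = W [x; y]
W = rng.standard_normal((2 * m, 2 * n))
A, B = W[:m, :n], W[:m, n:]
C, D = W[m:, :n], W[m:, n:]

# Widely-linear complex parameters (lossless by construction)
W1 = 0.5 * ((A + D) + 1j * (C - B))
W2 = 0.5 * ((A - D) + 1j * (C + B))

# Check equivalence on a random input
x, y = rng.standard_normal(n), rng.standard_normal(n)
z = x + 1j * y
out_real = W @ np.concatenate([x, y])  # real-valued layer
out_cplx = W1 @ z + W2 @ np.conj(z)    # widely-linear complex layer
assert np.allclose(out_real, np.concatenate([out_cplx.real, out_cplx.imag]))
```

Because the correspondence is a change of basis rather than an approximation, any pre-trained checkpoint can be converted this way before quantization.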
We quantize complex weights using a phase-based scheme with the codebook $\{\pm 1, \pm i\}$, the fourth roots of unity: each complex weight is mapped to the codeword nearest to it in phase, together with a real scaling factor.
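A minimal sketch of such phase-based rounding, assuming a single shared scale (the repository's scheme may use finer-grained scaling):

```python
import numpy as np

def phase_quant(w):
    """Map each complex weight to the nearest fourth root of unity,
    with one shared magnitude scale (a simplified, illustrative choice)."""
    codebook = np.array([1, 1j, -1, -1j])
    # Nearest codeword in phase = round the angle to a multiple of pi/2
    idx = np.round(np.angle(w) / (np.pi / 2)).astype(int) % 4
    q = codebook[idx]
    alpha = np.mean(np.abs(w))  # shared scale (illustrative)
    return alpha * q, q, alpha

w = np.array([0.9 + 0.1j, -0.2 + 0.8j, -0.7 - 0.6j])
w_hat, q, alpha = phase_quant(w)  # q is [1, 1j, -1] for these weights
```

Since each complex weight carries two real parameters and the codebook has four entries, one codeword costs 2 bits per complex weight, i.e. 1 bit per real parameter.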
To further reduce quantization error, we recursively quantize the residual error. Each complex weight is represented as a sum of low-bit terms:

$$\hat{w} = \sum_{k=1}^{K} \alpha_k\, q_k, \qquad q_k \in \{\pm 1, \pm i\},$$

where each term is quantized using the same phase-aware mechanism. For Fairy2i-W2 ($K = 2$), this yields an effective precision of 2 bits per real parameter; Fairy2i-W1 uses a single term ($K = 1$).
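The recursion can be sketched greedily: quantize, subtract the reconstruction, then quantize the residual. The per-step scale below (mean magnitude of the residual) is an illustrative choice, not necessarily the paper's:

```python
import numpy as np

CODEBOOK = np.array([1, 1j, -1, -1j])

def phase_round(w):
    # Nearest fourth root of unity by phase
    return CODEBOOK[np.round(np.angle(w) / (np.pi / 2)).astype(int) % 4]

def residual_quant(w, K=2):
    """Greedy K-term residual quantization (sketch): w ~ sum_k alpha_k * q_k."""
    approx = np.zeros_like(w)
    terms = []
    for _ in range(K):
        r = w - approx              # current residual
        q = phase_round(r)
        alpha = np.mean(np.abs(r))  # illustrative per-step scale
        approx = approx + alpha * q
        terms.append((alpha, q))
    return approx, terms

w = np.array([0.9 + 0.1j, -0.2 + 0.8j, -0.7 - 0.6j])
w1_hat, terms1 = residual_quant(w, K=1)
w2_hat, terms2 = residual_quant(w, K=2)
# Adding a residual term never increases the reconstruction error
assert np.linalg.norm(w - w2_hat) <= np.linalg.norm(w - w1_hat)
```

Each extra term costs one more codeword per weight, which is how the bit budget scales linearly with $K$.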
| Method | Bits | C4 PPL↓ | ARC-e | ARC-c | HellaSwag | PIQA | Winogrande | Avg. |
|---|---|---|---|---|---|---|---|---|
| LLaMA-2 (FP16) | 16 | 6.63 | 75.59 | 43.17 | 57.06 | 77.91 | 69.85 | 64.72 |
| GPTQ | 3 | 10.61 | 58.46 | 31.06 | 45.21 | 71.49 | 59.19 | 53.08 |
| Fairy2i-W2 | 2 | 7.85 | 72.73 | 39.76 | 53.33 | 76.17 | 68.03 | 62.00 |
| AQLM | 2 | 8.54 | 63.68 | 32.76 | 49.55 | 74.76 | 65.67 | 57.28 |
| QuIP# | 2 | 11.01 | 55.56 | 28.84 | 42.94 | 71.38 | 62.43 | 52.23 |
| Real-Ternary (QAT) | 1.58 | 11.06 | 55.93 | 24.15 | 38.43 | 69.80 | 55.17 | 48.70 |
| Fairy2i-W1 | 1 | 11.03 | 56.56 | 24.82 | 38.19 | 70.08 | 53.67 | 48.66 |
| Real-Binary (QAT) | 1 | 11.75 | 53.32 | 22.70 | 35.57 | 66.81 | 52.64 | 46.21 |
Key Results:
- Fairy2i-W2 (2-bit) achieves a perplexity of 7.85, closing the gap to FP16 (6.63) while outperforming all 2-bit PTQ methods
- Fairy2i-W2 achieves 62.00% average accuracy on zero-shot tasks, highly competitive with FP16 (64.72%)
- Fairy2i-W1 (1-bit) outperforms real-valued binary and ternary baselines at the same or lower bit budgets
Fairy2i-W2 is based on LLaMA-2 7B architecture, with only the linear layers replaced by complex-valued QAT layers. The model structure is otherwise identical to LLaMA-2.
```bash
pip install torch transformers safetensors huggingface_hub accelerate datasets lm-eval
```

The model can be loaded using the `model_module` package. Here's a basic example:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from model_module.qat_modules import replace_modules_for_qat, convert_to_inference_mode
import torch

# Load base model
model_path = "meta-llama/Llama-2-7b-hf"  # or your local path
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    attn_implementation="flash_attention_2",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Replace linear layers with QAT modules
replace_modules_for_qat(model, "complex_phase_v2", skip_lm_head=False)

# Convert to inference mode for faster inference
convert_to_inference_mode(model)

# The model is ready to use!
prompt = "Hello, how are you?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=50,
        do_sample=True,
        temperature=0.7,
    )
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```

The training data is processed from RedPajama-Data-1T using two sequential steps:
Use `dataset/sample.py` to sample 100B tokens from the RedPajama-Data-1T dataset:

```bash
cd dataset
python sample.py
```

This script:
- Loads the RedPajama-Data-1T dataset from Hugging Face
- Samples approximately 100B tokens using 10 parallel processes
- Saves the sampled data to the `new_dataset_100B_redpajama_final_dataset{0-9}` directories
Use `dataset/padding_and_cut.py` to chunk the sampled data into 2048-token aligned blocks:

```bash
cd dataset
python padding_and_cut.py
```

This script:
- Loads the sampled datasets from Step 1
- Processes data into 2048-token aligned blocks using the `group_and_chunk` function
- Saves the processed data to the `dataset_100B_redpajama_2048_aligned/` directory
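As a rough illustration of what 2048-token alignment means (the actual `group_and_chunk` in `padding_and_cut.py` may differ, e.g. in how it pads and places EOS tokens), chunking reduces to:

```python
def group_and_chunk(token_ids, block_size=2048, pad_id=0):
    """Cut a token stream into fixed-size blocks, padding the tail.
    A simplified sketch of the alignment step, not the repo's exact code."""
    blocks = [token_ids[i:i + block_size]
              for i in range(0, len(token_ids), block_size)]
    if blocks and len(blocks[-1]) < block_size:
        blocks[-1] = blocks[-1] + [pad_id] * (block_size - len(blocks[-1]))
    return blocks

blocks = group_and_chunk(list(range(5000)), block_size=2048)
assert len(blocks) == 3 and all(len(b) == 2048 for b in blocks)
```

Fixed-size blocks let the training loop batch sequences without per-sample padding logic.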
Note: Make sure to update the input paths in `padding_and_cut.py` to point to your sampled dataset directories.
The training uses a custom `MyDataCollatorForLanguageModeling` class defined in `train/mydatacollator.py`. This collator is specifically designed to work with the 2048-token aligned data blocks.

To use the custom DataCollator, copy `train/mydatacollator.py` into the `transformers.data.data_collator` module (this works independently of the transformers version). The custom collator handles:
- Proper label masking for aligned 2048-token blocks
- EOS token position handling for causal language modeling
- Compatibility with the pre-processed aligned dataset format
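As a toy illustration of the label-masking point (a hypothetical helper, not the actual collator): Hugging Face's cross-entropy loss ignores label positions set to `-100`, so padded positions are masked out of the loss:

```python
# Hypothetical helper illustrating label masking; the real
# MyDataCollatorForLanguageModeling operates on batched tensors.
def make_labels(input_ids, pad_id=0):
    """Copy inputs to labels, masking padding to -100 (ignored by the loss)."""
    return [tok if tok != pad_id else -100 for tok in input_ids]

labels = make_labels([5, 6, 7, 0, 0])
assert labels == [5, 6, 7, -100, -100]
```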
The custom collator is automatically imported in the training script via:

```python
from transformers.data.data_collator import MyDataCollatorForLanguageModeling
```

To train a model with QAT, use the training script:
```bash
cd train
bash train.sh
```

Note: For Fairy2i-W2, the training uses fixed parameters:

- `--quant_method complex_phase_v2` (1-step recursive residual quantization)
- `--skip_lm_head False` (lm_head will be replaced)
The training script supports the following arguments:
- `--quant_method`: QAT quantization method (choices: `bitnet`, `complex_phase_v1`, `complex_phase_v2`, `complex_phase_v3`, `complex_phase_v4`)
- `--skip_lm_head`: Whether to skip replacement of the lm_head layer (default: `False`)
Evaluate perplexity on Wikitext-2 and C4 datasets:
```bash
cd eval
bash eval_ppl.sh
```

Evaluate on downstream tasks using lm-eval:
```bash
cd eval
bash eval_task.sh
```

- Base Model: LLaMA-2 7B
- Quantization Method: Complex-Phase V2 (2-step recursive residual quantization)
- Effective Bit Width: 2 bits per real parameter
- Codebook: $\{\pm 1, \pm i\}$ (the fourth roots of unity)
- Training: QAT (Quantization-Aware Training) on 30B tokens (30% of 100B) from the RedPajama dataset
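The 2-bit figure follows from a quick count: each complex weight stores two real parameters, and each of the $K$ residual terms draws from a 4-element codebook, costing $\log_2 4 = 2$ bits (the per-term scales are shared across many weights and amortize to negligible overhead):

$$\text{bits per real parameter} = \frac{K \cdot \log_2 \lvert\{\pm 1, \pm i\}\rvert}{2} = \frac{2K}{2} = K,$$

so $K = 2$ yields the 2-bit Fairy2i-W2 and $K = 1$ the 1-bit Fairy2i-W1.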
```
fairy2i-w2-repo-github/
├── README.md
├── model_module/
│   ├── __init__.py
│   ├── qat_modules.py         # QAT linear layer implementations
│   └── quantization.py        # Quantization functions (PhaseQuant, BitNet, etc.)
├── dataset/
│   ├── sample.py              # Sample 100B tokens from RedPajama-Data-1T
│   └── padding_and_cut.py     # Process data into 2048-token aligned blocks
├── train/
│   ├── train.py               # Training script
│   ├── train.sh               # Training launch script
│   ├── mydatacollator.py      # Custom DataCollator for aligned data
│   └── complexnet_config.yaml # Accelerate configuration
└── eval/
    ├── eval_ppl.py            # Perplexity evaluation script
    ├── eval_ppl.sh            # Perplexity evaluation launcher
    ├── eval_task.py           # Task evaluation script
    ├── eval_task.sh           # Task evaluation launcher
    └── eval_utils.py          # Evaluation utilities
```
If you use Fairy2i-W2 in your research, please cite:
```bibtex
@article{wang2025fairy2i,
  title={Fairy2i: Training Complex LLMs from Real LLMs with All Parameters in $\{\pm 1, \pm i\}$},
  author={Wang, Feiyu and Tan, Xinyu and Huang, Bokai and Zhang, Yihao and Wang, Guoan and Cong, Peizhuang and Yang, Tong},
  journal={arXiv preprint},
  year={2025}
}
```

This model follows the same license as LLaMA-2. Please refer to the original LLaMA-2 license for details.
For questions or issues, please contact: [email protected]