Reference implementation (example) of the model proposed in the paper:
Pivoting Factorization: A Compact Meta Low-Rank Representation of Sparsity for Efficient Inference in Large Language Models (Published at ICML 2025)
Jialin Zhao1,2, Yingtao Zhang1,2, Carlo Vittorio Cannistraci1,2,3
1 Center for Complex Network Intelligence (CCNI), Tsinghua Laboratory of Brain and Intelligence (THBI), Department of Psychological and Cognitive Sciences, 2 Department of Computer Science, 3 Department of Biomedical Engineering, Tsinghua University, China
Correspondence to: Jialin Zhao [email protected], Carlo Vittorio Cannistraci [email protected]
Ensure you have the necessary dependencies installed:

pip install -r requirements.txt

Example run (pruning, reconstruction, and PIFA compression):

python PIFA.py --model ../model/llama-2-7b-hf --mode m --overall_ratio 0.5 \
    --attn_ratio 0.5 --pruning_nsamples 256 --dataset wikitext2 --seed 3 \
    --model_seq_len 2048 --save_path ./results --old_output_factor 0.25 \
    --reconstruct_step 2 --reconstruct_nsamples 128 --use_pifa
Run a saved PIFA-compressed model:

python PIFA.py --mode pifa --model_path <path_to_model>

Evaluate perplexity:

python PIFA.py --mode ppl --model_path <path_to_model>
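For intuition, here is a minimal NumPy sketch of the general pivoting-factorization idea, not the repository's exact algorithm: a rank-r matrix W is represented by r of its own rows ("pivot" rows) plus a small coefficient matrix C, so that W ≈ C @ W[pivots]. The row-selection heuristic and variable names below are illustrative assumptions.

```python
import numpy as np

def select_pivot_rows(W, r):
    """Greedily pick r rows that span W's row space.

    Illustrative heuristic (not the paper's method): repeatedly take the
    row with the largest residual norm, then project that direction out.
    """
    R = W.astype(float).copy()
    pivots = []
    for _ in range(r):
        i = int(np.argmax(np.linalg.norm(R, axis=1)))
        pivots.append(i)
        v = R[i] / np.linalg.norm(R[i])
        R = R - np.outer(R @ v, v)  # remove the chosen direction
    return pivots

rng = np.random.default_rng(0)
m, n, r = 40, 60, 5
W = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))  # exactly rank-r

pivots = select_pivot_rows(W, r)
# Least-squares coefficients so that C @ W[pivots] ≈ W
C = np.linalg.lstsq(W[pivots].T, W.T, rcond=None)[0].T
W_hat = C @ W[pivots]
print(float(np.linalg.norm(W - W_hat)))  # near zero for an exactly rank-r W
```

Storage is m*r coefficients plus r*n stored rows, matching a rank-r factorization's footprint, but the stored rows are taken from W itself, so they retain W's sparsity pattern.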
Please contact [email protected] if you have any questions.
This repository is built upon the SVD-LLM repository.
Please cite our paper if you use the model or this code in your own work:
@inproceedings{zhao2025pivoting,
title={Pivoting Factorization: A Compact Meta Low-Rank Representation of Sparsity for Efficient Inference in Large Language Models},
author={Zhao, Jialin and Zhang, Yingtao and Cannistraci, Carlo Vittorio},
booktitle={Forty-second International Conference on Machine Learning},
year={2025},
url={https://openreview.net/forum?id=5OLRHkzTYk}
}