Official implementation of the paper "LLM-Oriented Token-Adaptive Knowledge Distillation" (AAAI 2026).
AdaKD is a plug-and-play framework for logit-based distillation that dynamically adapts to the student's learning state. It features two synergistic modules:
- Loss-driven Adaptive Token Focusing (LATF): Concentrates distillation on valuable tokens by monitoring learning stability.
- Inverse Difficulty Temperature Scaling (IDTS): Applies token-level temperatures: low for hard tokens to drive error correction, high for easy tokens to improve generalization.
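To make the two modules concrete, here is a minimal, self-contained sketch. The function names, the use of the student's per-token loss as the difficulty signal, and the linear temperature mapping are our illustrative assumptions, not the paper's exact formulation:

```python
def token_temperatures(token_losses, t_min=1.0, t_max=4.0):
    # IDTS-style mapping (illustrative): normalize per-token difficulty
    # (proxied here by the student's loss) into [0, 1], then assign an
    # inversely scaled temperature: hardest token -> t_min, easiest -> t_max.
    lo, hi = min(token_losses), max(token_losses)
    span = (hi - lo) or 1.0
    return [t_max - ((l - lo) / span) * (t_max - t_min) for l in token_losses]

def select_focus_tokens(token_losses, keep_ratio=0.5):
    # LATF-style selection (illustrative): keep the indices of the
    # highest-loss tokens so distillation concentrates on valuable tokens.
    k = max(1, int(len(token_losses) * keep_ratio))
    return sorted(range(len(token_losses)), key=lambda i: -token_losses[i])[:k]
```

In this sketch, distillation would apply the per-token temperatures only on the selected token subset; see the paper for the actual stability-monitoring schedule used by LATF.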
> [!NOTE]
> The requirements.txt and specific environment setup scripts are currently being finalized.
The training data is based on the databricks-dolly-15k dataset. You can download our processed version here:
- Processed Data: Google Drive Link
Please place the downloaded data in the data/ directory.
> [!IMPORTANT]
> Bash scripts for automated training and evaluation are coming soon.
Our code is built upon the following open-source projects:
- distillm: Towards Streamlined Distillation for Large Language Models.
- minillm: Knowledge Distillation of Large Language Models.
We thank the authors for their great work!
If you find our work useful in your research, please consider citing:
```bibtex
@inproceedings{xie2026adakd,
  title={LLM-Oriented Token-Adaptive Knowledge Distillation},
  author={Xie, Xurong and Xue, Zhucun and Wu, Jiafu and Li, Jian and Wang, Yabiao and Hu, Xiaobin and Liu, Yong and Zhang, Jiangning},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence (AAAI)},
  year={2026}
}
```