AKL

This is the official GitHub repository for the paper Rethinking Kullback-Leibler Divergence in Knowledge Distillation for Large Language Models by Taiqiang Wu, Chaofan Tao, Jiahao Wang, Runming Yang, Zhe Zhao, and Ngai Wong.

TL;DR: We provide a deeper analysis of forward KL and reverse KL in knowledge distillation (KD) for LLMs, and then propose a novel Adaptive KL (AKL) divergence based on this analysis.

Blog | Chinese version

Conclusion:

In KD for LLMs, the mean-seeking and mode-seeking behaviors do not hold for forward KL (FKL) and reverse KL (RKL), respectively. Instead, they share the same optimization objective. Meanwhile, in the early epochs, FKL focuses on the head part of the distribution and RKL focuses on the tail part.
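For reference, FKL and RKL here denote the standard token-level divergences between the teacher distribution p and the student distribution q over the vocabulary:

$$
\mathrm{FKL} = \mathrm{KL}(p \,\|\, q) = \sum_{k} p(k)\,\log\frac{p(k)}{q(k)}, \qquad
\mathrm{RKL} = \mathrm{KL}(q \,\|\, p) = \sum_{k} q(k)\,\log\frac{q(k)}{p(k)}.
$$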

Convergence in the early epochs: (figure)

Full training process: (GIF)

Toy Examples

To reproduce the toy examples, please refer to toy_examples/FR_KL.ipynb and toy_examples/FR_compare.ipynb.

KD Experiments

Please follow minillm to set up the environment and datasets.

Introduce AKL into the KD setting (mainly at this line); a minimal sketch follows below.

Then run the experiments and evaluate the student models.
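As a rough sketch of how FKL, RKL, and a weighted combination of the two can be computed from teacher and student logits: the names teacher_logits, student_logits, and the fixed weight alpha below are illustrative assumptions, and the fixed mixture stands in for the adaptive weighting of AKL, which is defined in the paper and the modified minillm code rather than here.

import torch
import torch.nn.functional as F

def forward_kl(teacher_logits, student_logits):
    # Token-level forward KL: KL(p_teacher || q_student), summed over the vocabulary.
    log_p = F.log_softmax(teacher_logits, dim=-1)
    log_q = F.log_softmax(student_logits, dim=-1)
    return (log_p.exp() * (log_p - log_q)).sum(dim=-1)

def reverse_kl(teacher_logits, student_logits):
    # Token-level reverse KL: KL(q_student || p_teacher), summed over the vocabulary.
    log_p = F.log_softmax(teacher_logits, dim=-1)
    log_q = F.log_softmax(student_logits, dim=-1)
    return (log_q.exp() * (log_q - log_p)).sum(dim=-1)

def mixed_kl_loss(teacher_logits, student_logits, alpha=0.5):
    # Illustrative convex combination of FKL and RKL with a fixed weight.
    # AKL instead assigns adaptive weights; see the paper and the linked line in the repo.
    fkl = forward_kl(teacher_logits, student_logits)
    rkl = reverse_kl(teacher_logits, student_logits)
    return (alpha * fkl + (1.0 - alpha) * rkl).mean()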

For results on Winogrande, OpenBookQA, BoolQ, and ARC, please use this tool.

Contact

Taiqiang Wu: [email protected]

Citation

If you find this paper useful, please cite it using the following BibTeX entry.

@article{wu2024rethinking,
  title={Rethinking Kullback-Leibler Divergence in Knowledge Distillation for Large Language Models},
  author={Wu, Taiqiang and Tao, Chaofan and Wang, Jiahao and Yang, Runming and Zhao, Zhe and Wong, Ngai},
  journal={arXiv preprint arXiv:2404.02657},
  year={2024}
}
