Citation: Yidong Chen, Jun Kai, Yonghua Zhang, Jidong Zhai. A Survey of Quantization in LLM: Unlocking Potential Hardware Efficiency[J]. Journal of Computer Science and Technology. DOI: 10.1007/s11390-026-5979-1

A Survey of Quantization in LLM: Unlocking Potential Hardware Efficiency

  • Large Language Models (LLMs) have achieved remarkable progress in natural language processing, but their immense scale leads to significant computational and storage overheads, limiting their deployment in resource-constrained environments. Model quantization, an effective model compression technique, significantly reduces LLMs' memory footprint and computational requirements by lowering the numerical precision of model parameters and/or activations, while striving to keep performance loss minimal. This survey comprehensively reviews the latest advances in LLM quantization, covering techniques from the pre-training phase through the inference phase. We examine state-of-the-art quantization during pre-training, post-training quantization and quantization-aware training for fine-tuning, and the various quantization methods applied during inference. Through in-depth analysis of these methods, this survey seeks to give researchers and engineers a comprehensive understanding of LLM quantization techniques, identify future research directions, and offer insight into how to generate high-performance low-precision kernels on different chips.
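
To make the basic operation concrete, the sketch below illustrates symmetric per-tensor int8 weight quantization, the simplest form of lowering numerical precision as described in the abstract. It is a minimal illustrative example, not code from the survey; the function names and the per-tensor symmetric scheme are assumptions chosen for brevity.

```python
# Minimal sketch of symmetric per-tensor int8 weight quantization.
# Names and the per-tensor scheme are illustrative, not taken from the survey.
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Map float weights to int8 using a single symmetric scale."""
    scale = np.max(np.abs(weights)) / 127.0            # largest magnitude maps to 127
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from the int8 representation."""
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)           # stand-in for a weight matrix
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
print("max abs error:", np.max(np.abs(w - w_hat)))     # error is bounded by roughly scale / 2
```

Storing `q` instead of `w` cuts memory by 4x relative to float32; finer-grained (per-channel or per-group) scales and activation quantization, as surveyed in the paper, trade additional bookkeeping for lower quantization error.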
