Mitigating Quantization Errors Due to Activation Spikes in Gated Linear Unit-Based Large Language Models
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Yang, Jaewoo | - |
dc.contributor.author | Kim, Hayun | - |
dc.contributor.author | Ji, Junyung | - |
dc.contributor.author | Kim, Younghoon | - |
dc.date.accessioned | 2025-05-16T07:30:43Z | - |
dc.date.available | 2025-05-16T07:30:43Z | - |
dc.date.issued | 2025-04 | - |
dc.identifier.issn | 1999-5903 | - |
dc.identifier.uri | https://scholarworks.bwise.kr/erica/handle/2021.sw.erica/125234 | - |
dc.description.abstract | Modern large language models (LLMs) achieve state-of-the-art performance through architectural advancements but require high computational costs for inference. Post-training quantization is a widely adopted approach to reduce these costs by quantizing weights and activations to lower precision, such as INT8. However, we identify a critical challenge in activation quantization for GLU (Gated Linear Unit) variants, which are commonly used in the feed-forward networks of modern LLMs like the LLaMA family. Specifically, severe local quantization errors arise due to excessively large activation magnitudes, which we refer to as activation spikes, leading to significant degradation in model performance. Our analysis reveals a systematic pattern of these spikes: they predominantly occur in the FFN (feed-forward network) layers at the early and late layers of the model and are concentrated on a small subset of tokens rather than being uniformly distributed across a token sequence. To mitigate this issue, we propose two empirical methods: Quantization-free Module (QFeM) and Quantization-free Prefix (QFeP), which isolate activation spikes during quantization. Extensive experiments demonstrated that our methods effectively improve activation quantization, particularly in coarse-grained quantization schemes, enhancing the performance of LLMs with GLU variants and addressing the limitations of existing quantization techniques. The code for implementing our methods and reproducing the experiments is publicly available in our GitHub repository. | - |
dc.format.extent | 21 | - |
dc.language | English | - |
dc.language.iso | ENG | - |
dc.publisher | MDPI | - |
dc.title | Mitigating Quantization Errors Due to Activation Spikes in Gated Linear Unit-Based Large Language Models | - |
dc.type | Article | - |
dc.publisher.location | Switzerland | - |
dc.identifier.doi | 10.3390/fi17040185 | - |
dc.identifier.scopusid | 2-s2.0-105003621632 | - |
dc.identifier.wosid | 001475065400001 | - |
dc.identifier.bibliographicCitation | FUTURE INTERNET, v.17, no.4, pp 1 - 21 | - |
dc.citation.title | FUTURE INTERNET | - |
dc.citation.volume | 17 | - |
dc.citation.number | 4 | - |
dc.citation.startPage | 1 | - |
dc.citation.endPage | 21 | - |
dc.type.docType | Article | - |
dc.description.isOpenAccess | Y | - |
dc.description.journalRegisteredClass | scopus | - |
dc.description.journalRegisteredClass | esci | - |
dc.relation.journalResearchArea | Computer Science | - |
dc.relation.journalWebOfScienceCategory | Computer Science, Information Systems | - |
dc.subject.keywordAuthor | quantization | - |
dc.subject.keywordAuthor | LLM | - |
dc.subject.keywordAuthor | post-training quantization | - |
dc.subject.keywordAuthor | outliers | - |
dc.identifier.url | https://www.mdpi.com/1999-5903/17/4/185 | - |
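The abstract above attributes the quantization degradation to a few excessively large activation magnitudes ("activation spikes") that dominate coarse-grained (per-tensor) quantization. The following minimal Python sketch, written for this record and not taken from the authors' repository, illustrates that effect on synthetic data: a single spiked value inflates the shared INT8 scale, so the remaining activations lose most of their precision. The tensor shape and spike magnitude are arbitrary assumptions chosen for illustration.

```python
import numpy as np

def quantize_int8_per_tensor(x: np.ndarray):
    """Symmetric per-tensor INT8 quantization: one scale shared by the whole tensor."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
# Synthetic FFN activations for 16 tokens x 64 channels, mostly small magnitudes.
acts = rng.normal(0.0, 1.0, size=(16, 64)).astype(np.float32)

# Inject one "activation spike" on a single token/channel (hypothetical magnitude).
spiked = acts.copy()
spiked[3, 10] = 500.0

for name, x in [("no spike", acts), ("with spike", spiked)]:
    q, s = quantize_int8_per_tensor(x)
    err = np.abs(dequantize(q, s) - x).mean()
    print(f"{name:10s} scale={s:.4f}  mean abs error={err:.4f}")
```

Running the sketch shows the quantization scale jumping by roughly two orders of magnitude once the spike is present, with a correspondingly larger mean reconstruction error on the non-spiked values; this is the local error pattern the paper's QFeM and QFeP methods aim to avoid by isolating such spikes from quantization.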