SpinOut: Enhanced Rotation-based Quantization for LLM by Outlier Injection
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Park, Sangki | - |
| dc.contributor.author | Chung, Ki-Seok | - |
| dc.date.accessioned | 2026-03-18T06:00:46Z | - |
| dc.date.available | 2026-03-18T06:00:46Z | - |
| dc.date.issued | 2026-02 | - |
| dc.identifier.issn | 2169-3536 | - |
| dc.identifier.uri | https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/211352 | - |
| dc.description.abstract | Quantization is a crucial technique for deploying Large Language Models (LLMs) in resource-constrained environments. However, minimizing the performance degradation caused by outliers in activation distributions remains a significant challenge, especially in low-precision quantization. Rotation-based quantization methods have emerged as promising approaches that mitigate outlier effects by transforming the distribution of weights and activations. However, existing methods suffer either from high performance variance due to random rotations or from performance degradation when the calibration samples are insufficient. In this paper, we propose SpinOut, a novel method that enhances rotation-matrix training for LLM quantization by selectively injecting outliers into outlier-sensitive layers. We introduce a layer-sensitivity score that quantitatively measures each layer’s responsiveness to outliers using kurtosis and performance metrics, and propose a search algorithm to determine the best subset of layers for outlier injection. By intentionally injecting artificial outliers during training, SpinOut makes rotation matrices more robust to outliers, leading to improved quantization performance. Experimental results on the Llama-2 7B and the Llama-3.2 1B/3B models demonstrate that SpinOut outperforms existing rotation-based quantization methods across various bit configurations. In the W4A4KV4 quantization setting on Llama-2 7B, SpinOut achieves a WikiText2 perplexity that is 0.09, 0.83, and 0.12 lower than that of SpinQuant, QuaRot, and AMXFP4, respectively. Furthermore, SpinOut reduces the required number of training samples and the iteration count by 75% and 50%, respectively, compared to SpinQuant while achieving a lower performance variance (0.23 vs. 0.3), demonstrating both efficiency and stability. Our method achieves state-of-the-art performance in most experimental settings, including W4A4KV16 quantization, and in the W4A8KV16 configuration it even surpasses weight-only quantization methods. | - |
| dc.format.extent | 14 | - |
| dc.language | English | - |
| dc.language.iso | ENG | - |
| dc.publisher | Institute of Electrical and Electronics Engineers Inc. | - |
| dc.title | SpinOut: Enhanced Rotation-based Quantization for LLM by Outlier Injection | - |
| dc.type | Article | - |
| dc.publisher.location | United States | - |
| dc.identifier.doi | 10.1109/ACCESS.2026.3664084 | - |
| dc.identifier.scopusid | 2-s2.0-105030591734 | - |
| dc.identifier.wosid | 001694347900007 | - |
| dc.identifier.bibliographicCitation | IEEE Access, v.14, pp 24082 - 24095 | - |
| dc.citation.title | IEEE Access | - |
| dc.citation.volume | 14 | - |
| dc.citation.startPage | 24082 | - |
| dc.citation.endPage | 24095 | - |
| dc.type.docType | Article in press | - |
| dc.description.isOpenAccess | Y | - |
| dc.description.journalRegisteredClass | scie | - |
| dc.description.journalRegisteredClass | scopus | - |
| dc.relation.journalResearchArea | Computer Science | - |
| dc.relation.journalResearchArea | Engineering | - |
| dc.relation.journalResearchArea | Telecommunications | - |
| dc.relation.journalWebOfScienceCategory | Computer Science, Information Systems | - |
| dc.relation.journalWebOfScienceCategory | Engineering, Electrical & Electronic | - |
| dc.relation.journalWebOfScienceCategory | Telecommunications | - |
| dc.subject.keywordPlus | Chemical activation | - |
| dc.subject.keywordPlus | Higher order statistics | - |
| dc.subject.keywordPlus | Iterative methods | - |
| dc.subject.keywordPlus | Matrix algebra | - |
| dc.subject.keywordPlus | Rotation | - |
| dc.subject.keywordPlus | Sampling | - |
| dc.subject.keywordPlus | Statistics | - |
| dc.subject.keywordAuthor | Deep learning | - |
| dc.subject.keywordAuthor | LLM | - |
| dc.subject.keywordAuthor | model compression | - |
| dc.subject.keywordAuthor | quantization | - |
| dc.subject.keywordAuthor | Rotation-based quantization | - |
| dc.identifier.url | https://ieeexplore.ieee.org/document/11394731 | - |
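
The abstract names three ingredients: kurtosis as an outlier measure, rotation of activations, and artificial outlier injection. The following is a minimal sketch of how these interact on a toy vector, not the authors' implementation. It assumes a fixed normalized Hadamard rotation as a stand-in for the trained rotation matrices, and the function names (`kurtosis`, `hadamard_rotate`, `inject_outliers`), the vector size, and the outlier magnitude are all hypothetical choices made for illustration.

```python
import math
import random


def kurtosis(xs):
    """Pearson kurtosis (fourth standardized moment); about 3 for Gaussian data."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    fourth = sum((x - mean) ** 4 for x in xs) / n
    return fourth / (var ** 2)


def hadamard_rotate(xs):
    """Apply the normalized Walsh-Hadamard transform, an orthogonal rotation.

    Assumption: len(xs) is a power of two.
    """
    out = list(xs)
    n = len(out)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = out[j], out[j + h]
                out[j], out[j + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in out]


def inject_outliers(xs, indices, magnitude):
    """Return a copy of xs with the given positions overwritten by a large value."""
    out = list(xs)
    for i in indices:
        out[i] = magnitude
    return out


# Toy "activation" vector: 256 Gaussian samples with a fixed seed.
rng = random.Random(0)
acts = [rng.gauss(0.0, 1.0) for _ in range(256)]

# Inject one artificial outlier (hypothetical position and magnitude).
spiky = inject_outliers(acts, [7], 50.0)

# Rotate: the outlier's energy is spread across all coordinates.
rotated = hadamard_rotate(spiky)

k_clean = kurtosis(acts)
k_spiky = kurtosis(spiky)
k_rot = kurtosis(rotated)

print(f"kurtosis clean={k_clean:.2f} injected={k_spiky:.2f} rotated={k_rot:.2f}")
```

In this toy setting the injected outlier sharply raises the kurtosis, and the rotation lowers it again while preserving the vector's norm, which is the property that makes a rotated tensor easier to quantize. How SpinOut actually scores layers, selects the injection subset, and trains its rotation matrices is described in the paper itself.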
