SpinOut: Enhanced Rotation-based Quantization for LLM by Outlier Injection (open access)
- Authors
- Park, Sangki; Chung, Ki-Seok
- Issue Date
- Feb-2026
- Publisher
- Institute of Electrical and Electronics Engineers Inc.
- Keywords
- Deep learning; LLM; model compression; quantization; Rotation-based quantization
- Citation
- IEEE Access, v.14, pp 24082 - 24095
- Pages
- 14
- Indexed
- SCIE; SCOPUS
- Journal Title
- IEEE Access
- Volume
- 14
- Start Page
- 24082
- End Page
- 24095
- URI
- https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/211352
- DOI
- 10.1109/ACCESS.2026.3664084
- ISSN
- 2169-3536
- Abstract
- Quantization is a crucial technique for deploying Large Language Models (LLMs) in resource-constrained environments. However, minimizing the performance degradation caused by outliers in activation distributions remains a significant challenge, especially in low-precision quantization. Rotation-based quantization methods have emerged as a promising way to mitigate outlier effects by transforming the distributions of weights and activations. Existing methods, however, suffer either from high performance variance due to random rotations or from performance degradation when calibration samples are insufficient. In this paper, we propose SpinOut, a novel method that enhances rotation-matrix training for LLM quantization by selectively injecting outliers into outlier-sensitive layers. We introduce a layer-sensitivity score that quantitatively measures each layer's responsiveness to outliers using kurtosis and performance metrics, and propose a search algorithm to determine the best subset of layers for outlier injection. By intentionally injecting artificial outliers during training, SpinOut makes rotation matrices more robust to outliers, leading to improved quantization performance. Experimental results on the Llama-2 7B and Llama-3.2 1B/3B models demonstrate that SpinOut outperforms existing rotation-based quantization methods across various bit configurations. In the W4A4KV4 quantization setting on Llama-2 7B, SpinOut achieves 0.09, 0.83, and 0.12 lower WikiText2 perplexity than the widely known SpinQuant, QuaRot, and AMXFP4, respectively. Furthermore, SpinOut reduces the required number of training samples and the iteration count by 75% and 50%, respectively, compared to SpinQuant while achieving a lower performance variance (0.23 vs. 0.3), demonstrating both efficiency and stability. Our method achieves state-of-the-art performance in most experimental settings, including W4A4KV16 quantization, and in the W4A8KV16 configuration, it even surpasses weight-only quantization methods.
- Appears in Collections
- Seoul College of Engineering > Seoul Department of Electronic Engineering > 1. Journal Articles
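
The ideas summarized in the abstract (kurtosis as an outlier indicator, artificial outlier injection, and rotation to spread outliers before quantization) can be illustrated with a small NumPy sketch. This is not the authors' algorithm: it uses a random orthogonal rotation rather than a trained one, models "injection" as simple channel scaling, and scores outliers with plain excess kurtosis only. All function names (`excess_kurtosis`, `inject_outliers`, `random_rotation`, `fake_quantize`) and all sizes and scales are hypothetical choices made for illustration.

```python
import numpy as np


def excess_kurtosis(x):
    """Excess kurtosis of all entries of x (0 for a Gaussian, large with outliers)."""
    x = np.asarray(x, dtype=np.float64).ravel()
    centered = x - x.mean()
    var = np.mean(centered ** 2)
    return float(np.mean(centered ** 4) / var ** 2 - 3.0)


def inject_outliers(x, channels, scale):
    """Assumed injection scheme: amplify the chosen channels (columns) by `scale`."""
    out = x.copy()
    out[:, channels] *= scale
    return out


def random_rotation(dim, rng):
    """Random orthogonal matrix from the QR decomposition of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((dim, dim)))
    # Flip column signs so the diagonal of R is positive; Q stays orthogonal.
    return q * np.sign(np.diag(r))


def fake_quantize(x, bits=4):
    """Symmetric per-row (per-token) quantize-dequantize to `bits` bits."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max(axis=1, keepdims=True) / qmax
    return np.clip(np.round(x / scale), -qmax - 1, qmax) * scale


rng = np.random.default_rng(0)
acts = rng.standard_normal((256, 128))            # toy activations: tokens x channels
acts_out = inject_outliers(acts, channels=[3, 77], scale=30.0)

rot = random_rotation(128, rng)
rotated = acts_out @ rot                          # rotate the channel dimension

kurt_before = excess_kurtosis(acts_out)
kurt_after = excess_kurtosis(rotated)

# Quantize in each basis, then compare both against the unrotated reference.
err_plain = float(np.mean((fake_quantize(acts_out) - acts_out) ** 2))
err_rot = float(np.mean((fake_quantize(rotated) @ rot.T - acts_out) ** 2))

print(f"excess kurtosis: before={kurt_before:.2f}, after={kurt_after:.2f}")
print(f"4-bit MSE: plain={err_plain:.4f}, rotated={err_rot:.4f}")
```

In this toy setting the rotation spreads the energy of the two amplified channels across all 128 channels, so both the kurtosis and the 4-bit reconstruction error drop. The paper's contribution, per the abstract, lies in training the rotation with outliers injected into selected layers, which this sketch does not attempt.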