Detailed Information

SpinOut: Enhanced Rotation-based Quantization for LLM by Outlier Injection
Open access

Authors
Park, Sangki; Chung, Ki-Seok
Issue Date
Feb-2026
Publisher
Institute of Electrical and Electronics Engineers Inc.
Keywords
Deep learning; LLM; model compression; quantization; Rotation-based quantization
Citation
IEEE Access, v.14, pp 24082 - 24095
Pages
14
Indexed
SCIE
SCOPUS
Journal Title
IEEE Access
Volume
14
Start Page
24082
End Page
24095
URI
https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/211352
DOI
10.1109/ACCESS.2026.3664084
ISSN
2169-3536
Abstract
Quantization is a crucial technique for deploying Large Language Models (LLMs) in resource-constrained environments. However, minimizing performance degradation due to outliers in activation distributions remains a significant challenge, especially in low-precision quantization. Rotation-based quantization methods have emerged as promising approaches to mitigate outlier effects by transforming the distribution of weights and activations. However, existing methods suffer either from high performance variance due to random rotations or from performance degradation when the calibration samples are insufficient. In this paper, we propose SpinOut, a novel method that enhances rotation-matrix training for LLM quantization by selectively injecting outliers into outlier-sensitive layers. We introduce a layer-sensitivity score that quantitatively measures each layer’s responsiveness to outliers using kurtosis and performance metrics, and propose a search algorithm to determine the best subset of layers for outlier injection. By intentionally injecting artificial outliers during training, SpinOut makes rotation matrices more robust to outliers, leading to improved quantization performance. Experimental results on the Llama-2 7B and the Llama-3.2 1B/3B models demonstrate that SpinOut outperforms existing rotation-based quantization methods across various bit configurations. In the W4A4KV4 quantization setting, SpinOut achieves 0.09, 0.83, and 0.12 lower WikiText2 perplexity than the widely known SpinQuant, QuaRot, and AMXFP4, respectively, on Llama-2 7B. Furthermore, SpinOut reduces the required number of training samples and the iteration count by 75% and 50%, respectively, compared to SpinQuant while achieving a lower performance variance (0.23 vs. 0.3), demonstrating both efficiency and stability. Our method achieves state-of-the-art performance in most experimental settings, including W4A4KV16 quantization, and in the W4A8KV16 configuration, it even surpasses weight-only quantization methods.
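As a rough illustration of the ideas the abstract names (kurtosis as an outlier measure, rotation before quantization, and a deliberately injected outlier), the following Python sketch may help. It is not the SpinOut algorithm: it assumes a fixed Hadamard rotation rather than a trained rotation matrix, a single synthetic activation vector rather than an LLM layer, and simple per-tensor 4-bit rounding. All names and values in it (kurtosis, hadamard_rotate, quantize_int4, the outlier magnitude 40.0 and its position) are hypothetical and chosen only for this example.

```python
import math
import random


def kurtosis(xs):
    """Pearson kurtosis (fourth standardized moment); about 3 for Gaussian data."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return sum((x - mean) ** 4 for x in xs) / (n * var ** 2)


def hadamard_rotate(xs):
    """Orthonormal Walsh-Hadamard transform; len(xs) must be a power of two."""
    out = list(xs)
    n = len(out)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = out[j], out[j + h]
                out[j], out[j + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in out]


def quantize_int4(xs):
    """Symmetric per-tensor 4-bit fake quantization (quantize, then dequantize)."""
    step = max(abs(x) for x in xs) / 7.0
    return [max(-8, min(7, round(x / step))) * step for x in xs]


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


random.seed(0)
acts = [random.gauss(0.0, 1.0) for _ in range(256)]  # synthetic activations
acts[17] = 40.0  # hypothetical injected outlier channel (assumption)

rotated = hadamard_rotate(acts)

# Quantize in each basis. The rotated result is rotated back before comparing,
# which works because the orthonormal Hadamard transform is its own inverse.
err_plain = mse(acts, quantize_int4(acts))
err_rotated = mse(acts, hadamard_rotate(quantize_int4(rotated)))

print(f"kurtosis: {kurtosis(acts):.1f} -> {kurtosis(rotated):.1f}")
print(f"int4 MSE: {err_plain:.4f} -> {err_rotated:.4f}")
```

Running the script prints the kurtosis and the 4-bit reconstruction error before and after rotation. With this synthetic input both should drop, because the rotation spreads the single outlier across all 256 entries and the quantization step size shrinks accordingly.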
Appears in Collections
College of Engineering (Seoul) > School of Electronic Engineering (Seoul) > 1. Journal Articles

Related Researcher

Chung, Ki Seok
COLLEGE OF ENGINEERING (SCHOOL OF ELECTRONIC ENGINEERING)