Teacher Intervention: Improving Convergence of Quantization Aware Training for Ultra-Low Precision Transformers

Kim, Minsoo; Shim, Kyuhong; Park, Seongmin; Sung, Wonyong; Choi, Jungwook

doi:10.18653/v1/2023.eacl-main.64

Full metadata record

DC Field	Value	Language
dc.contributor.author	Kim, Minsoo	-
dc.contributor.author	Shim, Kyuhong	-
dc.contributor.author	Park, Seongmin	-
dc.contributor.author	Sung, Wonyong	-
dc.contributor.author	Choi, Jungwook	-
dc.date.accessioned	2026-06-22T01:30:39Z	-
dc.date.available	2026-06-22T01:30:39Z	-
dc.date.issued	2023-05	-
dc.identifier.uri	https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/213886	-
dc.description.abstract	Pre-trained Transformer models such as BERT have shown great success in a wide range of applications, but at the cost of substantial increases in model complexity. Quantization-aware training (QAT) is a promising method to lower the implementation cost and energy consumption. However, aggressive quantization below 2-bit causes considerable accuracy degradation due to unstable convergence, especially when the downstream dataset is not abundant. This work proposes a proactive knowledge distillation method called Teacher Intervention (TI) for fast-converging QAT of ultra-low precision pre-trained Transformers. TI intervenes in layer-wise signal propagation with the intact signal from the teacher to remove the interference of propagated quantization errors, smoothing the loss surface of QAT and expediting convergence. Furthermore, we propose a gradual intervention mechanism to stabilize the recovery of subsections of Transformer layers from quantization. The proposed schemes enable fast convergence of QAT and improve the model accuracy regardless of the diverse characteristics of downstream fine-tuning tasks. We demonstrate that TI consistently achieves superior accuracy with significantly fewer fine-tuning iterations on well-known Transformers for natural language processing as well as computer vision compared to the state-of-the-art QAT methods.	-
dc.format.extent	14	-
dc.language	English	-
dc.language.iso	ENG	-
dc.publisher	ASSOC COMPUTATIONAL LINGUISTICS-ACL	-
dc.title	Teacher Intervention: Improving Convergence of Quantization Aware Training for Ultra-Low Precision Transformers	-
dc.type	Article	-
dc.publisher.location	United States	-
dc.identifier.doi	10.18653/v1/2023.eacl-main.64	-
dc.identifier.scopusid	2-s2.0-85159853729	-
dc.identifier.wosid	001181056903036	-
dc.identifier.bibliographicCitation	17TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EACL 2023, pp 916 - 929	-
dc.citation.title	17TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EACL 2023	-
dc.citation.startPage	916	-
dc.citation.endPage	929	-
dc.type.docType	Proceedings Paper	-
dc.description.isOpenAccess	Y	-
dc.description.journalRegisteredClass	scie	-
dc.description.journalRegisteredClass	scopus	-
dc.relation.journalResearchArea	Computer Science	-
dc.relation.journalWebOfScienceCategory	Computer Science, Artificial Intelligence	-
dc.relation.journalWebOfScienceCategory	Computer Science, Theory & Methods	-
dc.subject.keywordPlus	Computational linguistics	-
dc.subject.keywordPlus	Distillation	-
dc.subject.keywordPlus	Energy utilization	-
dc.subject.keywordPlus	Natural language processing systems	-
dc.subject.keywordPlus	Quantization (signal)	-
dc.identifier.url	https://aclanthology.org/2023.eacl-main.64/	-

Appears in Collections: Seoul College of Engineering > Seoul School of Electronic Engineering > 1. Journal Articles

Related Researcher

Choi, Jungwook: College of Engineering (School of Electronic Engineering)
