Teacher Intervention: Improving Convergence of Quantization Aware Training for Ultra-Low Precision Transformers
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Kim, Minsoo | - |
| dc.contributor.author | Shim, Kyuhong | - |
| dc.contributor.author | Park, Seongmin | - |
| dc.contributor.author | Sung, Wonyong | - |
| dc.contributor.author | Choi, Jungwook | - |
| dc.date.accessioned | 2026-06-22T01:30:39Z | - |
| dc.date.available | 2026-06-22T01:30:39Z | - |
| dc.date.issued | 2023-05 | - |
| dc.identifier.uri | https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/213886 | - |
| dc.description.abstract | Pre-trained Transformer models such as BERT have shown great success in a wide range of applications, but at the cost of substantial increases in model complexity. Quantization-aware training (QAT) is a promising method to lower the implementation cost and energy consumption. However, aggressive quantization below 2-bit causes considerable accuracy degradation due to unstable convergence, especially when the downstream dataset is not abundant. This work proposes a proactive knowledge distillation method called Teacher Intervention (TI) for fast-converging QAT of ultra-low precision pre-trained Transformers. TI intervenes in layer-wise signal propagation with the intact signal from the teacher to remove the interference of propagated quantization errors, smoothing the loss surface of QAT and expediting the convergence. Furthermore, we propose a gradual intervention mechanism to stabilize the recovery of subsections of Transformer layers from quantization. The proposed schemes enable fast convergence of QAT and improve the model accuracy regardless of the diverse characteristics of downstream fine-tuning tasks. We demonstrate that TI consistently achieves superior accuracy with significantly fewer fine-tuning iterations on well-known Transformers for natural language processing as well as computer vision compared to the state-of-the-art QAT methods. | - |
| dc.format.extent | 14 | - |
| dc.language | English | - |
| dc.language.iso | ENG | - |
| dc.publisher | ASSOC COMPUTATIONAL LINGUISTICS-ACL | - |
| dc.title | Teacher Intervention: Improving Convergence of Quantization Aware Training for Ultra-Low Precision Transformers | - |
| dc.type | Article | - |
| dc.publisher.location | United States | - |
| dc.identifier.doi | 10.18653/v1/2023.eacl-main.64 | - |
| dc.identifier.scopusid | 2-s2.0-85159853729 | - |
| dc.identifier.wosid | 001181056903036 | - |
| dc.identifier.bibliographicCitation | 17TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EACL 2023, pp 916 - 929 | - |
| dc.citation.title | 17TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EACL 2023 | - |
| dc.citation.startPage | 916 | - |
| dc.citation.endPage | 929 | - |
| dc.type.docType | Proceedings Paper | - |
| dc.description.isOpenAccess | Y | - |
| dc.description.journalRegisteredClass | scie | - |
| dc.description.journalRegisteredClass | scopus | - |
| dc.relation.journalResearchArea | Computer Science | - |
| dc.relation.journalWebOfScienceCategory | Computer Science, Artificial Intelligence | - |
| dc.relation.journalWebOfScienceCategory | Computer Science, Theory & Methods | - |
| dc.subject.keywordPlus | Computational linguistics | - |
| dc.subject.keywordPlus | Distillation | - |
| dc.subject.keywordPlus | Energy utilization | - |
| dc.subject.keywordPlus | Natural language processing systems | - |
| dc.subject.keywordPlus | Quantization (signal) | - |
| dc.identifier.url | https://aclanthology.org/2023.eacl-main.64/ | - |
