Optimizing Exponent Bias for Sub-8bit Floating-Point Inference of Fine-tuned Transformers
- Authors
- 이장환; Choi, Jung wook
- Issue Date
- Jun-2022
- Publisher
- Institute of Electrical and Electronics Engineers Inc.
- Keywords
- BERT; exponent bias; floating-point; post-training quantization; reduced-precision; SQNR; Transformer
- Citation
- Proceeding - IEEE International Conference on Artificial Intelligence Circuits and Systems, AICAS 2022, pp 98 - 101
- Pages
- 4
- Indexed
- SCOPUS
- Journal Title
- Proceeding - IEEE International Conference on Artificial Intelligence Circuits and Systems, AICAS 2022
- Start Page
- 98
- End Page
- 101
- URI
- https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/173245
- DOI
- 10.1109/AICAS54282.2022.9869965
- Abstract
- Fine-tuned Transformer-based neural networks have demonstrated remarkable success in natural language processing (NLP), at the cost of a substantial computational burden. Post-training quantization (PTQ) is a promising technique for reducing this cost without expensive re-training. However, prior work either demands complex calibration or suffers noticeable accuracy degradation. This paper proposes a practical method for sub-8-bit floating-point (FP) PTQ. The proposed method progressively optimizes the exponent bias, in a manner similar to stochastic gradient descent, to minimize quantization error measured by the signal-to-quantization-noise ratio (SQNR). Our evaluation shows that the proposed method achieves close to full-precision accuracy for 6- to 8-bit FP PTQ of fine-tuned BERT on GLUE and SQuAD tasks with negligible run-time overhead.
- Appears in Collections
- College of Engineering (Seoul) > School of Electronic Engineering (Seoul) > 1. Journal Articles
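
The abstract's central idea, choosing an exponent bias that keeps SQNR high for a low-bit floating-point format, can be illustrated with a small sketch. This is not the authors' implementation: the toy number format (no inf/NaN codes, subnormals, saturation on overflow), the function names `fp_quantize`, `sqnr_db`, and `best_bias`, and the exhaustive search over integer biases are all assumptions made for illustration. In particular, the exhaustive search stands in for the progressive, SGD-like update that the abstract describes.

```python
import math


def fp_quantize(x, exp_bits, man_bits, bias):
    """Round x to the nearest value of a toy sign/exponent/mantissa format.

    Assumed format (hypothetical, for illustration only): no inf/NaN codes,
    subnormals below the smallest normal exponent, saturation on overflow.
    """
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    a = abs(x)
    e_min = 1 - bias                     # smallest normal exponent
    e_max = (2 ** exp_bits - 1) - bias   # largest exponent
    max_val = (2.0 - 2.0 ** -man_bits) * 2.0 ** e_max
    # frexp returns a = m * 2**ex with 0.5 <= m < 1, so floor(log2(a)) = ex - 1.
    _, ex = math.frexp(a)
    e = max(ex - 1, e_min)               # clamp to subnormal spacing
    step = 2.0 ** (e - man_bits)         # spacing between representable values
    q = round(a / step) * step
    return sign * min(q, max_val)        # saturate instead of overflowing


def sqnr_db(xs, qs):
    """Signal-to-quantization-noise ratio in dB: 10*log10(sum x^2 / sum (x-q)^2)."""
    signal = sum(x * x for x in xs)
    noise = sum((x - q) ** 2 for x, q in zip(xs, qs))
    if noise == 0.0:
        return math.inf
    return 10.0 * math.log10(signal / noise)


def best_bias(xs, exp_bits, man_bits, candidates):
    """Pick the candidate bias with the highest SQNR (plain exhaustive search)."""
    def score(b):
        return sqnr_db(xs, [fp_quantize(x, exp_bits, man_bits, b) for x in xs])
    return max(candidates, key=score)


if __name__ == "__main__":
    # Small-magnitude values, loosely resembling fine-tuned weights.
    values = [0.001, -0.002, 0.0015, 0.003]
    b = best_bias(values, exp_bits=3, man_bits=2, candidates=range(16))
    quantized = [fp_quantize(v, 3, 2, b) for v in values]
    print("chosen bias:", b)
    print("SQNR (dB):", sqnr_db(values, quantized))
```

With only 3 exponent bits, the conventional bias of 3 rounds every value in this example to zero, whereas a larger bias shifts the representable range down to where the data lives. That shift is the effect the exponent-bias optimization exploits.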
