Optimizing Exponent Bias for Sub-8bit Floating-Point Inference of Fine-tuned Transformers

이장환; Choi, Jung wook

doi:10.1109/AICAS54282.2022.9869965

Issue Date: Jun-2022

Publisher: Institute of Electrical and Electronics Engineers Inc.

Keywords: BERT; exponent bias; floating-point; post-training quantization; reduced-precision; SQNR; Transformer

Citation: Proceeding - IEEE International Conference on Artificial Intelligence Circuits and Systems, AICAS 2022, pp 98 - 101

Pages: 4

Indexed: SCOPUS

Journal Title: Proceeding - IEEE International Conference on Artificial Intelligence Circuits and Systems, AICAS 2022

Start Page: 98

End Page: 101

URI: https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/173245

DOI: 10.1109/AICAS54282.2022.9869965

Abstract: Fine-tuned Transformer-based neural networks have demonstrated remarkable success in natural language processing (NLP), at the cost of a substantial computational burden. Post-training quantization (PTQ) is a promising technique for reducing this cost without expensive re-training, but prior work either demands complex calibration or suffers noticeable accuracy degradation. This paper proposes a practical method for sub-8bit floating-point (FP) PTQ. The method optimizes the exponent bias progressively, in a manner similar to stochastic gradient descent, to minimize quantization error as measured by the signal-to-quantization-noise ratio (SQNR). We show that the proposed method achieves close to full-precision accuracy for 6- to 8-bit FP PTQ of fine-tuned BERT on GLUE and SQuAD tasks, with negligible run-time overhead.
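To make the role of the exponent bias concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical reduced-precision format with one sign bit, exp_bits exponent bits, and man_bits mantissa bits, with subnormals and no exponent codes reserved for infinity or NaN. It also replaces the progressive, SGD-like update described in the abstract with a plain exhaustive search over candidate integer biases. All function names (fp_quantize, sqnr_db, best_exponent_bias) are illustrative.

```python
import math


def fp_quantize(x, exp_bits, man_bits, bias):
    """Round x to the nearest value of an assumed small FP format.

    Assumed format: normal values are 2**(e - bias) * (1 + m / 2**man_bits)
    for exponent codes e in [1, 2**exp_bits - 1]; code 0 holds subnormals.
    Values beyond the largest representable magnitude are clipped.
    """
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    a = abs(x)
    e_min = 1 - bias                      # smallest normal exponent
    e_max = (2 ** exp_bits - 1) - bias    # largest exponent (assumption: no inf/NaN codes)
    max_val = math.ldexp(2.0 - 2.0 ** -man_bits, e_max)
    # frexp gives a = m * 2**ex with 0.5 <= m < 1, so floor(log2(a)) = ex - 1.
    e = math.frexp(a)[1] - 1
    e = min(max(e, e_min), e_max)         # below e_min the grid is the subnormal grid
    step = math.ldexp(1.0, e - man_bits)  # spacing of representable values in this binade
    q = round(a / step) * step
    return sign * min(q, max_val)


def sqnr_db(xs, qs):
    """Signal-to-quantization-noise ratio in dB between xs and their quantized qs."""
    signal = sum(x * x for x in xs)
    noise = sum((x - q) ** 2 for x, q in zip(xs, qs))
    if noise == 0.0:
        return math.inf
    return 10.0 * math.log10(signal / noise)


def best_exponent_bias(xs, exp_bits, man_bits, candidates):
    """Exhaustive search (an assumption, not the paper's update rule) for the
    candidate bias that maximizes SQNR on the sample xs."""
    def score(bias):
        qs = [fp_quantize(x, exp_bits, man_bits, bias) for x in xs]
        return sqnr_db(xs, qs)
    return max(candidates, key=score)
```

In this sketch, raising the bias shifts the representable range toward smaller magnitudes, which tends to suit tensors with small values such as weights, while lowering it extends the range for larger activations. The search simply picks whichever shift yields the least quantization noise on the sample.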

Appears in Collections: Seoul College of Engineering > Seoul School of Electronic Engineering > 1. Journal Articles

Related Researcher

Choi, Jung wook: COLLEGE OF ENGINEERING (SCHOOL OF ELECTRONIC ENGINEERING)