NN-LUT: Neural Approximation of Non-Linear Operations for Efficient Transformer Inference
- Authors
- Yu, Joonsang; Park, Junki; Park, Seongmin; Kim, Minsoo; Lee, Sihwa; Lee, Dong Hyun; Choi, Jungwook
- Issue Date
- Jul-2022
- Publisher
- Institute of Electrical and Electronics Engineers Inc.
- Keywords
- look-up table; neural network; non-linear function; transformer
- Citation
- Proceedings - Design Automation Conference, pp. 577-582
- Pages
- 6
- Indexed
- SCOPUS
- Journal Title
- Proceedings - Design Automation Conference
- Start Page
- 577
- End Page
- 582
- URI
- https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/172605
- DOI
- 10.1145/3489517.3530505
- ISSN
- 0146-7123
- Abstract
- Non-linear operations such as GELU, layer normalization, and Softmax are essential yet costly building blocks of Transformer models. Several prior works simplified these operations with look-up tables or integer computations, but such approximations suffer from inferior accuracy or incur considerable hardware cost and long latency. This paper proposes an accurate and hardware-friendly approximation framework for efficient Transformer inference. Our framework employs a simple neural network as a universal approximator, with its structure equivalently transformed into a look-up table (LUT). The proposed framework, called neural-network-generated LUT (NN-LUT), can accurately replace all the non-linear operations in popular BERT models with significant reductions in area, power consumption, and latency.
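The key observation behind the abstract is that a small network with piecewise-linear activations is itself a piecewise-linear function, so it can be tabulated as breakpoints with per-segment slopes and intercepts and evaluated by a cheap table lookup. Below is a minimal numpy sketch of that LUT idea applied to GELU; the segment count, input range, and use of the common tanh approximation of GELU are illustrative choices, not the paper's actual training procedure:

```python
import numpy as np

def gelu(x):
    # Common tanh approximation of GELU (the reference function we tabulate)
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

# Piecewise-linear LUT: breakpoints plus per-segment slope and intercept.
breakpoints = np.linspace(-6.0, 6.0, 33)  # 32 segments (illustrative choice)
y = gelu(breakpoints)
slopes = np.diff(y) / np.diff(breakpoints)
intercepts = y[:-1] - slopes * breakpoints[:-1]

def lut_gelu(x):
    # Clamp into the table range, locate the segment, evaluate y = s*x + b.
    xc = np.clip(x, breakpoints[0], breakpoints[-1])
    idx = np.clip(np.searchsorted(breakpoints, xc, side="right") - 1,
                  0, len(slopes) - 1)
    return slopes[idx] * xc + intercepts[idx]

xs = np.linspace(-6.0, 6.0, 10001)
max_err = float(np.max(np.abs(lut_gelu(xs) - gelu(xs))))
print(max_err < 0.02)  # prints True: even a coarse table tracks GELU closely
```

In hardware, the breakpoint comparison, slope, and intercept arrays map naturally onto a small comparator tree and two multiply-add units, which is where the area, power, and latency savings claimed in the abstract would come from.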
- Appears in Collections
- College of Engineering (Seoul) > Department of Electronic Engineering (Seoul) > 1. Journal Articles
Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.