Understanding and Optimizing INT4 Convolution for Accelerated DNN Inference on Tensor Cores
- Authors
- Choi, Junkyeong; Kwon, Hyucksung; Lee, Woongkyu; Lim, Jieun; Choi, Jungwook
- Issue Date
- Nov-2022
- Publisher
- Institute of Electrical and Electronics Engineers Inc.
- Keywords
- convolution; reduced precision DNN; tensor core
- Citation
- IEEE Workshop on Signal Processing Systems, SiPS: Design and Implementation, v.2022-November, pp. 1-6
- Indexed
- SCOPUS
- Journal Title
- IEEE Workshop on Signal Processing Systems, SiPS: Design and Implementation
- Volume
- 2022-November
- Start Page
- 1
- End Page
- 6
- URI
- https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/172849
- DOI
- 10.1109/SiPS55645.2022.9919243
- ISSN
- 1520-6130
- Abstract
- Convolution is one of the fundamental operations of deep neural networks and demands intensive matrix computation. In a graphics processing unit (GPU), the Tensor Core is specialized matrix-processing hardware equipped with reduced-precision warp matrix-multiply-accumulate (WMMA) instructions to increase throughput. However, achieving optimal performance is challenging because reduced-precision WMMA requires many elements grouped into a matrix operand, seriously limiting data reuse and imposing packing and layout overhead on the schedule. This work proposes three techniques to enhance INT4 WMMA utilization on Tensor Cores: duplicate-aware load to increase the reuse of convolution input, register-level packing to alleviate the overhead of handling INT4 data, and data layout optimization for coalesced data transfer. The proposed INT4 WMMA optimization techniques are evaluated on convolution operations of popular neural networks and demonstrate substantial speedups on Tensor Cores compared to the state of the art.
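As a rough illustration of two of the ingredients the abstract names, the sketch below is not the authors' implementation: the kernel structure, the single 8x8x32 tile, and the `pack_int4x8` helper are assumptions for exposition. It packs eight signed 4-bit values into one 32-bit register and issues an INT4 tile multiply through CUDA's experimental sub-byte WMMA API (compute capability 7.5 or newer, compiled with `-arch=sm_75`+):

```cuda
// Minimal sketch of register-level INT4 packing plus a sub-byte WMMA tile
// multiply. Not the paper's optimized schedule; shapes and helpers assumed.
#include <mma.h>
#include <cstdint>

using namespace nvcuda;

// Register-level packing: eight signed 4-bit values (each in [-8, 7]) stored
// in one 32-bit word, so INT4 operands can be moved without per-element
// byte handling.
__device__ uint32_t pack_int4x8(const int8_t v[8]) {
    uint32_t packed = 0;
    for (int i = 0; i < 8; ++i)
        packed |= (uint32_t(v[i]) & 0xFu) << (4 * i);  // keep the two's-complement low nibble
    return packed;
}

// One warp computes an 8x8 INT32 tile from an 8x32 INT4 A tile (row-major)
// and a 32x8 INT4 B tile (col-major), the only layouts the sub-byte WMMA API
// supports. Each int in a/b holds eight packed 4-bit elements; lda/ldb/ldc
// are leading dimensions in elements (multiples of 32 for s4).
__global__ void int4_wmma_tile(const int* a, const int* b, int* c,
                               unsigned lda, unsigned ldb, unsigned ldc) {
    wmma::fragment<wmma::matrix_a, 8, 8, 32,
                   wmma::experimental::precision::s4, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 8, 8, 32,
                   wmma::experimental::precision::s4, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 8, 8, 32, int> c_frag;

    wmma::fill_fragment(c_frag, 0);
    wmma::load_matrix_sync(a_frag, a, lda);   // the whole warp cooperates per load
    wmma::load_matrix_sync(b_frag, b, ldb);
    wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);  // Tensor Core INT4 MMA
    wmma::store_matrix_sync(c, c_frag, ldc, wmma::mem_row_major);
}
```

Launched as `int4_wmma_tile<<<1, 32>>>(a, b, c, 32, 32, 8)`, a single warp produces one 8x8 INT32 output tile. The paper's contributions concern how such tiles are fed: loading convolution inputs with duplicate awareness, packing INT4 data at the register level, and arranging the layout for coalesced transfers, none of which this plain sketch captures.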
- Appears in Collections
- College of Engineering (Seoul) > Department of Electronic Engineering (Seoul) > 1. Journal Articles