Understanding and Optimizing INT4 Convolution for Accelerated DNN Inference on Tensor Cores
- Authors
- Choi, Junkyeong; Kwon, Hyucksung; Lee, Woongkyu; Lim, Jieun; Choi, Jungwook
- Issue Date
- Nov-2022
- Publisher
- Institute of Electrical and Electronics Engineers Inc.
- Keywords
- convolution; reduced precision DNN; tensor core
- Citation
- IEEE Workshop on Signal Processing Systems, SiPS: Design and Implementation, v.2022-November, pp. 1-6
- Indexed
- SCOPUS
- Journal Title
- IEEE Workshop on Signal Processing Systems, SiPS: Design and Implementation
- Volume
- 2022-November
- Start Page
- 1
- End Page
- 6
- URI
- https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/172849
- DOI
- 10.1109/SiPS55645.2022.9919243
- ISSN
- 1520-6130
- Abstract
- Convolution is one of the fundamental operations of deep neural networks and demands intensive matrix computation. In a graphics processing unit (GPU), the Tensor Core is specialized matrix-processing hardware equipped with reduced-precision warp matrix-multiply-accumulate (WMMA) instructions to increase throughput. However, achieving optimal performance is challenging because reduced-precision WMMA requires many elements grouped into a matrix operand, seriously limiting data reuse and imposing packing and layout overhead on the schedule. This work proposes three techniques to enhance INT4 WMMA utilization on Tensor Cores: duplicate-aware load to increase the reuse of convolution input, register-level packing to alleviate the overhead of handling INT4 data, and data layout optimization for coalesced data transfer. The proposed INT4 WMMA optimization techniques are evaluated on convolution operations of popular neural networks and demonstrate substantial speedups on Tensor Cores compared to the state of the art.
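As a rough illustration of two of the ingredients the abstract names, the sketch below is not the authors' implementation: the kernel structure, the single 8x8x32 tile, and the `pack_int4x8` helper are assumptions for exposition. It packs eight signed 4-bit values into one 32-bit register and issues an INT4 tile multiply through CUDA's experimental sub-byte WMMA API (compute capability 7.5 or newer, compiled with `-arch=sm_75`+):

```cuda
// Minimal sketch of register-level INT4 packing plus a sub-byte WMMA tile
// multiply. Not the paper's optimized schedule; shapes and helpers assumed.
#include <mma.h>
#include <cstdint>

using namespace nvcuda;

// Register-level packing: eight signed 4-bit values (each in [-8, 7]) stored
// in one 32-bit word, so INT4 operands can be moved without per-element
// byte handling.
__device__ uint32_t pack_int4x8(const int8_t v[8]) {
    uint32_t packed = 0;
    for (int i = 0; i < 8; ++i)
        packed |= (uint32_t(v[i]) & 0xFu) << (4 * i);  // keep the two's-complement low nibble
    return packed;
}

// One warp computes an 8x8 INT32 tile from an 8x32 INT4 A tile (row-major)
// and a 32x8 INT4 B tile (col-major), the only layouts the sub-byte WMMA API
// supports. Each int in a/b holds eight packed 4-bit elements; lda/ldb/ldc
// are leading dimensions in elements (multiples of 32 for s4).
__global__ void int4_wmma_tile(const int* a, const int* b, int* c,
                               unsigned lda, unsigned ldb, unsigned ldc) {
    wmma::fragment<wmma::matrix_a, 8, 8, 32,
                   wmma::experimental::precision::s4, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 8, 8, 32,
                   wmma::experimental::precision::s4, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 8, 8, 32, int> c_frag;

    wmma::fill_fragment(c_frag, 0);
    wmma::load_matrix_sync(a_frag, a, lda);   // the whole warp cooperates per load
    wmma::load_matrix_sync(b_frag, b, ldb);
    wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);  // Tensor Core INT4 MMA
    wmma::store_matrix_sync(c, c_frag, ldc, wmma::mem_row_major);
}
```

Launched as `int4_wmma_tile<<<1, 32>>>(a, b, c, 32, 32, 8)`, a single warp produces one 8x8 INT32 output tile. The paper's contributions concern how such tiles are fed: loading convolution inputs with duplicate awareness, packing INT4 data at the register level, and arranging the layout for coalesced transfers, none of which this plain sketch captures.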
- Appears in Collections
- College of Engineering (Seoul) > Department of Electronic Engineering (Seoul) > 1. Journal Articles