Understanding and Optimizing INT4 Convolution for Accelerated DNN Inference on Tensor Cores
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Choi, Junkyeong | - |
dc.contributor.author | Kwon, Hyucksung | - |
dc.contributor.author | Lee, Woongkyu | - |
dc.contributor.author | Lim, Jieun | - |
dc.contributor.author | Choi, Jungwook | - |
dc.date.accessioned | 2022-12-20T05:05:00Z | - |
dc.date.available | 2022-12-20T05:05:00Z | - |
dc.date.created | 2022-12-07 | - |
dc.date.issued | 2022-11 | - |
dc.identifier.issn | 1520-6130 | - |
dc.identifier.uri | https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/172849 | - |
dc.description.abstract | Convolution is one of the fundamental operations of deep neural networks and demands intensive matrix computation. In a graphics processing unit (GPU), the Tensor Core is specialized matrix-processing hardware equipped with reduced-precision warp matrix-multiply-accumulate (WMMA) instructions to increase throughput. However, it is challenging to achieve optimal performance, since reduced-precision WMMA requires many elements grouped as a matrix operand, seriously limiting data reuse and imposing packing and layout overhead on the schedule. This work proposes three techniques to enhance INT4 WMMA utilization on Tensor Cores: duplicate-aware load for increasing the reuse of convolution input, register-level packing for alleviating the overhead of handling INT4 data, and data layout optimization for coalesced data transfer. The proposed INT4 WMMA optimization techniques are evaluated on convolution operations of popular neural networks to demonstrate substantial speedup on Tensor Cores compared to the state of the art. | - |
dc.language | English | - |
dc.language.iso | en | - |
dc.publisher | Institute of Electrical and Electronics Engineers Inc. | - |
dc.title | Understanding and Optimizing INT4 Convolution for Accelerated DNN Inference on Tensor Cores | - |
dc.type | Article | - |
dc.contributor.affiliatedAuthor | Choi, Jungwook | - |
dc.identifier.doi | 10.1109/SiPS55645.2022.9919243 | - |
dc.identifier.scopusid | 2-s2.0-85141793386 | - |
dc.identifier.wosid | 001081960800004 | - |
dc.identifier.bibliographicCitation | IEEE Workshop on Signal Processing Systems, SiPS: Design and Implementation, v.2022-November, pp.1 - 6 | - |
dc.relation.isPartOf | IEEE Workshop on Signal Processing Systems, SiPS: Design and Implementation | - |
dc.citation.title | IEEE Workshop on Signal Processing Systems, SiPS: Design and Implementation | - |
dc.citation.volume | 2022-November | - |
dc.citation.startPage | 1 | - |
dc.citation.endPage | 6 | - |
dc.type.rims | ART | - |
dc.type.docType | Proceedings Paper | - |
dc.description.journalClass | 1 | - |
dc.description.isOpenAccess | N | - |
dc.description.journalRegisteredClass | scopus | - |
dc.relation.journalResearchArea | Computer Science | - |
dc.relation.journalResearchArea | Engineering | - |
dc.relation.journalResearchArea | Telecommunications | - |
dc.relation.journalWebOfScienceCategory | Computer Science, Information Systems | - |
dc.relation.journalWebOfScienceCategory | Engineering, Electrical & Electronic | - |
dc.relation.journalWebOfScienceCategory | Telecommunications | - |
dc.subject.keywordPlus | Data reduction | - |
dc.subject.keywordPlus | Data transfer | - |
dc.subject.keywordPlus | Deep neural networks | - |
dc.subject.keywordPlus | Graphics processing unit | - |
dc.subject.keywordPlus | Matrix algebra | - |
dc.subject.keywordPlus | Tensors | - |
dc.subject.keywordPlus | Convolution | - |
dc.subject.keywordPlus | Fundamental operations | - |
dc.subject.keywordPlus | matrix | - |
dc.subject.keywordPlus | Matrix computation | - |
dc.subject.keywordPlus | Matrix multiply | - |
dc.subject.keywordPlus | Multiply-accumulate (MAC) | - |
dc.subject.keywordPlus | Processing hardware | - |
dc.subject.keywordPlus | Reduced precision | - |
dc.subject.keywordPlus | Reduced precision DNN | - |
dc.subject.keywordPlus | Tensor core | - |
dc.subject.keywordPlus | Unit tensor | - |
dc.subject.keywordAuthor | convolution | - |
dc.subject.keywordAuthor | reduced precision DNN | - |
dc.subject.keywordAuthor | tensor core | - |
dc.identifier.url | https://ieeexplore.ieee.org/document/9919243 | - |