Understanding and Optimizing INT4 Convolution for Accelerated DNN Inference on Tensor Cores
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Choi, Junkyeong | - |
dc.contributor.author | Kwon, Hyucksung | - |
dc.contributor.author | Lee, Woongkyu | - |
dc.contributor.author | Lim, Jieun | - |
dc.contributor.author | Choi, Jungwook | - |
dc.date.accessioned | 2022-12-20T05:05:00Z | - |
dc.date.available | 2022-12-20T05:05:00Z | - |
dc.date.created | 2022-12-07 | - |
dc.date.issued | 2022-11 | - |
dc.identifier.issn | 1520-6130 | - |
dc.identifier.uri | https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/172849 | - |
dc.description.abstract | Convolution is one of the fundamental operations of deep neural networks and demands intensive matrix computation. In a graphics processing unit (GPU), the Tensor Core is specialized matrix-processing hardware equipped with reduced-precision warp matrix-multiply-accumulate (WMMA) instructions to increase throughput. However, it is challenging to achieve optimal performance, since reduced-precision WMMA requires many elements grouped as a matrix operand, seriously limiting data reuse and imposing packing and layout overhead on the schedule. This work proposes three techniques to enhance INT4 WMMA utilization on Tensor Cores: duplicate-aware load for increasing the reuse of convolution input, register-level packing for alleviating the overhead of handling INT4 data, and data layout optimization for coalesced data transfer. The proposed INT4 WMMA optimization techniques are evaluated on convolution operations of popular neural networks to demonstrate substantial speedup on Tensor Cores compared to the state of the art. | - |
dc.language | English | - |
dc.language.iso | en | - |
dc.publisher | Institute of Electrical and Electronics Engineers Inc. | - |
dc.title | Understanding and Optimizing INT4 Convolution for Accelerated DNN Inference on Tensor Cores | - |
dc.type | Article | - |
dc.contributor.affiliatedAuthor | Choi, Jungwook | - |
dc.identifier.doi | 10.1109/SiPS55645.2022.9919243 | - |
dc.identifier.scopusid | 2-s2.0-85141793386 | - |
dc.identifier.wosid | 001081960800004 | - |
dc.identifier.bibliographicCitation | IEEE Workshop on Signal Processing Systems, SiPS: Design and Implementation, v.2022-November, pp.1 - 6 | - |
dc.relation.isPartOf | IEEE Workshop on Signal Processing Systems, SiPS: Design and Implementation | - |
dc.citation.title | IEEE Workshop on Signal Processing Systems, SiPS: Design and Implementation | - |
dc.citation.volume | 2022-November | - |
dc.citation.startPage | 1 | - |
dc.citation.endPage | 6 | - |
dc.type.rims | ART | - |
dc.type.docType | Proceedings Paper | - |
dc.description.journalClass | 1 | - |
dc.description.isOpenAccess | N | - |
dc.description.journalRegisteredClass | scopus | - |
dc.relation.journalResearchArea | Computer Science | - |
dc.relation.journalResearchArea | Engineering | - |
dc.relation.journalResearchArea | Telecommunications | - |
dc.relation.journalWebOfScienceCategory | Computer Science, Information Systems | - |
dc.relation.journalWebOfScienceCategory | Engineering, Electrical & Electronic | - |
dc.relation.journalWebOfScienceCategory | Telecommunications | - |
dc.subject.keywordPlus | Data reduction | - |
dc.subject.keywordPlus | Data transfer | - |
dc.subject.keywordPlus | Deep neural networks | - |
dc.subject.keywordPlus | Graphics processing unit | - |
dc.subject.keywordPlus | Matrix algebra | - |
dc.subject.keywordPlus | Tensors | - |
dc.subject.keywordPlus | Convolution | - |
dc.subject.keywordPlus | Fundamental operations | - |
dc.subject.keywordPlus | matrix | - |
dc.subject.keywordPlus | Matrix computation | - |
dc.subject.keywordPlus | Matrix multiply | - |
dc.subject.keywordPlus | Multiply-accumulate (MAC) | - |
dc.subject.keywordPlus | Processing hardware | - |
dc.subject.keywordPlus | Reduced precision | - |
dc.subject.keywordPlus | Reduced precision DNN | - |
dc.subject.keywordPlus | Tensor core | - |
dc.subject.keywordPlus | Unit tensor | - |
dc.subject.keywordAuthor | convolution | - |
dc.subject.keywordAuthor | reduced precision DNN | - |
dc.subject.keywordAuthor | tensor core | - |
dc.identifier.url | https://ieeexplore.ieee.org/document/9919243 | - |