Tailoring CUTLASS GEMM using Supervised Learning

Yu, Yongseung; Son, Donghyun; Lee, Younghyun; Park, Sunghyun; Ryu, Giha; Cho, Myeongjin; Seo, Jiwon; Park, Yongjun

doi:10.1109/ICCD58817.2023.00077

Full metadata record

DC Field	Value	Language
dc.contributor.author	Yu, Yongseung	-
dc.contributor.author	Son, Donghyun	-
dc.contributor.author	Lee, Younghyun	-
dc.contributor.author	Park, Sunghyun	-
dc.contributor.author	Ryu, Giha	-
dc.contributor.author	Cho, Myeongjin	-
dc.contributor.author	Seo, Jiwon	-
dc.contributor.author	Park, Yongjun	-
dc.date.accessioned	2024-11-28T09:31:31Z	-
dc.date.available	2024-11-28T09:31:31Z	-
dc.date.issued	2023-11	-
dc.identifier.issn	1063-6404	-
dc.identifier.issn	2576-6996	-
dc.identifier.uri	https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/196070	-
dc.description.abstract	General matrix multiplication (GEMM) is a core computation kernel for deep neural networks. CUTLASS, a state-of-the-art open-source CUDA-based linear-algebra template library, provides a highly optimized tiling-based GEMM. However, CUTLASS GEMM often cannot achieve optimal performance when its tiling configuration is not appropriately chosen, because performance varies significantly with factors such as the tile size and shape, as well as the target graphics processing unit (GPU) architecture. Thus, determining the optimal tiling configuration is a major challenge in achieving the best performance of a tiling-based GEMM. To address this problem, we propose CUTLASS-tailor, a novel end-to-end framework that predicts the best tile parameters for target CUTLASS GEMM operations and underlying GPUs using a neural network model. We trained the prediction model using a suitable synthetic dataset that includes various input matrix combinations with different sizes and structures. Furthermore, to cover the various GPUs with a universal model, we also included the number of GPU cores and the amount of shared memory as GPU hardware features for the input of the CUTLASS-tailor network. On a test dataset from several real-world GEMMs, CUTLASS-tailor-based GEMM operations outperformed the GEMM operations using cuBLAS by up to 1.94× on an NVIDIA TitanXp GPU, and CUTLASS-tailor found better tile parameters than well-known search algorithms.	-
dc.format.extent	10	-
dc.language	English	-
dc.language.iso	ENG	-
dc.title	Tailoring CUTLASS GEMM using Supervised Learning	-
dc.type	Article	-
dc.identifier.doi	10.1109/ICCD58817.2023.00077	-
dc.identifier.scopusid	2-s2.0-85182319805	-
dc.identifier.wosid	001146866200066	-
dc.identifier.bibliographicCitation	IEEE International Conference on Computer Design - VLSI in Computers and Processors, pp 465 - 474	-
dc.citation.title	IEEE International Conference on Computer Design - VLSI in Computers and Processors	-
dc.citation.startPage	465	-
dc.citation.endPage	474	-
dc.type.docType	Proceedings Paper	-
dc.description.isOpenAccess	N	-
dc.description.journalRegisteredClass	scopus	-
dc.relation.journalResearchArea	Computer Science	-
dc.relation.journalWebOfScienceCategory	Computer Science, Hardware & Architecture	-
dc.relation.journalWebOfScienceCategory	Computer Science, Information Systems	-
dc.relation.journalWebOfScienceCategory	Computer Science, Software Engineering	-
dc.relation.journalWebOfScienceCategory	Computer Science, Theory & Methods	-
dc.subject.keywordPlus	Computer graphics	-
dc.subject.keywordPlus	Computer graphics equipment	-
dc.subject.keywordPlus	Deep neural networks	-
dc.subject.keywordPlus	Matrix algebra	-
dc.subject.keywordPlus	Memory architecture	-
dc.subject.keywordPlus	Program processors	-
dc.subject.keywordPlus	Statistical tests	-
dc.subject.keywordPlus	Supervised learning	-
dc.subject.keywordAuthor	CUTLASS	-
dc.subject.keywordAuthor	General Matrix Multiplication	-
dc.subject.keywordAuthor	GPU	-
dc.subject.keywordAuthor	Supervised Learning	-
dc.identifier.url	https://ieeexplore.ieee.org/document/10360964	-
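
The abstract frames tile selection as supervised learning: the model input combines the GEMM dimensions with two GPU features (number of cores and amount of shared memory), and the target is the best tile configuration. A minimal Python sketch of how one training example could be assembled under that framing follows. The candidate tile list, the log2 feature scaling, the rule of labeling with the fastest measured tile, and all function names are assumptions made here for illustration; the record does not describe the authors' actual encoding or labeling procedure.

```python
import math

# Hypothetical candidate threadblock tiles (tile_m, tile_n, tile_k); not taken from the paper.
CANDIDATE_TILES = [(64, 64, 8), (128, 64, 8), (64, 128, 8), (128, 128, 8)]


def encode_features(m, n, k, gpu_cores, shared_mem_kib):
    """Build one model input: GEMM shape plus the two GPU features named in the abstract.

    The log2 scaling is an assumption made here to keep magnitudes comparable.
    """
    return [math.log2(v) for v in (m, n, k, gpu_cores, shared_mem_kib)]


def label_from_timings(timings_ms):
    """Return the index of the fastest candidate tile, used as the class label.

    timings_ms[i] is the measured runtime of CANDIDATE_TILES[i] for one GEMM.
    """
    if len(timings_ms) != len(CANDIDATE_TILES):
        raise ValueError("need one timing per candidate tile")
    return min(range(len(timings_ms)), key=lambda i: timings_ms[i])


def make_example(m, n, k, gpu_cores, shared_mem_kib, timings_ms):
    """Pair a feature vector with its label to form one supervised training example."""
    features = encode_features(m, n, k, gpu_cores, shared_mem_kib)
    return features, label_from_timings(timings_ms)


if __name__ == "__main__":
    # Placeholder hardware values and timings for illustration only; not measurements.
    features, label = make_example(1024, 1024, 1024, 4096, 64, [1.9, 1.4, 1.6, 1.5])
    print(features, CANDIDATE_TILES[label])
```

In this sketch a classifier trained on many such (features, label) pairs would, at inference time, map an unseen GEMM shape and GPU description to an index into the candidate list, avoiding a per-problem search over tiles.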

Appears in Collections: Seoul College of Engineering > Seoul School of Computer Software > 1. Journal Articles
