Tailoring CUTLASS GEMM using Supervised Learning

Full metadata record
DC Field: Value
dc.contributor.author: Yu, Yongseung
dc.contributor.author: Son, Donghyun
dc.contributor.author: Lee, Younghyun
dc.contributor.author: Park, Sunghyun
dc.contributor.author: Ryu, Giha
dc.contributor.author: Cho, Myeongjin
dc.contributor.author: Seo, Jiwon
dc.contributor.author: Park, Yongjun
dc.date.accessioned: 2024-11-28T09:31:31Z
dc.date.available: 2024-11-28T09:31:31Z
dc.date.issued: 2023-11
dc.identifier.issn: 1063-6404
dc.identifier.issn: 2576-6996
dc.identifier.uri: https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/196070
dc.description.abstract: General matrix multiplication (GEMM) is a core computation kernel for deep neural networks. CUTLASS, a state-of-the-art open-source CUDA-based linear-algebra template library, provides a highly optimized tiling-based GEMM. However, CUTLASS GEMM often cannot achieve optimal performance when its tiling configuration is not appropriately chosen, because performance varies significantly with factors such as the tile size and shape and the target graphics processing unit (GPU) architecture. Thus, determining the optimal tiling configuration is a major challenge in achieving the best performance of a tiling-based GEMM. To address this problem, we propose CUTLASS-tailor, a novel end-to-end framework that uses a neural network model to predict the best tile parameters for target CUTLASS GEMM operations and underlying GPUs. We trained the prediction model on a synthetic dataset that includes various input matrix combinations with different sizes and structures. Furthermore, to cover various GPUs with a universal model, we also included the number of GPU cores and the amount of shared memory as GPU hardware features in the input of the CUTLASS-tailor network. On a test dataset from several real-world GEMMs, CUTLASS-tailor-based GEMM operations outperformed GEMM operations using cuBLAS by up to 1.94× on an NVIDIA TitanXp GPU, and CUTLASS-tailor found better tile parameters than well-known search algorithms.
dc.format.extent: 10
dc.language: English
dc.language.iso: ENG
dc.title: Tailoring CUTLASS GEMM using Supervised Learning
dc.type: Article
dc.identifier.doi: 10.1109/ICCD58817.2023.00077
dc.identifier.scopusid: 2-s2.0-85182319805
dc.identifier.wosid: 001146866200066
dc.identifier.bibliographicCitation: IEEE International Conference on Computer Design - VLSI in Computers and Processors, pp. 465-474
dc.citation.title: IEEE International Conference on Computer Design - VLSI in Computers and Processors
dc.citation.startPage: 465
dc.citation.endPage: 474
dc.type.docType: Proceedings Paper
dc.description.isOpenAccess: N
dc.description.journalRegisteredClass: scopus
dc.relation.journalResearchArea: Computer Science
dc.relation.journalWebOfScienceCategory: Computer Science, Hardware & Architecture
dc.relation.journalWebOfScienceCategory: Computer Science, Information Systems
dc.relation.journalWebOfScienceCategory: Computer Science, Software Engineering
dc.relation.journalWebOfScienceCategory: Computer Science, Theory & Methods
dc.subject.keywordPlus: Computer graphics
dc.subject.keywordPlus: Computer graphics equipment
dc.subject.keywordPlus: Deep neural networks
dc.subject.keywordPlus: Matrix algebra
dc.subject.keywordPlus: Memory architecture
dc.subject.keywordPlus: Program processors
dc.subject.keywordPlus: Statistical tests
dc.subject.keywordPlus: Supervised learning
dc.subject.keywordAuthor: CUTLASS
dc.subject.keywordAuthor: General Matrix Multiplication
dc.subject.keywordAuthor: GPU
dc.subject.keywordAuthor: Supervised Learning
dc.identifier.url: https://ieeexplore.ieee.org/document/10360964
Appears in Collections
College of Engineering (Seoul) > School of Computer Software (Seoul) > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.
