Tailoring CUTLASS GEMM using Supervised Learning
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Yu, Yongseung | - |
| dc.contributor.author | Son, Donghyun | - |
| dc.contributor.author | Lee, Younghyun | - |
| dc.contributor.author | Park, Sunghyun | - |
| dc.contributor.author | Ryu, Giha | - |
| dc.contributor.author | Cho, Myeongjin | - |
| dc.contributor.author | Seo, Jiwon | - |
| dc.contributor.author | Park, Yongjun | - |
| dc.date.accessioned | 2024-11-28T09:31:31Z | - |
| dc.date.available | 2024-11-28T09:31:31Z | - |
| dc.date.issued | 2023-11 | - |
| dc.identifier.issn | 1063-6404 | - |
| dc.identifier.issn | 2576-6996 | - |
| dc.identifier.uri | https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/196070 | - |
| dc.description.abstract | General matrix multiplication (GEMM) is a core computation kernel for deep neural networks. CUTLASS, a state-of-the-art open-source CUDA-based linear-algebra template library, provides a highly optimized tiling-based GEMM. However, CUTLASS GEMM often cannot achieve optimal performance when its tiling configuration is not appropriately chosen, because performance varies significantly with factors such as the tile size and shape, as well as the target graphics processing unit (GPU) architecture. Thus, determining the optimal tiling configuration is a major challenge in achieving the best performance of a tiling-based GEMM. To address this problem, we propose CUTLASS-tailor, a novel end-to-end framework that predicts the best tile parameters for target CUTLASS GEMM operations and underlying GPUs using a neural network model. We trained the prediction model using a suitable synthetic dataset that includes various input matrix combinations with different sizes and structures. Furthermore, to cover various GPUs with a universal model, we also included the number of GPU cores and the amount of shared memory as GPU hardware features for the input of the CUTLASS-tailor network. On a test dataset from several real-world GEMMs, CUTLASS-tailor-based GEMM operations outperformed GEMM operations using cuBLAS by up to 1.94× on an NVIDIA TitanXp GPU. The results also showed that CUTLASS-tailor can find better tile parameters than well-known search algorithms. | - |
| dc.format.extent | 10 | - |
| dc.language | English | - |
| dc.language.iso | ENG | - |
| dc.title | Tailoring CUTLASS GEMM using Supervised Learning | - |
| dc.type | Article | - |
| dc.identifier.doi | 10.1109/ICCD58817.2023.00077 | - |
| dc.identifier.scopusid | 2-s2.0-85182319805 | - |
| dc.identifier.wosid | 001146866200066 | - |
| dc.identifier.bibliographicCitation | IEEE International Conference on Computer Design - VLSI in Computers and Processors, pp. 465-474 | - |
| dc.citation.title | IEEE International Conference on Computer Design - VLSI in Computers and Processors | - |
| dc.citation.startPage | 465 | - |
| dc.citation.endPage | 474 | - |
| dc.type.docType | Proceedings Paper | - |
| dc.description.isOpenAccess | N | - |
| dc.description.journalRegisteredClass | scopus | - |
| dc.relation.journalResearchArea | Computer Science | - |
| dc.relation.journalWebOfScienceCategory | Computer Science, Hardware & Architecture | - |
| dc.relation.journalWebOfScienceCategory | Computer Science, Information Systems | - |
| dc.relation.journalWebOfScienceCategory | Computer Science, Software Engineering | - |
| dc.relation.journalWebOfScienceCategory | Computer Science, Theory & Methods | - |
| dc.subject.keywordPlus | Computer graphics | - |
| dc.subject.keywordPlus | Computer graphics equipment | - |
| dc.subject.keywordPlus | Deep neural networks | - |
| dc.subject.keywordPlus | Matrix algebra | - |
| dc.subject.keywordPlus | Memory architecture | - |
| dc.subject.keywordPlus | Program processors | - |
| dc.subject.keywordPlus | Statistical tests | - |
| dc.subject.keywordPlus | Supervised learning | - |
| dc.subject.keywordAuthor | CUTLASS | - |
| dc.subject.keywordAuthor | General Matrix Multiplication | - |
| dc.subject.keywordAuthor | GPU | - |
| dc.subject.keywordAuthor | Supervised Learning | - |
| dc.identifier.url | https://ieeexplore.ieee.org/document/10360964 | - |
