Orchestrating Large-Scale SpGEMMs using Dynamic Block Distribution and Data Transfer Minimization on Heterogeneous Systems

Full metadata record
DC Field Value Language
dc.contributor.authorPark, Taehyeong-
dc.contributor.authorKang, Seokwon-
dc.contributor.authorJang, Myung-Hwan-
dc.contributor.authorKim, Sang-Wook-
dc.contributor.authorPark, Yongjun-
dc.date.accessioned2023-08-22T01:30:19Z-
dc.date.available2023-08-22T01:30:19Z-
dc.date.issued2023-04-
dc.identifier.issn1084-4627-
dc.identifier.urihttps://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/189393-
dc.description.abstractSparse general matrix-matrix multiplication (SpGEMM) is a major kernel in various emerging applications, such as database management systems, deep learning, graph analysis, and recommendation systems. Since SpGEMM requires extensive computation, many SpGEMM techniques have been implemented on graphics processing units (GPUs) to fully exploit massive data parallelism. However, traditional SpGEMM techniques usually do not fully utilize the GPU because most non-zero elements of the target sparse matrices are concentrated in a few hub nodes, while non-hub nodes barely have non-zero elements. This data characteristic (the power law) significantly degrades performance because of the load imbalance between GPU cores and the low utilization of each core. Many recent implementations have attempted to solve this problem using smart pre-/post-processing. However, the net performance hardly improves and sometimes even deteriorates owing to the large overheads. Additionally, non-hub nodes are inherently unsuitable for GPU computing, even after optimization. Furthermore, owing to the rapid growth in GPU computing power and input data size, performance is no longer dominated by kernel execution but by data transfers such as device-to-host (D2H) transfers and file I/O. Therefore, this work proposes the Dynamic Block Distributor (DBD), a novel full-system-level SpGEMM orchestration framework for heterogeneous systems that improves overall performance by enabling efficient CPU-GPU collaboration and further minimizing the data-transfer overhead between all system elements. This framework first divides the target matrix into smaller blocks and then offloads the computation of each block to the appropriate computing unit, GPU or CPU, based on its workload type and the status of resource utilization at runtime.
It also minimizes the data-transfer overhead with simple but effective techniques, such as Row Collecting, I/O Overlapping, and I/O Binding. Our experiments showed that this framework improved the performance of SpGEMM execution, which includes both the kernel execution and D2H transfers, by 3.24x on average, and the overall execution time by 2.07x on average, compared to that of the baseline cuSPARSE library.-
dc.format.extent4-
dc.languageEnglish-
dc.language.isoENG-
dc.publisherIEEE Computer Society-
dc.titleOrchestrating Large-Scale SpGEMMs using Dynamic Block Distribution and Data Transfer Minimization on Heterogeneous Systems-
dc.typeArticle-
dc.publisher.locationUnited States-
dc.identifier.doi10.1109/ICDE55515.2023.00189-
dc.identifier.scopusid2-s2.0-85167662508-
dc.identifier.bibliographicCitationProceedings - International Conference on Data Engineering, v.2023-April, pp 2456 - 2459-
dc.citation.titleProceedings - International Conference on Data Engineering-
dc.citation.volume2023-April-
dc.citation.startPage2456-
dc.citation.endPage2459-
dc.type.docTypeConference paper-
dc.description.isOpenAccessN-
dc.description.journalRegisteredClassscopus-
dc.subject.keywordPlusComputer graphics-
dc.subject.keywordPlusComputing power-
dc.subject.keywordPlusData transfer-
dc.subject.keywordPlusDeep learning-
dc.subject.keywordPlusElectric power distribution-
dc.subject.keywordPlusMatrix algebra-
dc.subject.keywordPlusProgram processors-
dc.subject.keywordPlusGraphics processing unit-
dc.subject.keywordPlusHeterogeneous-
dc.subject.keywordPlusHeterogeneous systems-
dc.subject.keywordPlusHub nodes-
dc.subject.keywordPlusLarge scale sparse matrix-
dc.subject.keywordPlusLarge-scales-
dc.subject.keywordPlusMatrix multiplication-
dc.subject.keywordPlusMatrix-matrix multiplications-
dc.subject.keywordPlusPerformance-
dc.subject.keywordPlusSparse matrices-
dc.subject.keywordPlusSparse matrix multiplication-
dc.subject.keywordAuthorGPU-
dc.subject.keywordAuthorheterogeneous-
dc.subject.keywordAuthorlarge-scale sparse matrix-
dc.subject.keywordAuthorSparse matrix multiplication-
dc.identifier.urlhttps://ieeexplore.ieee.org/document/10184530-
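The abstract above describes dividing the target sparse matrix into blocks and routing each block to the GPU or CPU based on its workload type. The following is a minimal sketch of that dispatch idea; the function names, the row-block granularity, and the nnz-count threshold are illustrative assumptions, not the paper's actual DBD implementation, which decides at runtime using workload type and resource-utilization status.

```python
def split_row_blocks(row_nnz, block_size):
    """Group consecutive rows into blocks; return each block's total
    non-zero count. Power-law matrices yield a few dense 'hub' blocks
    and many nearly empty 'non-hub' blocks."""
    return [sum(row_nnz[i:i + block_size])
            for i in range(0, len(row_nnz), block_size)]

def dispatch(block_nnz, gpu_threshold):
    """Route each block index to the GPU queue (dense, hub-heavy blocks
    that benefit from data parallelism) or the CPU queue (sparse,
    non-hub blocks that underutilize GPU cores)."""
    gpu, cpu = [], []
    for idx, nnz in enumerate(block_nnz):
        (gpu if nnz >= gpu_threshold else cpu).append(idx)
    return gpu, cpu

# Power-law-like distribution: a few hub rows hold most non-zeros.
row_nnz = [900, 850, 2, 1, 3, 700, 0, 2]
blocks = split_row_blocks(row_nnz, block_size=2)   # [1750, 3, 703, 2]
gpu_blocks, cpu_blocks = dispatch(blocks, gpu_threshold=100)
```

A static nnz threshold is only a stand-in here; the framework described above additionally considers runtime resource utilization, so a block that a static rule would send to a busy GPU could instead be computed on an idle CPU.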
Appears in
Collections
Seoul College of Engineering > Seoul School of Computer Science and Software > 1. Journal Articles


Related Researcher
Kim, Sang-Wook, College of Engineering (School of Computer Science)
