Orchestrating Large-Scale SpGEMMs using Dynamic Block Distribution and Data Transfer Minimization on Heterogeneous Systems
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Park, Taehyeong | - |
dc.contributor.author | Kang, Seokwon | - |
dc.contributor.author | Jang, Myung-Hwan | - |
dc.contributor.author | Kim, Sang-Wook | - |
dc.contributor.author | Park, Yongjun | - |
dc.date.accessioned | 2023-08-22T01:30:19Z | - |
dc.date.available | 2023-08-22T01:30:19Z | - |
dc.date.issued | 2023-04 | - |
dc.identifier.issn | 1084-4627 | - |
dc.identifier.uri | https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/189393 | - |
dc.description.abstract | Sparse general matrix-matrix multiplication (SpGEMM) is a major kernel in various emerging applications, such as database management systems, deep learning, graph analysis, and recommendation systems. Since SpGEMM requires extensive computation, many SpGEMM techniques have been implemented on graphics processing units (GPUs) to fully exploit massive data parallelism. However, traditional SpGEMM techniques usually do not fully utilize the GPU because most non-zero elements of the target sparse matrices exist in a few hub nodes, while non-hub nodes have barely any non-zero elements. This data characteristic (a power-law distribution) causes significant performance degradation because of the load imbalance between GPU cores and the low utilization of each core. Recent implementations have attempted to solve this problem with smart pre-/post-processing. However, the net performance hardly improves and sometimes even deteriorates owing to the large overheads. Additionally, non-hub nodes are inherently unsuitable for GPU computing, even after optimization. Furthermore, owing to the rapid growth in GPU computing power and input data size, performance is no longer dominated by kernel execution but by data transfers, such as device-to-host (D2H) transfers and file I/Os. Therefore, this work proposes a Dynamic Block Distributor (DBD), a novel full-system-level SpGEMM orchestration framework for heterogeneous systems that improves overall performance by enabling efficient CPU-GPU collaboration and further minimizing the data-transfer overhead between all system elements. This framework first divides the target matrix into smaller blocks and then offloads the computation of each block to the appropriate computing unit, either the GPU or the CPU, based on its workload type and the status of resource utilization at runtime. It also minimizes the data-transfer overhead with simple but suitable techniques, such as Row Collecting, I/O Overlapping, and I/O Binding. Our experiments showed that this framework reduced the execution latency of SpGEMM, including both kernel execution and D2H transfers, by 3.24x on average, and the overall execution time by 2.07x on average, compared to the baseline cuSPARSE library. | - |
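The abstract's core idea, partitioning a power-law sparse matrix into blocks and routing each block to the GPU or CPU by its workload, can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the block size, the hub threshold, and the average-nnz dispatch rule are all assumptions introduced here for clarity.

```python
# Hypothetical sketch of dynamic block distribution: partition the rows of a
# sparse matrix into fixed-size blocks, then route each block to the GPU
# (hub-heavy, dense work) or the CPU (sparse, non-hub work). The threshold
# and block size are illustrative assumptions, not the paper's heuristics.

def dispatch_blocks(row_nnz, block_size=4, hub_threshold=8):
    """Assign each block of rows to 'GPU' or 'CPU' based on the
    average number of non-zeros per row in that block."""
    assignments = []
    for start in range(0, len(row_nnz), block_size):
        block = row_nnz[start:start + block_size]
        avg_nnz = sum(block) / len(block)
        unit = "GPU" if avg_nnz >= hub_threshold else "CPU"
        assignments.append((start, start + len(block), unit))
    return assignments

# Power-law-like row distribution: a few hub rows hold most non-zeros.
rows = [120, 95, 80, 60, 3, 2, 1, 1, 0, 1, 2, 0]
for lo, hi, unit in dispatch_blocks(rows):
    print(f"rows [{lo}, {hi}) -> {unit}")
```

A runtime system in the spirit of DBD would additionally consult live resource utilization before committing each block, and overlap the resulting host-device transfers with computation; this sketch shows only the static workload-type split.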
dc.format.extent | 4 | - |
dc.language | English | - |
dc.language.iso | ENG | - |
dc.publisher | IEEE Computer Society | - |
dc.title | Orchestrating Large-Scale SpGEMMs using Dynamic Block Distribution and Data Transfer Minimization on Heterogeneous Systems | - |
dc.type | Article | - |
dc.publisher.location | United States | - |
dc.identifier.doi | 10.1109/ICDE55515.2023.00189 | - |
dc.identifier.scopusid | 2-s2.0-85167662508 | - |
dc.identifier.bibliographicCitation | Proceedings - International Conference on Data Engineering, v.2023-April, pp 2456 - 2459 | - |
dc.citation.title | Proceedings - International Conference on Data Engineering | - |
dc.citation.volume | 2023-April | - |
dc.citation.startPage | 2456 | - |
dc.citation.endPage | 2459 | - |
dc.type.docType | Conference paper | - |
dc.description.isOpenAccess | N | - |
dc.description.journalRegisteredClass | scopus | - |
dc.subject.keywordPlus | Computer graphics | - |
dc.subject.keywordPlus | Computing power | - |
dc.subject.keywordPlus | Data transfer | - |
dc.subject.keywordPlus | Deep learning | - |
dc.subject.keywordPlus | Electric power distribution | - |
dc.subject.keywordPlus | Matrix algebra | - |
dc.subject.keywordPlus | Program processors | - |
dc.subject.keywordPlus | Graphics processing unit | - |
dc.subject.keywordPlus | Heterogeneous | - |
dc.subject.keywordPlus | Heterogeneous systems | - |
dc.subject.keywordPlus | Hub nodes | - |
dc.subject.keywordPlus | Large scale sparse matrix | - |
dc.subject.keywordPlus | Large-scales | - |
dc.subject.keywordPlus | Matrix multiplication | - |
dc.subject.keywordPlus | Matrix-matrix multiplications | - |
dc.subject.keywordPlus | Performance | - |
dc.subject.keywordPlus | Sparse matrices | - |
dc.subject.keywordPlus | Sparse matrix multiplication | - |
dc.subject.keywordAuthor | GPU | - |
dc.subject.keywordAuthor | heterogeneous | - |
dc.subject.keywordAuthor | large-scale sparse matrix | - |
dc.subject.keywordAuthor | Sparse matrix multiplication | - |
dc.identifier.url | https://ieeexplore.ieee.org/document/10184530 | - |