Optimization of GPU-based sparse matrix multiplication for large sparse networks

Lee, Jeongmyung; Kang, Seokwon; Yu, Yongseung; Jo, Yong-Yeon; Kim, Sang-Wook; Park, Yongjun

doi:10.1109/ICDE48307.2020.00085

Full metadata record

DC Field	Value	Language
dc.contributor.author	Lee, Jeongmyung	-
dc.contributor.author	Kang, Seokwon	-
dc.contributor.author	Yu, Yongseung	-
dc.contributor.author	Jo, Yong-Yeon	-
dc.contributor.author	Kim, Sang-Wook	-
dc.contributor.author	Park, Yongjun	-
dc.date.accessioned	2022-07-08T06:09:16Z	-
dc.date.available	2022-07-08T06:09:16Z	-
dc.date.created	2021-05-13	-
dc.date.issued	2020-04	-
dc.identifier.issn	1084-4627	-
dc.identifier.uri	https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/145874	-
dc.description.abstract	Sparse matrix multiplication (spGEMM) is widely used to analyze sparse network data and to extract important information based on its matrix representation. As it contains a high degree of data parallelism, many efficient implementations using data-parallel programming platforms such as CUDA and OpenCL have been introduced on graphics processing units (GPUs). Several well-known spGEMM techniques, such as cuSPARSE and CUSP, often do not utilize the GPU resources fully, owing to the load imbalance between threads in the expansion process and high memory contention in the merge process. Furthermore, even though several outer-product-based spGEMM techniques have been proposed to solve the load balancing problem in expansion, they still do not utilize the GPU resources fully, because severe computation load variations exist among the multiple thread blocks. To solve these challenges, this paper proposes a new optimization pass called Block Reorganizer, which balances the total computations of each computing unit on target GPUs, based on the outer-product-based expansion process, and reduces the memory pressure during the merge process. For expansion, it first identifies the actual computation amount for each block, and then performs two thread block transformation processes based on their characteristics: 1) B-Splitting to transform heavy-computation blocks into multiple small blocks and 2) B-Gathering to aggregate multiple small-computation blocks into a larger block. While merging, it improves the overall performance by performing B-Limiting to limit the number of blocks on each computing unit. Experimental results show that it improves the total performance of kernel execution by 1.43x, on average, when compared to the row-product-based spGEMM, for NVIDIA Titan Xp GPUs on real-world datasets.	-
dc.language	English	-
dc.language.iso	en	-
dc.publisher	IEEE Computer Society	-
dc.title	Optimization of GPU-based sparse matrix multiplication for large sparse networks	-
dc.type	Article	-
dc.contributor.affiliatedAuthor	Kim, Sang-Wook	-
dc.contributor.affiliatedAuthor	Park, Yongjun	-
dc.identifier.doi	10.1109/ICDE48307.2020.00085	-
dc.identifier.scopusid	2-s2.0-85085862760	-
dc.identifier.bibliographicCitation	Proceedings - International Conference on Data Engineering, v.2020-April, pp.925 - 936	-
dc.relation.isPartOf	Proceedings - International Conference on Data Engineering	-
dc.citation.title	Proceedings - International Conference on Data Engineering	-
dc.citation.volume	2020-April	-
dc.citation.startPage	925	-
dc.citation.endPage	936	-
dc.type.rims	ART	-
dc.type.docType	Conference Paper	-
dc.description.journalClass	1	-
dc.description.isOpenAccess	N	-
dc.description.journalRegisteredClass	scopus	-
dc.subject.keywordPlus	Balancing	-
dc.subject.keywordPlus	Expansion	-
dc.subject.keywordPlus	Graphics processing unit	-
dc.subject.keywordPlus	Parallel programming	-
dc.subject.keywordPlus	Program processors	-
dc.subject.keywordPlus	Block transformations	-
dc.subject.keywordPlus	Computation blocks	-
dc.subject.keywordPlus	Data-parallel programming	-
dc.subject.keywordPlus	Efficient implementation	-
dc.subject.keywordPlus	Graphic processing units (GPUs)	-
dc.subject.keywordPlus	Load balancing problem	-
dc.subject.keywordPlus	Matrix representation	-
dc.subject.keywordPlus	Real-world datasets	-
dc.subject.keywordPlus	Matrix algebra	-
dc.subject.keywordAuthor	GPU	-
dc.subject.keywordAuthor	Linear algebra	-
dc.subject.keywordAuthor	Sparse matrix multiplication	-
dc.subject.keywordAuthor	Sparse network	-
dc.identifier.url	https://ieeexplore.ieee.org/document/9101654	-
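
The splitting and gathering steps named in the abstract can be illustrated with a small host-side sketch. This is not the authors' implementation: the abstract does not say how the computation amount of a block is measured or which thresholds are used. The sketch assumes one block per outer product, with workload equal to nnz(A[:, k]) * nnz(B[k, :]) (the number of multiplications that outer product performs), and the function names and the `heavy` and `light` thresholds are hypothetical.

```python
from typing import List, Tuple

Piece = Tuple[int, int]  # (source block index k, amount of work taken from it)


def outer_product_workloads(a_col_nnz: List[int], b_row_nnz: List[int]) -> List[int]:
    """Multiplications in outer product k: nnz(A[:, k]) * nnz(B[k, :])."""
    return [a * b for a, b in zip(a_col_nnz, b_row_nnz)]


def reorganize(workloads: List[int], heavy: int, light: int) -> List[List[Piece]]:
    """Split blocks heavier than `heavy`; gather blocks lighter than `light`.

    `heavy` and `light` are assumed tuning parameters, not values from the paper.
    """
    blocks: List[List[Piece]] = []
    bin_items: List[Piece] = []
    bin_work = 0
    for k, w in enumerate(workloads):
        if w == 0:
            continue  # an empty column or row contributes no work
        if w > heavy:
            # Splitting: cut one heavy block into pieces of at most `heavy`.
            remaining = w
            while remaining > 0:
                piece = min(heavy, remaining)
                blocks.append([(k, piece)])
                remaining -= piece
        elif w < light:
            # Gathering: pack light blocks together until the bin is full.
            if bin_items and bin_work + w > light:
                blocks.append(bin_items)
                bin_items, bin_work = [], 0
            bin_items.append((k, w))
            bin_work += w
        else:
            blocks.append([(k, w)])  # already a reasonable size
    if bin_items:
        blocks.append(bin_items)
    return blocks


# Toy per-column and per-row nonzero counts (made up for illustration).
workloads = outer_product_workloads([4, 1, 2, 1, 3], [5, 1, 1, 2, 2])
blocks = reorganize(workloads, heavy=8, light=4)
block_work = [sum(w for _, w in b) for b in blocks]
```

In this toy run the total work is unchanged, and no reorganized block exceeds the `heavy` threshold, which is the balancing property the abstract describes at a high level. A real GPU implementation would map each reorganized block to a thread block rather than to a Python list.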

Appears in Collections: College of Engineering (Seoul) > School of Computer Software (Seoul) > 1. Journal Articles

Related Researcher

Park, Yong jun: College of Engineering (Seoul), School of Computer Software (Seoul)
