Optimization of GPU-based sparse matrix multiplication for large sparse networks

Authors
Lee, Jeongmyung; Kang, Seokwon; Yu, Yongseung; Jo, Yong-Yeon; Kim, Sang-Wook; Park, Yongjun
Issue Date
Apr-2020
Publisher
IEEE Computer Society
Keywords
GPU; Linear algebra; Sparse matrix multiplication; Sparse network
Citation
Proceedings - International Conference on Data Engineering, v.2020-April, pp.925 - 936
Indexed
SCOPUS
Journal Title
Proceedings - International Conference on Data Engineering
Volume
2020-April
Start Page
925
End Page
936
URI
https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/145874
DOI
10.1109/ICDE48307.2020.00085
ISSN
1084-4627
Abstract
Sparse matrix multiplication (spGEMM) is widely used to analyze sparse network data and to extract important information based on its matrix representation. As it contains a high degree of data parallelism, many efficient implementations using data-parallel programming platforms such as CUDA and OpenCL have been introduced on graphics processing units (GPUs). Several well-known spGEMM techniques, such as cuSPARSE and CUSP, often do not utilize the GPU resources fully, owing to the load imbalance between threads in the expansion process and high memory contention in the merge process. Furthermore, even though several outer-product-based spGEMM techniques have been proposed to solve the load balancing problem in expansion, they still do not utilize the GPU resources fully, because severe computation load variations exist among the multiple thread blocks. To solve these challenges, this paper proposes a new optimization pass called Block Reorganizer, which balances the total computations of each computing unit on target GPUs, based on the outer-product-based expansion process, and reduces the memory pressure during the merge process. For expansion, it first identifies the actual computation amount for each block, and then performs two thread block transformation processes based on their characteristics: 1) B-Splitting, to transform a heavy-computation block into multiple small blocks, and 2) B-Gathering, to aggregate multiple small-computation blocks into a larger block. While merging, it improves the overall performance by performing B-Limiting to limit the number of blocks on each computing unit. Experimental results show that it improves the total performance of kernel execution by 1.43x on average, when compared to the row-product-based spGEMM, for NVIDIA Titan Xp GPUs on real-world datasets.
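The splitting and gathering steps described in the abstract can be illustrated with a small host-side sketch. This is not the paper's implementation: the function names (`outer_product_work`, `reorganize`), the `heavy` and `light` thresholds, and the greedy gathering policy are all assumptions made for illustration. It assumes that, in an outer-product formulation of C = A x B, the k-th outer product generates nnz(A[:, k]) * nnz(B[k, :]) partial products, and it only models the workload partitioning, not the GPU kernels or B-Limiting.

```python
from typing import List, Tuple

# A work range is (outer-product index k, first product, one past last product).
WorkRange = Tuple[int, int, int]


def outer_product_work(a_col_nnz: List[int], b_row_nnz: List[int]) -> List[int]:
    """Partial products per outer product k: nnz(A[:, k]) * nnz(B[k, :])."""
    return [a * b for a, b in zip(a_col_nnz, b_row_nnz)]


def reorganize(work: List[int], heavy: int, light: int) -> List[List[WorkRange]]:
    """Hypothetical block reorganization (thresholds are illustrative only).

    - work > heavy : split into chunks of at most `heavy` products
    - work < light : gather into a shared block until it holds >= `light`
    - otherwise    : keep as a single block
    """
    blocks: List[List[WorkRange]] = []
    pending: List[WorkRange] = []  # small blocks waiting to be gathered
    pending_load = 0

    for k, w in enumerate(work):
        if w == 0:
            continue  # an empty column or row produces no products
        if w > heavy:
            # Splitting: one heavy block becomes several bounded blocks.
            for start in range(0, w, heavy):
                blocks.append([(k, start, min(start + heavy, w))])
        elif w < light:
            # Gathering: accumulate small blocks into one larger block.
            pending.append((k, 0, w))
            pending_load += w
            if pending_load >= light:
                blocks.append(pending)
                pending, pending_load = [], 0
        else:
            blocks.append([(k, 0, w)])

    if pending:
        blocks.append(pending)  # flush leftover small blocks
    return blocks


def block_load(block: List[WorkRange]) -> int:
    """Total number of partial products assigned to one block."""
    return sum(end - start for _, start, end in block)
```

With per-column and per-row nonzero counts of `[100, 1, 2, 3, 10]` and `[50, 1, 2, 1, 4]`, and assumed thresholds `heavy=2000` and `light=8`, the single 5000-product block is split into three blocks, the three tiny blocks are gathered into one, and the 40-product block is left alone. Real threshold choices would depend on the target GPU and are not specified here.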
Appears in Collections
Seoul College of Engineering > Seoul School of Computer Software > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.
