Evaluating performance of Parallel Matrix Multiplication Routine on Intel KNL and Xeon Scalable Processors

Nguyen, T.M.T.; Park, Y.; Choi, J.; Kim, R.

doi:10.1109/ACSOS-C51401.2020.00027

Authors: Nguyen, T.M.T.; Park, Y.; Choi, J.; Kim, R.

Issue Date: Sep-2020

Publisher: Institute of Electrical and Electronics Engineers Inc.

Keywords: AVX-512; Intel Knights Landing; Intel Xeon Scalable; Parallel BLAS; Parallel matrix-matrix multiplication; ScaLAPACK

Citation: Proceedings - 2020 IEEE International Conference on Autonomic Computing and Self-Organizing Systems Companion, ACSOS-C 2020, pp.42 - 47

Journal Title: Proceedings - 2020 IEEE International Conference on Autonomic Computing and Self-Organizing Systems Companion, ACSOS-C 2020

Start Page: 42

End Page: 47

URI: http://scholarworks.bwise.kr/ssu/handle/2018.sw.ssu/39829

DOI: 10.1109/ACSOS-C51401.2020.00027

ISSN: 0000-0000

Abstract: In high-performance computing, the xGEMM routine is the core Level 3 BLAS operation for matrix-matrix multiplication. The performance of parallel xGEMM (PxGEMM) is significantly affected by two major factors: first, the flop rate achieved when computing the matrix-matrix multiplication on each node; second, the communication cost of broadcasting sub-matrices to the other nodes. This paper proposes an approach to improve and tune the PDGEMM routine for modern Intel processors: Knights Landing (KNL) and Xeon Scalable Processors (SKL). The approach consists of two methods that address the factors above. First, the computation part of PDGEMM is improved with a blocked matrix-matrix multiplication algorithm whose block size is chosen to better fit the KNL and SKL architectures. Second, an MPI-based communication routine is proposed to overcome the default settings of BLACS, which handles part of the communication, and so reduce communication time. The proposed PDGEMM achieves performance similar to that of PDGEMM from ScaLAPACK and Intel MKL on smaller matrices on 16 Intel KNL nodes. Furthermore, it achieves better performance on smaller matrices than PDGEMM from ScaLAPACK and Intel MKL on 16 Xeon Scalable processor nodes. © 2020 IEEE.
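
Illustrative note: the blocked matrix-matrix multiplication that the abstract refers to can be sketched as below. This is a minimal single-process sketch in plain Python, not the paper's PDGEMM and not the ScaLAPACK or Intel MKL implementation. The function name `blocked_matmul` and the default `block` value are hypothetical and chosen only for illustration; the record does not state the block sizes the authors selected for KNL or SKL.

```python
def blocked_matmul(a, b, block=2):
    """Compute C = A * B by visiting block x block tiles of the operands.

    a is an n x k matrix and b is a k x m matrix, both given as lists of rows.
    The tile size `block` is an assumed tuning parameter, not a value from the paper.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    # The three outer loops step over tiles; the inner loops work inside one tile,
    # so a small working set of A, B, and C is reused before moving on.
    for i0 in range(0, n, block):
        for j0 in range(0, m, block):
            for p0 in range(0, k, block):
                for i in range(i0, min(i0 + block, n)):
                    for p in range(p0, min(p0 + block, k)):
                        a_ip = a[i][p]
                        for j in range(j0, min(j0 + block, m)):
                            c[i][j] += a_ip * b[p][j]
    return c


a = [[1, 2, 3], [4, 5, 6]]
b = [[7, 8], [9, 10], [11, 12]]
product = blocked_matmul(a, b, block=2)
```

The result does not depend on `block`; only the order in which partial products are accumulated changes. That is why the block size can be tuned to a particular cache hierarchy or vector width without altering what the routine computes.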

Files in This Item: There are no files associated with this item.

Appears in Collections: College of Information Technology > School of Computer Science and Engineering > 1. Journal Articles

Related Researcher

Choi, Jaeyoung: College of Information Technology (School of Computer Science and Engineering)
