Improving blocked matrix-matrix multiplication routine by utilizing AVX-512 instructions on intel knights landing and xeon scalable processors
- Park, Yoosang; Kim, Raehyun; Nguyen, Thi My Tuyen; Choi, Jaeyoung
- Issue Date
- AVX-512; Intel Skylake-SP; Intel Xeon Phi; Parallel BLAS; Parallel matrix-matrix multiplication; ScaLAPACK
- Cluster Computing
- Journal Title
- Cluster Computing
- In high-performance computing, the general matrix-matrix multiplication (xGEMM) routine is the core of the Level 3 BLAS kernel for effective matrix-matrix multiplication operations. The performance of parallel xGEMM (PxGEMM) is significantly affected by two main factors: the flop rate that can be achieved by calculating the operations and the communication costs for broadcasting submatrices to others. In this study, an approach is proposed to improve and adjust the parallel double-precision general matrix-matrix multiplication (PDGEMM) routine for modern Intel computers such as Knights Landing (KNL) and Xeon Scalable Processors (SKL). The proposed approach consists of two methods to deal with the aforementioned factors. First, the improvement of PDGEMM for the computational part is suggested based on a blocked GEMM algorithm that provides better fits for the architectures of KNL and SKL to perform better block size computation. Second, a communication routine adjustment with the message passing interface is proposed to overcome the settings of the basic linear algebra communication subprograms to improve the time-wise cost efficiency. Consequently, it is shown that performance improvements are achieved in the case of smaller matrix multiplications on the SKL clusters. © 2021, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
- Files in This Item
- There are no files associated with this item.
- Appears in
- College of Information Technology > School of Computer Science and Engineering > 1. Journal Articles
Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.