An implementation of matrix–matrix multiplication on the Intel KNL processor with AVX-512
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Lim, R. | - |
dc.contributor.author | Lee, Y. | - |
dc.contributor.author | Kim, R. | - |
dc.contributor.author | Choi, J. | - |
dc.date.available | 2019-03-13T01:35:03Z | - |
dc.date.created | 2018-09-12 | - |
dc.date.issued | 2018-06 | - |
dc.identifier.issn | 1386-7857 | - |
dc.identifier.uri | http://scholarworks.bwise.kr/ssu/handle/2018.sw.ssu/31309 | - |
dc.description.abstract | The second-generation Intel Xeon Phi processor, codenamed Knights Landing (KNL), has recently emerged with a 2D tile mesh architecture and the Intel AVX-512 instructions. However, it is very difficult for general users to get the maximum performance from the new architecture, since they are not familiar with optimal cache reuse, efficient vectorization, and assembly language. In this paper, we illustrate several development strategies for achieving good performance in the C programming language, without the use of assembly language, by carrying out general matrix–matrix multiplications. Our implementation of matrix–matrix multiplication is based on blocked matrix multiplication, an optimization technique that improves data reuse. We use data prefetching, loop unrolling, and the Intel AVX-512 instructions to optimize the blocked matrix multiplications. On a single core of the KNL, our implementation achieves up to 98% of the SGEMM performance and 99% of the DGEMM performance of the Intel MKL, which is the current state-of-the-art library. Our implementation of the parallel DGEMM using all 68 cores of the KNL achieves up to 90% of the DGEMM performance of the Intel MKL. | - |
dc.language | English | - |
dc.language.iso | en | - |
dc.publisher | Springer New York LLC | - |
dc.relation.isPartOf | Cluster Computing | - |
dc.subject | C (programming language) | - |
dc.subject | Assembly language | - |
dc.subject | AVX-512 | - |
dc.subject | Developing strategy | - |
dc.subject | Matrix multiplication | - |
dc.subject | Optimization techniques | - |
dc.subject | Second generation | - |
dc.subject | Threading | - |
dc.subject | Vectorization | - |
dc.subject | Matrix algebra | - |
dc.title | An implementation of matrix–matrix multiplication on the Intel KNL processor with AVX-512 | - |
dc.type | Article | - |
dc.identifier.doi | 10.1007/s10586-018-2810-y | - |
dc.type.rims | ART | - |
dc.identifier.bibliographicCitation | Cluster Computing, v.21, no.4, pp.1785 - 1795 | - |
dc.description.journalClass | 1 | - |
dc.identifier.wosid | 000457276800003 | - |
dc.identifier.scopusid | 2-s2.0-85047909063 | - |
dc.citation.endPage | 1795 | - |
dc.citation.number | 4 | - |
dc.citation.startPage | 1785 | - |
dc.citation.title | Cluster Computing | - |
dc.citation.volume | 21 | - |
dc.contributor.affiliatedAuthor | Choi, J. | - |
dc.type.docType | Article | - |
dc.description.isOpenAccess | N | - |
dc.subject.keywordAuthor | Matrix-matrix multiplication | - |
dc.subject.keywordAuthor | Knights Landing | - |
dc.subject.keywordAuthor | AVX-512 | - |
dc.subject.keywordAuthor | Vectorization | - |
dc.subject.keywordAuthor | Threading | - |
dc.description.journalRegisteredClass | scie | - |
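
The abstract names blocked matrix multiplication as the basis for improving data reuse. The following is a minimal sketch of that general technique in portable C, not the authors' kernel: the names `blocked_dgemm`, `min_sz`, and `BLOCK`, and the tile size of 64, are assumptions made for illustration. It leaves out the data prefetching, loop unrolling, and AVX-512 intrinsics that the paper describes, and relies on the compiler to vectorize the innermost loop where it can.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tile size; this record does not state the blocking
 * parameters used in the paper. */
enum { BLOCK = 64 };

/* Helper (illustrative name): smaller of two sizes. */
static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* Computes C += A * B for row-major n x n matrices of doubles.
 * The three outer loops walk over BLOCK x BLOCK tiles so that each
 * tile of A, B, and C is reused while it is still in cache. */
void blocked_dgemm(size_t n, const double *A, const double *B, double *C)
{
    assert(A != NULL && B != NULL && C != NULL);

    for (size_t ii = 0; ii < n; ii += BLOCK) {
        size_t i_end = min_sz(ii + BLOCK, n);
        for (size_t kk = 0; kk < n; kk += BLOCK) {
            size_t k_end = min_sz(kk + BLOCK, n);
            for (size_t jj = 0; jj < n; jj += BLOCK) {
                size_t j_end = min_sz(jj + BLOCK, n);

                /* Multiply one tile of A by one tile of B. */
                for (size_t i = ii; i < i_end; i++) {
                    for (size_t k = kk; k < k_end; k++) {
                        double a = A[i * n + k];
                        /* Unit-stride access to B and C in the inner
                         * loop keeps it amenable to vectorization. */
                        for (size_t j = jj; j < j_end; j++) {
                            C[i * n + j] += a * B[k * n + j];
                        }
                    }
                }
            }
        }
    }
}
```

Because the function accumulates into `C`, the caller is expected to zero `C` first when a plain product is wanted. Any measured performance would depend on the tile size, the compiler flags, and the hardware, none of which this sketch tunes.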