Cited 7 times in Web of Science. Cited 13 times in Scopus.
An implementation of matrix–matrix multiplication on the Intel KNL processor with AVX-512

Full metadata record
DC Field | Value | Language
dc.contributor.author | Lim, R. | -
dc.contributor.author | Lee, Y. | -
dc.contributor.author | Kim, R. | -
dc.contributor.author | Choi, J. | -
dc.date.available | 2019-03-13T01:35:03Z | -
dc.date.created | 2018-09-12 | -
dc.date.issued | 2018-06 | -
dc.identifier.issn | 1386-7857 | -
dc.identifier.uri | http://scholarworks.bwise.kr/ssu/handle/2018.sw.ssu/31309 | -
dc.description.abstract | The second-generation Intel Xeon Phi processor, codenamed Knights Landing (KNL), has recently emerged with a 2D tile mesh architecture and the Intel AVX-512 instructions. However, it is very difficult for general users to get the maximum performance from the new architecture, since they are not familiar with optimal cache reuse, efficient vectorization, and assembly language. In this paper, we illustrate several development strategies for achieving good performance in the C programming language, without the use of assembly language, by carrying out general matrix–matrix multiplications. Our implementation of matrix–matrix multiplication is based on blocked matrix multiplication, an optimization technique that improves data reuse. We use data prefetching, loop unrolling, and the Intel AVX-512 instructions to optimize the blocked matrix multiplications. On a single core of the KNL, our implementation achieves up to 98% of the performance of SGEMM and 99% of the performance of DGEMM in the Intel MKL, which is the current state-of-the-art library. Our implementation of the parallel DGEMM using all 68 cores of the KNL achieves up to 90% of the performance of DGEMM in the Intel MKL. | -
dc.language | English | -
dc.language.iso | en | -
dc.publisher | Springer New York LLC | -
dc.relation.isPartOf | Cluster Computing | -
dc.subject | C (programming language) | -
dc.subject | Assembly language | -
dc.subject | AVX-512 | -
dc.subject | Developing strategy | -
dc.subject | Matrix multiplication | -
dc.subject | Optimization techniques | -
dc.subject | Second generation | -
dc.subject | Threading | -
dc.subject | Vectorization | -
dc.subject | Matrix algebra | -
dc.title | An implementation of matrix–matrix multiplication on the Intel KNL processor with AVX-512 | -
dc.type | Article | -
dc.identifier.doi | 10.1007/s10586-018-2810-y | -
dc.type.rims | ART | -
dc.identifier.bibliographicCitation | Cluster Computing, v.21, no.4, pp.1785 - 1795 | -
dc.description.journalClass | 1 | -
dc.identifier.wosid | 000457276800003 | -
dc.identifier.scopusid | 2-s2.0-85047909063 | -
dc.citation.endPage | 1795 | -
dc.citation.number | 4 | -
dc.citation.startPage | 1785 | -
dc.citation.title | Cluster Computing | -
dc.citation.volume | 21 | -
dc.contributor.affiliatedAuthor | Choi, J. | -
dc.type.docType | Article | -
dc.description.isOpenAccess | N | -
dc.subject.keywordAuthor | Matrix-matrix multiplication | -
dc.subject.keywordAuthor | Knights Landing | -
dc.subject.keywordAuthor | AVX-512 | -
dc.subject.keywordAuthor | Vectorization | -
dc.subject.keywordAuthor | Threading | -
dc.description.journalRegisteredClass | scie | -
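
Illustrative note on the abstract: the blocked matrix multiplication it names can be sketched in plain C as below. This is a minimal sketch under stated assumptions, not the authors' implementation. The function names (gemm_naive, gemm_blocked), the row-major square layout, and the block-size parameter bs are hypothetical choices made for illustration, and the sketch leaves out the data prefetching, loop unrolling, and AVX-512 vectorization that the paper applies on top of blocking.

```c
#include <stddef.h>

/* Hypothetical helper: smaller of two sizes, used to clip edge tiles. */
static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* Reference triple loop: C += A * B for n x n row-major matrices. */
void gemm_naive(size_t n, const double *A, const double *B, double *C)
{
    for (size_t i = 0; i < n; i++)
        for (size_t k = 0; k < n; k++) {
            double a = A[i * n + k];
            for (size_t j = 0; j < n; j++)
                C[i * n + j] += a * B[k * n + j];
        }
}

/*
 * Blocked variant: the same arithmetic, visited tile by tile so that a
 * bs x bs tile of each operand can stay in cache while it is reused.
 * bs is an assumed tuning parameter and must be at least 1; the value
 * that works well depends on the cache sizes of the target machine.
 */
void gemm_blocked(size_t n, size_t bs,
                  const double *A, const double *B, double *C)
{
    for (size_t ii = 0; ii < n; ii += bs)
        for (size_t kk = 0; kk < n; kk += bs)
            for (size_t jj = 0; jj < n; jj += bs) {
                /* Clip the tile at the matrix edge when bs does not divide n. */
                size_t i_end = min_sz(ii + bs, n);
                size_t k_end = min_sz(kk + bs, n);
                size_t j_end = min_sz(jj + bs, n);
                for (size_t i = ii; i < i_end; i++)
                    for (size_t k = kk; k < k_end; k++) {
                        double a = A[i * n + k];
                        for (size_t j = jj; j < j_end; j++)
                            C[i * n + j] += a * B[k * n + j];
                    }
            }
}
```

Both functions accumulate into C, so the caller is expected to zero C first when a plain product is wanted. The innermost loop runs over contiguous elements of B and C, which is the access pattern a vectorizing compiler or hand-written SIMD code would typically target.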
Appears in Collections
College of Information Technology > School of Computer Science and Engineering > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Choi, Jaeyoung
College of Information Technology (School of Computer Science and Engineering)