An implementation of matrix–matrix multiplication on the Intel KNL processor with AVX-512

Lim, R.; Lee, Y.; Kim, R.; Choi, J.

doi:10.1007/s10586-018-2810-y

Cited 7 times in Web of Science

Cited 13 times in Scopus

Full metadata record

DC Field	Value	Language
dc.contributor.author	Lim, R.	-
dc.contributor.author	Lee, Y.	-
dc.contributor.author	Kim, R.	-
dc.contributor.author	Choi, J.	-
dc.date.available	2019-03-13T01:35:03Z	-
dc.date.created	2018-09-12	-
dc.date.issued	2018-06	-
dc.identifier.issn	1386-7857	-
dc.identifier.uri	http://scholarworks.bwise.kr/ssu/handle/2018.sw.ssu/31309	-
dc.description.abstract	The second-generation Intel Xeon Phi processor, codenamed Knights Landing (KNL), has recently emerged with a 2D tile mesh architecture and the Intel AVX-512 instructions. However, it is very difficult for general users to get the maximum performance from the new architecture, since they are not familiar with optimal cache reuse, efficient vectorization, and assembly language. In this paper, we illustrate several development strategies for achieving good performance in general matrix–matrix multiplication using the C programming language, without the use of assembly language. Our implementation is based on blocked matrix multiplication, an optimization technique that improves data reuse. We use data prefetching, loop unrolling, and the Intel AVX-512 instructions to optimize the blocked matrix multiplications. On a single core of the KNL, our implementation achieves up to 98% of the SGEMM performance and 99% of the DGEMM performance of the Intel MKL, the current state-of-the-art library. Our parallel DGEMM implementation using all 68 cores of the KNL achieves up to 90% of the DGEMM performance of the Intel MKL.	-
dc.language	English	-
dc.language.iso	en	-
dc.publisher	Springer New York LLC	-
dc.relation.isPartOf	Cluster Computing	-
dc.subject	C (programming language)	-
dc.subject	Assembly language	-
dc.subject	AVX-512	-
dc.subject	Developing strategy	-
dc.subject	Matrix multiplication	-
dc.subject	Optimization techniques	-
dc.subject	Second generation	-
dc.subject	Threading	-
dc.subject	Vectorization	-
dc.subject	Matrix algebra	-
dc.title	An implementation of matrix–matrix multiplication on the Intel KNL processor with AVX-512	-
dc.type	Article	-
dc.identifier.doi	10.1007/s10586-018-2810-y	-
dc.type.rims	ART	-
dc.identifier.bibliographicCitation	Cluster Computing, v.21, no.4, pp.1785 - 1795	-
dc.description.journalClass	1	-
dc.identifier.wosid	000457276800003	-
dc.identifier.scopusid	2-s2.0-85047909063	-
dc.citation.endPage	1795	-
dc.citation.number	4	-
dc.citation.startPage	1785	-
dc.citation.title	Cluster Computing	-
dc.citation.volume	21	-
dc.contributor.affiliatedAuthor	Choi, J.	-
dc.type.docType	Article	-
dc.description.isOpenAccess	N	-
dc.subject.keywordAuthor	Matrix-matrix multiplication	-
dc.subject.keywordAuthor	Knights Landing	-
dc.subject.keywordAuthor	AVX-512	-
dc.subject.keywordAuthor	Vectorization	-
dc.subject.keywordAuthor	Threading	-
dc.description.journalRegisteredClass	scie	-
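
The abstract names blocked matrix multiplication as the technique that improves data reuse. The following is a minimal sketch of that idea in plain C, not the authors' implementation: the function name `blocked_dgemm`, the block size `BLOCK`, the square row-major layout, and the loop order are all assumptions made for illustration, and the sketch leaves out the prefetching, loop unrolling, and AVX-512 vectorization that the paper applies on top of blocking.

```c
#include <stddef.h>

/* Hypothetical block size: this record does not state the paper's
 * blocking parameters, so 64 is only a placeholder. */
#define BLOCK 64

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* Computes C += A * B for row-major n x n matrices, one block at a time.
 * Working on BLOCK x BLOCK tiles keeps the operands of the inner loops
 * small enough to be reused from cache instead of refetched from memory. */
void blocked_dgemm(size_t n, const double *A, const double *B, double *C)
{
    for (size_t ii = 0; ii < n; ii += BLOCK) {
        size_t i_end = min_sz(ii + BLOCK, n);
        for (size_t kk = 0; kk < n; kk += BLOCK) {
            size_t k_end = min_sz(kk + BLOCK, n);
            for (size_t jj = 0; jj < n; jj += BLOCK) {
                size_t j_end = min_sz(jj + BLOCK, n);
                /* Multiply one tile of A by one tile of B. */
                for (size_t i = ii; i < i_end; ++i) {
                    for (size_t k = kk; k < k_end; ++k) {
                        double a = A[i * n + k];
                        /* Unit-stride inner loop over a row of B and C. */
                        for (size_t j = jj; j < j_end; ++j)
                            C[i * n + j] += a * B[k * n + j];
                    }
                }
            }
        }
    }
}
```

The caller is expected to zero `C` first if a plain product is wanted. The innermost loop walks `B` and `C` with unit stride, which is the access pattern a vectorizing compiler (or hand-written AVX-512 intrinsics) would typically target.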

Files in This Item: There are no files associated with this item.

Appears in Collections: College of Information Technology > School of Computer Science and Engineering > 1. Journal Articles

Related Researcher

Choi, Jaeyoung: College of Information Technology (School of Computer Science and Engineering)

Soongsil University Library, 369 Sangdo-Ro, Dongjak-Gu, Seoul, Korea (06978), 02-820-0733

Certain data included herein are derived from the © Web of Science of Clarivate Analytics. All rights reserved.
You may not copy or re-distribute this material in whole or in part without the prior written consent of Clarivate Analytics.