ELF: Maximizing memory-level parallelism for GPUs with coordinated warp and fetch scheduling
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Park, J.J.K. | - |
dc.contributor.author | Park, Yongjun | - |
dc.contributor.author | Mahlke, S. | - |
dc.date.available | 2021-03-17T11:41:43Z | - |
dc.date.created | 2021-02-26 | - |
dc.date.issued | 2015 | - |
dc.identifier.uri | https://scholarworks.bwise.kr/hongik/handle/2020.sw.hongik/13851 | - |
dc.description.abstract | Graphics processing units (GPUs) are increasingly utilized as throughput engines in modern computer systems. GPUs rely on fast context switching between thousands of threads to hide long-latency operations; however, they still stall due to memory operations. To minimize these stalls, memory operations should be overlapped with other operations as much as possible to maximize memory-level parallelism (MLP). In this paper, we propose Earliest Load First (ELF) warp scheduling, which maximizes MLP by giving higher priority to the warps that have the fewest instructions remaining before their next memory load. ELF uses the same warp priority for fetch scheduling, so that the two schedulers are coordinated. We also show that ELF reveals its full benefits when there are fewer memory conflicts and fetch stalls. Evaluations show that ELF alone improves performance by 4.1%, and achieves a total improvement of 11.9% when combined with other techniques, over the commonly used greedy-then-oldest scheduling. | - |
dc.publisher | ASSOC COMPUTING MACHINERY | - |
dc.title | ELF: Maximizing memory-level parallelism for GPUs with coordinated warp and fetch scheduling | - |
dc.type | Article | - |
dc.contributor.affiliatedAuthor | Park, Yongjun | - |
dc.identifier.doi | 10.1145/2807591.2807598 | - |
dc.identifier.scopusid | 2-s2.0-84966642540 | - |
dc.identifier.wosid | 000382162500019 | - |
dc.identifier.bibliographicCitation | International Conference for High Performance Computing, Networking, Storage and Analysis, SC, v.15-20-November-2015 | - |
dc.relation.isPartOf | International Conference for High Performance Computing, Networking, Storage and Analysis, SC | - |
dc.citation.title | International Conference for High Performance Computing, Networking, Storage and Analysis, SC | - |
dc.citation.volume | 15-20-November-2015 | - |
dc.type.rims | ART | - |
dc.type.docType | Proceedings Paper | - |
dc.description.journalClass | 1 | - |
dc.description.journalRegisteredClass | scopus | - |
dc.relation.journalResearchArea | Computer Science | - |
dc.relation.journalResearchArea | Engineering | - |
dc.relation.journalWebOfScienceCategory | Computer Science, Theory & Methods | - |
dc.relation.journalWebOfScienceCategory | Engineering, Electrical & Electronic | - |
dc.subject.keywordAuthor | Graphics Processing Unit | - |
dc.subject.keywordAuthor | Compiler | - |
dc.subject.keywordAuthor | Memory-level Parallelism | - |
dc.subject.keywordAuthor | Warp Scheduling | - |
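The abstract's core idea, prioritizing the warp with the fewest instructions remaining before its next memory load, can be sketched as a simple selection policy. This is a minimal illustrative model, not the paper's simulator: the warp representation, the `LOAD` marker, and the oldest-warp tie-break (a greedy-then-oldest-style fallback) are all assumptions made for the sketch.

```python
# Hypothetical sketch of Earliest Load First (ELF) warp selection:
# among ready warps, issue from the one closest to its next memory load.

def distance_to_next_load(insts, pc):
    """Count instructions from pc until the next LOAD in this warp."""
    for i, inst in enumerate(insts[pc:]):
        if inst == "LOAD":
            return i
    # No load ahead: give this warp the lowest priority.
    return len(insts) - pc

def elf_pick(warps):
    """warps: list of (warp_id, insts, pc). Oldest warp id breaks ties."""
    return min(
        warps,
        key=lambda w: (distance_to_next_load(w[1], w[2]), w[0]),
    )[0]

warps = [
    (0, ["ADD", "MUL", "LOAD", "ADD"], 0),  # 2 instructions before its load
    (1, ["LOAD", "ADD", "ADD"], 0),         # load is next: highest priority
    (2, ["ADD", "ADD", "ADD"], 0),          # no load ahead: lowest priority
]
print(elf_pick(warps))  # -> 1
```

Issuing warp 1 first starts its memory request earliest, so its load latency can overlap with the compute instructions of warps 0 and 2, which is the MLP-maximizing behavior the abstract describes.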