APRES: Improving Cache Efficiency by Exploiting Load Characteristics on GPUs

Oh, Yunho; Kim, Keunsoo; Yoon, Myung-kuk; Park, Jong-hyun; Park, Yongjun; Ro, Won Woo; Annavaram, Murali

doi:10.1109/ISCA.2016.26

Full metadata record

DC Field	Value	Language
dc.contributor.author	Oh, Yunho	-
dc.contributor.author	Kim, Keunsoo	-
dc.contributor.author	Yoon, Myung-kuk	-
dc.contributor.author	Park, Jong-hyun	-
dc.contributor.author	Park, Yongjun	-
dc.contributor.author	Ro, Won Woo	-
dc.contributor.author	Annavaram, Murali	-
dc.date.accessioned	2022-07-15T16:01:59Z	-
dc.date.available	2022-07-15T16:01:59Z	-
dc.date.created	2021-05-14	-
dc.date.issued	2016-06	-
dc.identifier.issn	1063-6897	-
dc.identifier.uri	https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/154394	-
dc.description.abstract	Long memory latency and limited throughput are performance bottlenecks for GPGPU applications. Memory latency spans hundreds of cycles, which is difficult to hide by simply interleaving the execution of tens of warps. While the cache hierarchy helps reduce memory system pressure, massive Thread-Level Parallelism (TLP) often causes excessive cache contention. This paper proposes Adaptive PREfetching and Scheduling (APRES) to improve GPU cache efficiency. APRES relies on the following observations. First, certain static load instructions tend to generate memory addresses with very high locality. Second, even when loads have no locality, their access addresses can still show a highly strided pattern. Third, the locality behavior tends to be consistent regardless of warp ID. APRES schedules warps so that as many cache hits as possible are generated before any cache misses. This minimizes cache thrashing when many warps contend for a cache line. Realizing this, however, requires predicting which warps will hit the cache in the near future. Rather than directly predicting a future cache hit or miss for each warp, APRES creates a group of warps that will execute the same load instruction in the near future. Based on the third observation, we expect the locality behavior to be consistent across all warps in the group. If the first warp executed in the group hits the cache, the load is considered a high-locality type, and APRES prioritizes all warps in the group. Group prioritization leads to consecutive cache hits because the grouped warps are likely to access the same cache line. If the first warp misses the cache, the load is considered a strided type, and APRES generates prefetch requests for the other warps in the group. APRES then prioritizes the prefetch-targeted warps so that their demand requests are merged in the Miss Status Holding Register (MSHR) or the prefetched lines can be accessed.
On memory-intensive applications, APRES achieves a 31.7% performance improvement over the baseline GPU and a 7.2% additional speedup over the best combination of existing warp scheduling and prefetching methods.	-
dc.language	English	-
dc.language.iso	en	-
dc.publisher	IEEE	-
dc.title	APRES: Improving Cache Efficiency by Exploiting Load Characteristics on GPUs	-
dc.type	Article	-
dc.contributor.affiliatedAuthor	Park, Yongjun	-
dc.identifier.doi	10.1109/ISCA.2016.26	-
dc.identifier.scopusid	2-s2.0-84988422355	-
dc.identifier.bibliographicCitation	International Symposium on Computer Architecture, pp.191 - 203	-
dc.relation.isPartOf	International Symposium on Computer Architecture	-
dc.citation.title	International Symposium on Computer Architecture	-
dc.citation.startPage	191	-
dc.citation.endPage	203	-
dc.type.rims	ART	-
dc.type.docType	Proceeding	-
dc.description.journalClass	1	-
dc.description.isOpenAccess	N	-
dc.description.journalRegisteredClass	scopus	-
dc.subject.keywordPlus	Computer architecture	-
dc.subject.keywordPlus	Efficiency	-
dc.subject.keywordPlus	Memory architecture	-
dc.subject.keywordPlus	Multitasking	-
dc.subject.keywordPlus	Program processors	-
dc.subject.keywordPlus	Scheduling	-
dc.subject.keywordAuthor	GPGPU	-
dc.subject.keywordAuthor	Warp Scheduling	-
dc.subject.keywordAuthor	Data Prefetching	-
dc.identifier.url	https://ieeexplore.ieee.org/document/7551393	-
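
The group-level decision described in the abstract (the first warp's hit or miss classifies the load, then the rest of the group is either prioritized or prefetched for) can be sketched roughly as follows. This is a minimal illustration, not the paper's hardware design: the names `GroupDecision` and `classify_group`, and the assumption that each warp's address is offset from the first warp's address by a fixed `stride_per_warp` times the warp ID difference, are hypothetical and do not come from the record above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class GroupDecision:
    load_type: str                 # "high_locality" or "strided"
    prioritized_warps: List[int]   # warps the scheduler should favor next
    # warp id -> prefetch address; empty for high-locality loads
    prefetch_addresses: Dict[int, int] = field(default_factory=dict)


def classify_group(first_warp: int, first_addr: int, first_hit: bool,
                   other_warps: List[int], stride_per_warp: int) -> GroupDecision:
    """Classify a load from the first warp's cache outcome (hypothetical helper).

    Assumption: a warp's address differs from the first warp's address by
    stride_per_warp * (warp id difference). How the stride is obtained is
    not stated in the abstract and is left to the caller here.
    """
    if first_hit:
        # First warp hit: treat the load as high locality and prioritize
        # the remaining warps, which likely touch the same cache line.
        return GroupDecision("high_locality", list(other_warps))

    # First warp missed: treat the load as strided and compute one
    # prefetch address per remaining warp in the group.
    prefetches = {
        w: first_addr + (w - first_warp) * stride_per_warp
        for w in other_warps
    }
    # The prefetch-targeted warps are then prioritized so their demand
    # requests can merge with, or hit on, the prefetched lines.
    return GroupDecision("strided", list(other_warps), prefetches)
```

For example, `classify_group(0, 0x1000, False, [1, 2], 128)` yields a strided decision with one prefetch address for each of warps 1 and 2, while passing `first_hit=True` yields a high-locality decision with no prefetches.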

Appears in Collections: Seoul College of Engineering > Seoul Division of Computer Software > 1. Journal Articles

Related Researcher

Park, Yong jun: Seoul College of Engineering (Seoul Division of Computer Software)
