Detailed Information


PreScaler: An efficient system-aware precision scaling framework on heterogeneous systems

Authors
Kang, Seokwon; Choi, Kyunghwan; Park, Yongjun
Issue Date
Feb-2020
Publisher
Association for Computing Machinery, Inc
Keywords
Compiler; HSA; Precision Scaling; Profile-guided; Runtime
Citation
CGO 2020 - Proceedings of the 18th ACM/IEEE International Symposium on Code Generation and Optimization, pp. 280-292
Indexed
SCOPUS
Journal Title
CGO 2020 - Proceedings of the 18th ACM/IEEE International Symposium on Code Generation and Optimization
Start Page
280
End Page
292
URI
https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/146195
DOI
10.1145/3368826.3377917
Abstract
Graphics processing units (GPUs) are commonly used to accelerate emerging applications such as big data processing and machine learning. While GPUs have proven effective, approximate computing, which trades accuracy for performance, is one of the most common approaches to further performance improvement. Precision scaling, which converts originally high-precision values into lower-precision values, has recently become the most widely used GPU-side approximation technique, aided by hardware-level half-precision support. Although several approaches for finding the optimal mixed-precision configuration of GPU kernels have been introduced, the total program performance gain is often low because total execution time combines data transfer, type conversion, and kernel execution. As a result, kernel-level scaling may incur high type-conversion overhead on the kernel input/output data. To address this problem, this paper proposes an automatic precision scaling framework called PreScaler that maximizes program performance at the memory-object level by considering whole OpenCL program flows. The main difficulty is that the best configuration cannot easily be predicted due to various application- and system-specific characteristics. PreScaler solves this problem with search-space minimization and a decision-tree-based search. First, it minimizes the number of test configurations based on information from system inspection and dynamic profiling. Then, it finds the best memory-object-level mixed-precision configuration using a decision-tree-based search. PreScaler achieves an average performance gain of 1.33x over the baseline while maintaining the target output quality level.
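To make the idea of a memory-object-level mixed-precision search concrete, the sketch below shows a minimal greedy search over per-object precision assignments (float vs. half). This is not PreScaler's actual algorithm; the object names, quality threshold, and the profile() hook are hypothetical stand-ins for the system inspection and dynamic profiling the paper describes.

```python
# Hypothetical sketch: greedy search over per-memory-object precision choices.
# profile() is a synthetic stand-in for running the OpenCL program under a
# given {object: "half" | "float"} configuration and measuring runtime/quality.
import random

MEM_OBJECTS = ["in_a", "in_b", "out_c"]   # illustrative memory-object names
QUALITY_THRESHOLD = 0.95                  # assumed target output quality

def profile(config):
    """Synthetic cost model: demoting objects to half reduces runtime a little
    and degrades quality a little. A real system would measure both."""
    n_half = sum(1 for p in config.values() if p == "half")
    runtime = 1.0 - 0.08 * n_half + random.uniform(-0.01, 0.01)
    quality = 1.0 - 0.02 * n_half
    return runtime, quality

def greedy_search(objects):
    """Try demoting one object at a time; keep a demotion only if it both
    speeds the program up and stays above the quality threshold."""
    config = {o: "float" for o in objects}
    best_runtime, _ = profile(config)
    for obj in objects:
        trial = dict(config, **{obj: "half"})
        runtime, quality = profile(trial)
        if runtime < best_runtime and quality >= QUALITY_THRESHOLD:
            config, best_runtime = trial, runtime   # accept this branch
    return config, best_runtime

if __name__ == "__main__":
    cfg, t = greedy_search(MEM_OBJECTS)
    print("chosen precision per memory object:", cfg)
    print("estimated runtime:", round(t, 3))
```

Because each memory object is decided once and the result is measured rather than predicted, the number of profiled configurations stays linear in the number of objects, which mirrors the search-space-minimization goal the abstract describes.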
Appears in Collections
Seoul College of Engineering > Seoul School of Computer Software > 1. Journal Articles


Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Park, Yongjun
Seoul College of Engineering (Seoul School of Computer Software)
