PreScaler: An efficient system-aware precision scaling framework on heterogeneous systems
- Authors
- Kang, Seokwon; Choi, Kyunghwan; Park, Yongjun
- Issue Date
- Feb-2020
- Publisher
- Association for Computing Machinery, Inc
- Keywords
- Compiler; HSA; Precision Scaling; Profile-guided; Runtime
- Citation
- CGO 2020 - Proceedings of the 18th ACM/IEEE International Symposium on Code Generation and Optimization, pp.280 - 292
- Indexed
- SCOPUS
- Journal Title
- CGO 2020 - Proceedings of the 18th ACM/IEEE International Symposium on Code Generation and Optimization
- Start Page
- 280
- End Page
- 292
- URI
- https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/146195
- DOI
- 10.1145/3368826.3377917
- Abstract
- Graphics processing units (GPUs) are commonly used to accelerate emerging applications such as big data processing and machine learning. Although GPUs have proven effective, approximate computing, which trades accuracy for performance, is one of the most common ways to improve performance further. Precision scaling, which converts originally high-precision values into lower-precision values, has recently become the most widely used GPU-side approximation technique, with hardware-level half-precision support now available. Although several approaches to finding the optimal mixed-precision configuration of GPU-side kernels have been introduced, the total program performance gain is often low because total execution time combines data transfer, type conversion, and kernel execution, and kernel-level scaling may incur high type-conversion overhead for kernel input/output data. To address this problem, this paper proposes an automatic precision scaling framework called PreScaler that maximizes program performance at the memory-object level by considering whole OpenCL program flows. The main difficulty is that the best configuration cannot be easily predicted because of various application- and system-specific characteristics. PreScaler solves this problem using search space minimization and a decision-tree-based search process. First, it minimizes the number of test configurations based on information from system inspection and dynamic profiling. Then, it finds the best memory-object-level mixed-precision configuration using a decision-tree-based search. PreScaler achieves an average performance gain of 1.33x over the baseline while maintaining the target output quality level.
- Appears in Collections
- Seoul College of Engineering > Seoul School of Computer Software > 1. Journal Articles
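
The abstract's point that total execution time combines data transfer, type conversion, and kernel execution can be illustrated with a toy cost model. The sketch below is not PreScaler's actual model or search procedure. The `CostModel` fields, the assumption that conversion happens on the host before transfer, and all numeric values are hypothetical, and the output-quality constraint is left out for brevity.

```python
from dataclasses import dataclass

# Element sizes in bytes for the three precisions discussed in the abstract.
BYTES = {"double": 8, "float": 4, "half": 2}


@dataclass
class CostModel:
    """Hypothetical per-system parameters (illustrative values only)."""
    bandwidth_bytes_per_s: float  # host-to-device transfer bandwidth
    convert_s_per_elem: float     # host-side type-conversion cost per element
    kernel_s_per_elem: dict       # kernel time per element, keyed by precision


def total_time(n_elems, original, target, model):
    """Estimate transfer + conversion + kernel time for one memory object.

    Assumes the object is converted on the host before transfer, so the
    transfer moves data at the target precision.
    """
    transfer = n_elems * BYTES[target] / model.bandwidth_bytes_per_s
    convert = 0.0 if target == original else n_elems * model.convert_s_per_elem
    kernel = n_elems * model.kernel_s_per_elem[target]
    return transfer + convert + kernel


def best_precision(n_elems, original, candidates, model):
    """Pick the candidate precision with the lowest estimated total time."""
    return min(candidates, key=lambda p: total_time(n_elems, original, p, model))


KERNEL = {"double": 2e-9, "float": 1.5e-9, "half": 1e-9}
cheap_convert = CostModel(1e9, 5e-9, KERNEL)
costly_convert = CostModel(1e9, 10e-9, KERNEL)
precisions = ["double", "float", "half"]

# With cheap conversion, scaling to half wins; with costly conversion,
# the conversion overhead outweighs the transfer and kernel savings.
choice_cheap = best_precision(1_000_000, "double", precisions, cheap_convert)
choice_costly = best_precision(1_000_000, "double", precisions, costly_convert)
```

Under these made-up parameters, the lowest-cost precision flips from `half` to `double` when only the conversion cost changes, which is the kind of system-specific effect the abstract says kernel-level scaling can miss.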