PreScaler: An efficient system-aware precision scaling framework on heterogeneous systems

Kang, Seokwon; Choi, Kyunghwan; Park, Yongjun

doi:10.1145/3368826.3377917


Authors: Kang, Seokwon; Choi, Kyunghwan; Park, Yongjun

Issue Date: Feb-2020

Publisher: Association for Computing Machinery, Inc

Keywords: Compiler; HSA; Precision Scaling; Profile-guided; Runtime

Citation: CGO 2020 - Proceedings of the 18th ACM/IEEE International Symposium on Code Generation and Optimization, pp.280 - 292

Indexed: SCOPUS

Journal Title: CGO 2020 - Proceedings of the 18th ACM/IEEE International Symposium on Code Generation and Optimization

Start Page: 280

End Page: 292

URI: https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/146195

DOI: 10.1145/3368826.3377917

Abstract: Graphics processing units (GPUs) have been commonly utilized to accelerate multiple emerging applications, such as big data processing and machine learning. While GPUs are proven to be effective, approximate computing, to trade off performance with accuracy, is one of the most common solutions for further performance improvement. Precision scaling of originally high-precision values into lower-precision values has recently been the most widely used GPU-side approximation technique, including hardware-level half-precision support. Although several approaches to find optimal mixed-precision configuration of GPU-side kernels have been introduced, total program performance gain is often low because total execution time is the combination of data transfer, type conversion, and kernel execution. As a result, kernel-level scaling may incur high type-conversion overhead of the kernel input/output data. To address this problem, this paper proposes an automatic precision scaling framework called PreScaler that maximizes the program performance at the memory object level by considering whole OpenCL program flows. The main difficulty is that the best configuration cannot be easily predicted due to various application- and system-specific characteristics. PreScaler solves this problem using search space minimization and decision-tree-based search processes. First, it minimizes the number of test configurations based on the information from system inspection and dynamic profiling. Then, it finds the best memory-object level mixed-precision configuration using a decision-tree-based search. PreScaler achieves an average performance gain of 1.33x over the baseline while maintaining the target output quality level.
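The abstract's point that total execution time combines data transfer, type conversion, and kernel execution can be made concrete with a small arithmetic sketch. Everything below is an illustrative assumption, not taken from the paper: the `Timing` class, the `speedup` helper, and all millisecond values are hypothetical and only show how a faster kernel can be cancelled out by conversion overhead.

```python
from dataclasses import dataclass


@dataclass
class Timing:
    """Hypothetical per-run cost breakdown in milliseconds (made-up values)."""
    transfer_ms: float  # host <-> device data transfer
    convert_ms: float   # type conversion between precisions
    kernel_ms: float    # kernel execution on the GPU

    @property
    def total_ms(self) -> float:
        # Total time is the sum of the three components named in the abstract.
        return self.transfer_ms + self.convert_ms + self.kernel_ms


def speedup(baseline: Timing, candidate: Timing) -> float:
    """Whole-program speedup of candidate over baseline."""
    return baseline.total_ms / candidate.total_ms


# Full-precision baseline: no conversion needed.
baseline = Timing(transfer_ms=4.0, convert_ms=0.0, kernel_ms=6.0)

# Assumed kernel-only scaling: the kernel is twice as fast, but its
# inputs and outputs must be converted, which consumes the gain.
kernel_only = Timing(transfer_ms=4.0, convert_ms=3.0, kernel_ms=3.0)

# Assumed whole-program scenario: less conversion and smaller transfers.
whole_program = Timing(transfer_ms=2.0, convert_ms=1.0, kernel_ms=3.0)

print(f"kernel-only speedup:   {speedup(baseline, kernel_only):.2f}x")
print(f"whole-program speedup: {speedup(baseline, whole_program):.2f}x")
```

With these made-up numbers, halving kernel time alone yields no net gain, while a configuration that also lowers conversion and transfer cost does. The real trade-off depends on the application and the system, which is why the abstract describes profiling and search rather than a fixed rule.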


Appears in Collections: College of Engineering (Seoul) > Division of Computer Software (Seoul) > 1. Journal Articles
