A collaborative CPU vector offloader: Putting idle vector resources to work on commodity processors

Son, Youngbin; Kang, Seokwon; Um, Hongjun; Lee, Seokho; Ham, Jonghyun; Kim, Donghyeon; Park, Yongjun

doi:10.3390/electronics10232960

Issue Date: Dec-2021

Publisher: MDPI

Keywords: vector processors; job offloading; resource utilization; data parallelism; heterogeneous system architectures

Citation: ELECTRONICS, v.10, no.23, pp.1 - 15

Indexed: SCIE, SCOPUS

Journal Title: ELECTRONICS

Volume: 10

Number: 23

Start Page: 1

End Page: 15

URI: https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/140166

Abstract: Most modern processors contain a vector accelerator or internal vector units for fast computation on large workloads. However, accelerating applications with vector units is difficult because the underlying data parallelism must be exposed explicitly using vector-specific instructions. As a result, vector units are often underutilized or left idle because of the difficulty of vector code generation. To address this underutilization, we propose the Vector Offloader, which executes scalar programs by treating the vector unit as a scalar execution unit. Using vector masking, an appropriate partition of the vector unit can be used to execute scalar instructions. To use all execution units efficiently, including the vector unit, the Vector Offloader runs the target application concurrently on both the central processing unit (CPU) and the decoupled vector unit by offloading part of the program to the vector unit. Furthermore, a profile-guided optimization technique is employed to determine the optimal offloading ratio for balancing the load between the CPU and the vector unit. We implemented the Vector Offloader on a RISC-V infrastructure with a Hwacha vector unit and evaluated its performance on the Polybench benchmark suite. In a field programmable gate array (FPGA)-level evaluation, the proposed technique achieved speedups of up to 1.31× over simple CPU-only execution.
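To illustrate why an offloading ratio matters when the CPU and the vector unit run concurrently, the following is a minimal sketch under a deliberately simplified cost model. It is not the paper's profile-guided technique: it assumes that a profiling run yields a fixed per-iteration cost on each unit, that iterations are independent, and that offloading, synchronization, and data-transfer overheads are zero. The names `Profile`, `makespan`, `best_ratio`, and `balanced_ratio` are hypothetical. Because both units work in parallel, the loop finishes when the slower side finishes, so the best ratio is the one that equalizes the two sides.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    """Hypothetical per-iteration costs, assumed to come from a profiling run."""
    cpu_time_per_iter: float  # time per loop iteration on the scalar CPU
    vec_time_per_iter: float  # time per iteration when offloaded to the vector unit


def makespan(profile: Profile, n_iters: int, ratio: float) -> float:
    """Modeled completion time when a fraction `ratio` of iterations is offloaded.

    CPU and vector unit run concurrently, so total time is the maximum of the
    two partitions. Offload and synchronization overheads are ignored (assumption).
    """
    offloaded = round(n_iters * ratio)
    on_cpu = n_iters - offloaded
    return max(on_cpu * profile.cpu_time_per_iter,
               offloaded * profile.vec_time_per_iter)


def best_ratio(profile: Profile, n_iters: int, steps: int = 100) -> float:
    """Search candidate ratios 0, 1/steps, ..., 1 for the smallest modeled makespan."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates, key=lambda r: makespan(profile, n_iters, r))


def balanced_ratio(profile: Profile) -> float:
    """Closed form for this model: both sides finish together when
    r * t_vec == (1 - r) * t_cpu, i.e. r = t_cpu / (t_cpu + t_vec)."""
    t_cpu, t_vec = profile.cpu_time_per_iter, profile.vec_time_per_iter
    return t_cpu / (t_cpu + t_vec)


if __name__ == "__main__":
    # Assumed profile: the vector unit, used as a scalar unit, is 3x slower per iteration.
    p = Profile(cpu_time_per_iter=1.0, vec_time_per_iter=3.0)
    r = best_ratio(p, n_iters=1000)
    print(r, balanced_ratio(p), makespan(p, 1000, 0.0), makespan(p, 1000, r))
```

With the assumed profile above, the search selects a ratio of 0.25, which agrees with the closed form, and the modeled time drops from 1000 units (CPU only) to 750. In practice, the offloading overhead that this sketch omits would shift the best ratio toward the CPU.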

Appears in Collections: Seoul College of Engineering > Seoul School of Computer Software > 1. Journal Articles

Related Researcher

Park, Yong jun: Seoul College of Engineering (Seoul School of Computer Software)
