Accelerating CNN Training With Concurrent Execution of GPU and Processing-in-Memory

Choi, Jungwoo; Lee, Hyuk-Jae; Sohn, Kyomin; Yu, Hak-Soo; Rhee, Chae Eun

doi:10.1109/ACCESS.2024.3488004

Full metadata record

DC Field	Value	Language
dc.contributor.author	Choi, Jungwoo	-
dc.contributor.author	Lee, Hyuk-Jae	-
dc.contributor.author	Sohn, Kyomin	-
dc.contributor.author	Yu, Hak-Soo	-
dc.contributor.author	Rhee, Chae Eun	-
dc.date.accessioned	2024-11-28T19:01:05Z	-
dc.date.available	2024-11-28T19:01:05Z	-
dc.date.issued	2024-10	-
dc.identifier.issn	2169-3536	-
dc.identifier.uri	https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/198112	-
dc.description.abstract	Training of convolutional neural networks (CNN) consumes a lot of time and resources. While most previous works have focused on accelerating the convolutional (CONV) layer, the proportion of non-convolutional (non-CONV) layers, such as batch normalization, is gradually increasing during training. Non-CONV layers have low cache reuse and arithmetic intensity, thereby performance is limited by memory bandwidth. Processing-in-memory (PIM) can utilize wide memory bandwidth, making it suitable for acceleration of non-CONV layers. Therefore, it makes sense to perform the computationally complex CONV layer on the host and handle the memory bottleneck challenges of the non-CONV layer on the PIM. Further improved performance can be expected if they run simultaneously. However, memory access conflicts between the host and PIM are the biggest factors hindering performance improvement. Prior studies proposed bank partitioning to alleviate memory conflicts, but it is not effective because CNN training involves significant data sharing between CONV and non-CONV layers. In this paper, we propose a memory scheduling and CNN training flow for the pipelined execution of CONV layers on the host and non-CONV layers on PIM. First, instead of applying bank partitioning, the host and PIM exclusively access memory for a certain period to avoid the movement of shared data between host memory and PIM memory. The conditions for switching the memory access authority between the host and PIM are set per layer, taking into account memory access characteristics and the number of queued memory requests. Second, in the training flow, CONV and non-CONV layers are pipelined in units of output feature map channels. Specifically, for the backward pass, the non-CONV tasks of the feature map gradient calculation phase and the weight gradient update phase are rearranged so that they can be easily performed within CONV layers. Experimental results show that the proposed pipelined execution achieves an average speedup of 18.1% at the network level compared to the serial operation of the host and PIM.	-
dc.format.extent	15	-
dc.language	English	-
dc.language.iso	ENG	-
dc.publisher	Institute of Electrical and Electronics Engineers Inc.	-
dc.title	Accelerating CNN Training With Concurrent Execution of GPU and Processing-in-Memory	-
dc.type	Article	-
dc.publisher.location	United States	-
dc.identifier.doi	10.1109/ACCESS.2024.3488004	-
dc.identifier.scopusid	2-s2.0-85208388120	-
dc.identifier.wosid	001349761200001	-
dc.identifier.bibliographicCitation	IEEE Access, v.12, pp 160190 - 160204	-
dc.citation.title	IEEE Access	-
dc.citation.volume	12	-
dc.citation.startPage	160190	-
dc.citation.endPage	160204	-
dc.type.docType	Article	-
dc.description.isOpenAccess	Y	-
dc.description.journalRegisteredClass	scie	-
dc.description.journalRegisteredClass	scopus	-
dc.relation.journalResearchArea	Computer Science	-
dc.relation.journalResearchArea	Engineering	-
dc.relation.journalResearchArea	Telecommunications	-
dc.relation.journalWebOfScienceCategory	Computer Science, Information Systems	-
dc.relation.journalWebOfScienceCategory	Engineering, Electrical & Electronic	-
dc.relation.journalWebOfScienceCategory	Telecommunications	-
dc.subject.keywordPlus	Analog storage	-
dc.subject.keywordPlus	Cache memory	-
dc.subject.keywordPlus	Computer graphics equipment	-
dc.subject.keywordPlus	Data Sharing	-
dc.subject.keywordPlus	Gluing	-
dc.subject.keywordPlus	Graphics processing unit	-
dc.subject.keywordPlus	Multilayer neural networks	-
dc.subject.keywordAuthor	Training	-
dc.subject.keywordAuthor	Graphics processing units	-
dc.subject.keywordAuthor	Convolutional neural networks	-
dc.subject.keywordAuthor	Pipelines	-
dc.subject.keywordAuthor	Electric breakdown	-
dc.subject.keywordAuthor	Bandwidth	-
dc.subject.keywordAuthor	Batch normalization	-
dc.subject.keywordAuthor	Switches	-
dc.subject.keywordAuthor	Scheduling algorithms	-
dc.subject.keywordAuthor	Random access memory	-
dc.subject.keywordAuthor	Processing-in-memory	-
dc.subject.keywordAuthor	convolutional neural networks	-
dc.subject.keywordAuthor	neural network training	-
dc.subject.keywordAuthor	GPU	-
dc.identifier.url	https://ieeexplore.ieee.org/document/10738803	-
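
The abstract describes pipelining CONV layers (host) and non-CONV layers (PIM) in units of output feature map channels. The following is a minimal timing sketch of that general idea, not the authors' scheduler: it models a generic two-stage pipeline and ignores memory access conflicts and the switching of memory access authority discussed in the abstract. The function names and the per-channel costs are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def serial_time(conv_costs, non_conv_costs):
    """Host finishes every CONV channel, then PIM runs every non-CONV channel."""
    return sum(conv_costs) + sum(non_conv_costs)


def pipelined_time(conv_costs, non_conv_costs):
    """Generic two-stage pipeline over channels.

    PIM may start the non-CONV work for channel c only after the host has
    produced channel c and PIM has finished channel c-1.
    """
    host_done = 0.0
    pim_done = 0.0
    for t_conv, t_non_conv in zip(conv_costs, non_conv_costs):
        host_done += t_conv
        pim_done = max(host_done, pim_done) + t_non_conv
    return pim_done


# Hypothetical per-channel costs in arbitrary time units (assumption).
conv_costs = [4.0] * 8
non_conv_costs = [1.0] * 8

serial = serial_time(conv_costs, non_conv_costs)
pipelined = pipelined_time(conv_costs, non_conv_costs)
print(f"serial={serial}, pipelined={pipelined}, speedup={serial / pipelined:.2f}x")
```

In this toy model only the non-CONV work of the last channel remains exposed after the host finishes, which is why finer-grained (per-channel) pipelining can hide most of the non-CONV time. Real gains depend on memory contention between the host and PIM, which this sketch does not model.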

Appears in Collections: Seoul College of Engineering > Seoul School of Electronic Engineering > 1. Journal Articles

Related Researcher

Rhee, Chae Eun: College of Engineering (School of Electronic Engineering)