LOP+SAMM: DNN Inference Accelerator with Hardware Loop Offloading and Segment-Wise On-Chip Memory Data Synchronization

Lee, Won Kyoo; Rho, Soomin; Chung, Ki-Seok

doi:10.1109/ICECIE66637.2025.11363807

Full metadata record

DC Field	Value	Language
dc.contributor.author	Lee, Won Kyoo	-
dc.contributor.author	Rho, Soomin	-
dc.contributor.author	Chung, Ki-Seok	-
dc.date.accessioned	2026-04-21T06:00:13Z	-
dc.date.available	2026-04-21T06:00:13Z	-
dc.date.issued	2026-02	-
dc.identifier.issn	2832-9821	-
dc.identifier.issn	2832-9848	-
dc.identifier.uri	https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/212286	-
dc.description.abstract	Recent demand for computing power in Deep Neural Networks (DNNs) has driven extensive research on accelerators that improve performance. Standard accelerators typically rely on hardware tailored to fixed operations and offload operation scheduling to maximize compute utilization, but this approach limits scalability to other models and programmability. Alternatively, systems connect a RISC-V host CPU to control the accelerator and improve flexibility; however, for the multi-loop structure of DNN computation, CPU control overhead can limit compute utilization. These approaches also overlook the gains available from efficient on-chip memory data management. We propose a DNN inference accelerator that mitigates this trade-off by combining a Loop Offloading Processor (LOP) and a Scratchpad–Accumulator Mutex Map (SAMM). LOP offloads the CPU control overheads that arise in nested loops to hardware, thereby addressing utilization limits while preserving programmability through loop-wise control. SAMM operates segment-wise mutual exclusion to orchestrate efficient data transfers between on-chip memory and external memory, enabling fine-grained overlap of transfer and computation and preserving the maximum tile size (i.e., across the entire buffer). Compared to the state-of-the-art Gemmini accelerator, our evaluation demonstrates that LOP+SAMM improves performance by 1.13–1.32× across diverse GEMM (General Matrix Multiplication) workloads, results in up to 1.51× fewer external memory accesses, decreases scratchpad bank conflicts by up to 15.22×, and achieves 1.09–1.18× end-to-end latency speedups at the model level with only a 1.01× area increase over the Gemmini baseline.	-
dc.format.extent	8	-
dc.language	English	-
dc.language.iso	ENG	-
dc.publisher	Institute of Electrical and Electronics Engineers	-
dc.title	LOP+SAMM: DNN Inference Accelerator with Hardware Loop Offloading and Segment-Wise On-Chip Memory Data Synchronization	-
dc.type	Article	-
dc.publisher.location	United States	-
dc.identifier.doi	10.1109/ICECIE66637.2025.11363807	-
dc.identifier.scopusid	2-s2.0-105033639106	-
dc.identifier.bibliographicCitation	Proceedings, International Conference on Electrical, Control and Instrumentation Engineering, ICECIE, pp 467 - 474	-
dc.citation.title	Proceedings, International Conference on Electrical, Control and Instrumentation Engineering, ICECIE	-
dc.citation.startPage	467	-
dc.citation.endPage	474	-
dc.type.docType	Conference paper	-
dc.description.isOpenAccess	N	-
dc.description.journalRegisteredClass	scopus	-
dc.subject.keywordPlus	Acceleration	-
dc.subject.keywordPlus	Computation offloading	-
dc.subject.keywordPlus	Computer hardware	-
dc.subject.keywordPlus	Data handling	-
dc.subject.keywordPlus	Data transfer	-
dc.subject.keywordPlus	Deep neural networks	-
dc.subject.keywordPlus	Information management	-
dc.subject.keywordPlus	Memory architecture	-
dc.subject.keywordPlus	Particle accelerators	-
dc.subject.keywordPlus	Reduced instruction set computing	-
dc.subject.keywordAuthor	Accelerator	-
dc.subject.keywordAuthor	Dataflow Processing	-
dc.subject.keywordAuthor	Matrix Multiplication	-
dc.identifier.url	https://ieeexplore.ieee.org/document/11363807	-
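
As a rough illustration of the "multi-loop structure of DNN computation" mentioned in the abstract, the following is a generic tiled GEMM loop nest in Python. It is not the paper's LOP design, nor Gemmini's API; the names `tiled_gemm`, `tile`, and `tile_steps` are hypothetical and chosen only for this sketch. The counter records how many tile-level iterations a controlling host would have to sequence, which is the kind of nested-loop control work the abstract says LOP moves into hardware.

```python
def tiled_gemm(a, b, tile):
    """Multiply a (M x K) by b (K x N) one tile at a time.

    Returns (c, tile_steps), where tile_steps counts the tile-level
    iterations of the three outer loops. Illustrative only: a real
    accelerator would issue load/compute/store commands per tile.
    """
    m, k, n = len(a), len(b), len(b[0])
    c = [[0] * n for _ in range(m)]
    tile_steps = 0
    for i0 in range(0, m, tile):          # output row tiles
        for j0 in range(0, n, tile):      # output column tiles
            for k0 in range(0, k, tile):  # reduction-dimension tiles
                tile_steps += 1
                # Inner loops stand in for the work done on one tile.
                for i in range(i0, min(i0 + tile, m)):
                    for j in range(j0, min(j0 + tile, n)):
                        acc = c[i][j]
                        for kk in range(k0, min(k0 + tile, k)):
                            acc += a[i][kk] * b[kk][j]
                        c[i][j] = acc
    return c, tile_steps


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[5, 6], [7, 8]]
    print(tiled_gemm(a, b, tile=1))
    print(tiled_gemm(a, b, tile=2))
```

In this sketch a smaller `tile` multiplies the number of tile-level iterations, so per-iteration control cost grows as tiles shrink. That is consistent with the abstract's interest in preserving the maximum tile size, though the paper's actual mechanism is not reproduced here.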

Appears in Collections: Seoul College of Engineering > Seoul School of Electronic Engineering > 1. Journal Articles

Related Researcher

Chung, Ki Seok: COLLEGE OF ENGINEERING (SCHOOL OF ELECTRONIC ENGINEERING)