LOP+SAMM: DNN Inference Accelerator with Hardware Loop Offloading and Segment-Wise On-Chip Memory Data Synchronization

Full metadata record
DC Field | Value | Language
dc.contributor.author | Lee, Won Kyoo | -
dc.contributor.author | Rho, Soomin | -
dc.contributor.author | Chung, Ki-Seok | -
dc.date.accessioned | 2026-04-21T06:00:13Z | -
dc.date.available | 2026-04-21T06:00:13Z | -
dc.date.issued | 2026-02 | -
dc.identifier.issn | 2832-9821 | -
dc.identifier.issn | 2832-9848 | -
dc.identifier.uri | https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/212286 | -
dc.description.abstract | Recent demand for computing power in Deep Neural Networks (DNNs) has driven extensive research on accelerators that improve performance. Standard accelerators typically rely on hardware tailored to fixed operations and offload operation scheduling to maximize compute utilization, but this approach limits scalability to other models and programmability. Alternatively, systems connect a RISC-V host CPU to control the accelerator and improve flexibility; however, for the multi-loop structure of DNN computation, CPU control overhead can limit compute utilization. These approaches also overlook the gains available from efficient on-chip memory data management. We propose a DNN inference accelerator that mitigates this trade-off by combining a Loop Offloading Processor (LOP) and a Scratchpad–Accumulator Mutex Map (SAMM). LOP offloads the CPU control overheads that arise in nested loops to hardware, thereby addressing utilization limits while preserving programmability through loop-wise control. SAMM operates segment-wise mutual exclusion to orchestrate efficient data transfers between on-chip memory and external memory, enabling fine-grained overlap of transfer and computation and preserving the maximum tile size (i.e., across the entire buffer). Compared to the state-of-the-art Gemmini accelerator, our evaluation demonstrates that LOP+SAMM improves performance by 1.13–1.32× across diverse GEMM (General Matrix Multiplication) workloads, results in up to 1.51× fewer external memory accesses, decreases scratchpad bank conflicts by up to 15.22×, and achieves 1.09–1.18× end-to-end latency speedups at the model level with only a 1.01× area increase over the Gemmini baseline. | -
dc.format.extent | 8 | -
dc.language | English | -
dc.language.iso | ENG | -
dc.publisher | Institute of Electrical and Electronics Engineers | -
dc.title | LOP+SAMM: DNN Inference Accelerator with Hardware Loop Offloading and Segment-Wise On-Chip Memory Data Synchronization | -
dc.type | Article | -
dc.publisher.location | United States | -
dc.identifier.doi | 10.1109/ICECIE66637.2025.11363807 | -
dc.identifier.scopusid | 2-s2.0-105033639106 | -
dc.identifier.bibliographicCitation | Proceedings, International Conference on Electrical, Control and Instrumentation Engineering, ICECIE, pp 467 - 474 | -
dc.citation.title | Proceedings, International Conference on Electrical, Control and Instrumentation Engineering, ICECIE | -
dc.citation.startPage | 467 | -
dc.citation.endPage | 474 | -
dc.type.docType | Conference paper | -
dc.description.isOpenAccess | N | -
dc.description.journalRegisteredClass | scopus | -
dc.subject.keywordPlus | Acceleration | -
dc.subject.keywordPlus | Computation offloading | -
dc.subject.keywordPlus | Computer hardware | -
dc.subject.keywordPlus | Data handling | -
dc.subject.keywordPlus | Data transfer | -
dc.subject.keywordPlus | Deep neural networks | -
dc.subject.keywordPlus | Information management | -
dc.subject.keywordPlus | Memory architecture | -
dc.subject.keywordPlus | Particle accelerators | -
dc.subject.keywordPlus | Reduced instruction set computing | -
dc.subject.keywordAuthor | Accelerator | -
dc.subject.keywordAuthor | Dataflow Processing | -
dc.subject.keywordAuthor | Matrix Multiplication | -
dc.identifier.url | https://ieeexplore.ieee.org/document/11363807 | -
Appears in Collections
College of Engineering (Seoul) > School of Electronic Engineering (Seoul) > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Chung, Ki Seok
COLLEGE OF ENGINEERING (SCHOOL OF ELECTRONIC ENGINEERING)
