LOP+SAMM: DNN Inference Accelerator with Hardware Loop Offloading and Segment-Wise On-Chip Memory Data Synchronization
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Lee, Won Kyoo | - |
| dc.contributor.author | Rho, Soomin | - |
| dc.contributor.author | Chung, Ki-Seok | - |
| dc.date.accessioned | 2026-04-21T06:00:13Z | - |
| dc.date.available | 2026-04-21T06:00:13Z | - |
| dc.date.issued | 2026-02 | - |
| dc.identifier.issn | 2832-9821 | - |
| dc.identifier.issn | 2832-9848 | - |
| dc.identifier.uri | https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/212286 | - |
| dc.description.abstract | Recent demand for computing power in Deep Neural Networks (DNNs) has driven extensive research on accelerators that improve performance. Standard accelerators typically rely on hardware tailored to fixed operations and offload operation scheduling to maximize compute utilization, but this approach limits scalability to other models and programmability. Alternatively, systems connect a RISC-V host CPU to control the accelerator and improve flexibility; however, for the multi-loop structure of DNN computation, CPU control overhead can limit compute utilization. These approaches also overlook the gains available from efficient on-chip memory data management. We propose a DNN inference accelerator that mitigates this trade-off by combining a Loop Offloading Processor (LOP) and a Scratchpad–Accumulator Mutex Map (SAMM). LOP offloads the CPU control overheads that arise in nested loops to hardware, thereby addressing utilization limits while preserving programmability through loop-wise control. SAMM operates segment-wise mutual exclusion to orchestrate efficient data transfers between on-chip memory and external memory, enabling fine-grained overlap of transfer and computation and preserving the maximum tile size (i.e., across the entire buffer). Compared to the state-of-the-art Gemmini accelerator, our evaluation demonstrates that LOP+SAMM improves performance by 1.13–1.32× across diverse GEMM (General Matrix Multiplication) workloads, results in up to 1.51× fewer external memory accesses, decreases scratchpad bank conflicts by up to 15.22×, and achieves 1.09–1.18× end-to-end latency speedups at the model level with only a 1.01× area increase over the Gemmini baseline. | - |
| dc.format.extent | 8 | - |
| dc.language | English | - |
| dc.language.iso | ENG | - |
| dc.publisher | Institute of Electrical and Electronics Engineers | - |
| dc.title | LOP+SAMM: DNN Inference Accelerator with Hardware Loop Offloading and Segment-Wise On-Chip Memory Data Synchronization | - |
| dc.type | Article | - |
| dc.publisher.location | United States | - |
| dc.identifier.doi | 10.1109/ICECIE66637.2025.11363807 | - |
| dc.identifier.scopusid | 2-s2.0-105033639106 | - |
| dc.identifier.bibliographicCitation | Proceedings, International Conference on Electrical, Control and Instrumentation Engineering, ICECIE, pp 467 - 474 | - |
| dc.citation.title | Proceedings, International Conference on Electrical, Control and Instrumentation Engineering, ICECIE | - |
| dc.citation.startPage | 467 | - |
| dc.citation.endPage | 474 | - |
| dc.type.docType | Conference paper | - |
| dc.description.isOpenAccess | N | - |
| dc.description.journalRegisteredClass | scopus | - |
| dc.subject.keywordPlus | Acceleration | - |
| dc.subject.keywordPlus | Computation offloading | - |
| dc.subject.keywordPlus | Computer hardware | - |
| dc.subject.keywordPlus | Data handling | - |
| dc.subject.keywordPlus | Data transfer | - |
| dc.subject.keywordPlus | Deep neural networks | - |
| dc.subject.keywordPlus | Information management | - |
| dc.subject.keywordPlus | Memory architecture | - |
| dc.subject.keywordPlus | Particle accelerators | - |
| dc.subject.keywordPlus | Reduced instruction set computing | - |
| dc.subject.keywordAuthor | Accelerator | - |
| dc.subject.keywordAuthor | Dataflow Processing | - |
| dc.subject.keywordAuthor | Matrix Multiplication | - |
| dc.identifier.url | https://ieeexplore.ieee.org/document/11363807 | - |
