Detailed Information


CONNA: Configurable Matrix Multiplication Engine for Neural Network Acceleration (Open Access)

Authors
Park, Sang-Soo; Chung, Ki-Seok
Issue Date
Aug-2022
Publisher
MDPI
Keywords
convolutional neural network (CNN); neural processing unit (NPU); matrix multiplication; various shapes and dimensions
Citation
ELECTRONICS, v.11, no.15, pp.1 - 23
Indexed
SCIE
SCOPUS
Journal Title
ELECTRONICS
Volume
11
Number
15
Start Page
1
End Page
23
URI
https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/171545
DOI
10.3390/electronics11152373
ISSN
2079-9292
Abstract
Convolutional neural networks (CNNs) have demonstrated promising results in applications such as computer vision, speech recognition, and natural language processing. One of the key computations in many CNN applications is matrix multiplication, which accounts for a significant portion of the total computation. Therefore, hardware accelerators that speed up matrix multiplication have been proposed, and several studies have designed accelerators to perform matrix multiplication better in terms of both speed and power consumption. Typically, accelerators with either a two-dimensional (2D) systolic array structure or a single instruction multiple data (SIMD) architecture are effective only when the input matrices are close to square in shape. However, several CNN applications require multiplications of non-square matrices with various shapes and dimensions, and such irregular shapes lead to poor utilization of the processing elements (PEs). This study proposes a configurable engine for neural network acceleration, called CONNA, whose computation engine keeps its computing units highly utilized regardless of the access patterns, shapes, and dimensions of the input matrices by reconfiguring the shape of the matrix multiplication mapped onto the physical array. To verify the functionality of the CONNA accelerator, we implemented CONNA as an SoC platform that integrates a RISC-V MCU with CONNA on a Xilinx VC707 FPGA. SqueezeNet on CONNA achieved an inference performance of 100 frames per second (FPS) with 2.36 mm² and 83.55 mW in a 65 nm process, improving efficiency by up to 34.1 times over existing accelerators in terms of FPS, silicon area, and power consumption.
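To illustrate why a fixed-shape 2D array loses efficiency on non-square matrices, the following is a minimal back-of-the-envelope sketch in Python. The array sizes, matrix shapes, and the simple output-stationary tiling model are illustrative assumptions for exposition only, not CONNA's actual microarchitecture.

```python
# Sketch: PE-utilization arithmetic for a fixed vs. a reshaped 2D PE array.
# All numbers here are hypothetical examples, not figures from the paper.

import math

def utilization(m: int, n: int, pe_rows: int, pe_cols: int) -> float:
    """Fraction of PEs doing useful work when an m x n output matrix is
    tiled onto a pe_rows x pe_cols physical array (output-stationary model):
    each tile occupies the full array, but edge tiles are partially empty."""
    tiles = math.ceil(m / pe_rows) * math.ceil(n / pe_cols)
    return (m * n) / (tiles * pe_rows * pe_cols)

# A tall, skinny output typical of some CNN layers: M=1024 rows, N=8 columns.
M, N = 1024, 8

# Fixed square 32x32 array: only 8 of its 32 columns ever hold useful work.
print(f"fixed 32x32 : {utilization(M, N, 32, 32):.1%}")   # 25.0%

# Reshaping the same 1024 PEs to 128x8 matches the output's aspect ratio
# and keeps every column busy.
print(f"shaped 128x8: {utilization(M, N, 128, 8):.1%}")   # 100.0%
```

In this toy model, reshaping the same number of PEs to match the output's aspect ratio recovers full utilization; note that CONNA's reported 34.1x gain combines FPS, area, and power, not utilization alone.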
Appears in Collections
College of Engineering (Seoul) > School of Electronic Engineering (Seoul) > 1. Journal Articles



Related Researcher

Chung, Ki-Seok
College of Engineering (School of Electronic Engineering)
