Detailed Information


CONNA: Configurable Matrix Multiplication Engine for Neural Network Acceleration (Open Access)

Authors
Park, Sang-Soo; Chung, Ki-Seok
Issue Date
Aug-2022
Publisher
MDPI
Keywords
convolutional neural network (CNN); neural processing unit (NPU); matrix multiplication; various shapes and dimensions
Citation
ELECTRONICS, v.11, no.15, pp.1 - 23
Indexed
SCIE
SCOPUS
Journal Title
ELECTRONICS
Volume
11
Number
15
Start Page
1
End Page
23
URI
https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/171545
DOI
10.3390/electronics11152373
ISSN
2079-9292
Abstract
Convolutional neural networks (CNNs) have demonstrated promising results in applications such as computer vision, speech recognition, and natural language processing. One of the key computations in many CNN applications is matrix multiplication, which accounts for a significant portion of the total computation. Therefore, hardware accelerators that speed up matrix multiplication have been proposed, and several studies have designed accelerators to perform matrix multiplication better in terms of both speed and power consumption. Typically, accelerators with either a two-dimensional (2D) systolic array structure or a single instruction multiple data (SIMD) architecture are effective only when the input matrices are close to square in shape. However, several CNN applications require multiplications of non-square matrices with various shapes and dimensions, and such irregular shapes lead to poor utilization of the processing elements (PEs). This study proposes a configurable engine for neural network acceleration, called CONNA, whose computation engine keeps its computing units highly utilized regardless of the access patterns, shapes, and dimensions of the input matrices by reconfiguring the shape of the matrix multiplication mapped onto the physical array. To verify the functionality of the CONNA accelerator, we implemented CONNA as an SoC platform that integrates a RISC-V MCU with CONNA on a Xilinx VC707 FPGA. SqueezeNet on CONNA achieved an inference performance of 100 frames per second (FPS) with 2.36 mm² and 83.55 mW in a 65 nm process, improving efficiency by up to 34.1 times over existing accelerators in terms of FPS, silicon area, and power consumption.
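To illustrate why a fixed-shape 2D array loses efficiency on non-square matrices, the following is a minimal back-of-the-envelope sketch in Python. The array sizes, matrix shapes, and the simple output-stationary tiling model are illustrative assumptions for exposition only, not CONNA's actual microarchitecture.

```python
# Sketch: PE-utilization arithmetic for a fixed vs. a reshaped 2D PE array.
# All numbers here are hypothetical examples, not figures from the paper.

import math

def utilization(m: int, n: int, pe_rows: int, pe_cols: int) -> float:
    """Fraction of PEs doing useful work when an m x n output matrix is
    tiled onto a pe_rows x pe_cols physical array (output-stationary model):
    each tile occupies the full array, but edge tiles are partially empty."""
    tiles = math.ceil(m / pe_rows) * math.ceil(n / pe_cols)
    return (m * n) / (tiles * pe_rows * pe_cols)

# A tall, skinny output typical of some CNN layers: M=1024 rows, N=8 columns.
M, N = 1024, 8

# Fixed square 32x32 array: only 8 of its 32 columns ever hold useful work.
print(f"fixed 32x32 : {utilization(M, N, 32, 32):.1%}")   # 25.0%

# Reshaping the same 1024 PEs to 128x8 matches the output's aspect ratio
# and keeps every column busy.
print(f"shaped 128x8: {utilization(M, N, 128, 8):.1%}")   # 100.0%
```

In this toy model, reshaping the same number of PEs to match the output's aspect ratio recovers full utilization; note that CONNA's reported 34.1x gain combines FPS, area, and power, not utilization alone.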
Appears in Collections
College of Engineering (Seoul) > School of Electronic Engineering (Seoul) > 1. Journal Articles



Related Researcher

Chung, Ki-Seok
College of Engineering (School of Electronic Engineering)
