A Low Power Attention and Softmax Accelerator for Large Language Models Inference

Kim, Jeong-Hyun; Kim, Chan-Hoon; Rho, Soo-Min; Chung, Ki-Seok

doi:10.1109/ICCE-Asia63397.2024.10773935

Full metadata record

DC Field	Value	Language
dc.contributor.author	Kim, Jeong-Hyun	-
dc.contributor.author	Kim, Chan-Hoon	-
dc.contributor.author	Rho, Soo-Min	-
dc.contributor.author	Chung, Ki-Seok	-
dc.date.accessioned	2025-02-12T08:00:32Z	-
dc.date.available	2025-02-12T08:00:32Z	-
dc.date.issued	2024-12	-
dc.identifier.uri	https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/206470	-
dc.description.abstract	Transformer-based models, essential for high-performing Large Language Models (LLMs), surpass traditional Deep Neural Networks but require substantial computational resources. Therefore, more efficient transformer algorithms and accelerators are required to reduce the computational cost and power consumption of LLMs. We observed that as the sequence length increases, the softmax operation, a key operation of the transformer self-attention mechanism, becomes the major bottleneck. In this paper, we propose Cross-Road Softmax, an optimized algorithm for the softmax operation within the attention layer, specifically tailored for inference in LLMs. We evaluate it in software on 8 Natural Language Processing benchmarks. Furthermore, we design Cross-Road Accel, an accelerator that uses the proposed Cross-Road Softmax to accelerate the softmax function of the self-attention layer. We implement Cross-Road Accel in RTL and synthesize it with Synopsys Design Compiler using the Nangate 15nm open cell library to obtain power and area statistics. In summary, on average, Cross-Road Accel achieves an approximately 3.5× increase in energy efficiency compared to state-of-the-art transformer accelerators.	-
dc.format.extent	4	-
dc.language	English	-
dc.language.iso	ENG	-
dc.publisher	Institute of Electrical and Electronics Engineers Inc.	-
dc.title	A Low Power Attention and Softmax Accelerator for Large Language Models Inference	-
dc.type	Article	-
dc.identifier.doi	10.1109/ICCE-Asia63397.2024.10773935	-
dc.identifier.scopusid	2-s2.0-85214868470	-
dc.identifier.bibliographicCitation	2024 IEEE International Conference on Consumer Electronics-Asia, ICCE-Asia 2024, pp 1 - 4	-
dc.citation.title	2024 IEEE International Conference on Consumer Electronics-Asia, ICCE-Asia 2024	-
dc.citation.startPage	1	-
dc.citation.endPage	4	-
dc.type.docType	Conference paper	-
dc.description.isOpenAccess	N	-
dc.description.journalRegisteredClass	scopus	-
dc.subject.keywordPlus	Benchmarking	-
dc.subject.keywordPlus	Integrated circuit design	-
dc.subject.keywordPlus	Multilayer neural networks	-
dc.subject.keywordPlus	Printed circuit design	-
dc.subject.keywordPlus	Problem oriented languages	-
dc.subject.keywordPlus	Program compilers	-
dc.subject.keywordPlus	Structural dynamics	-
dc.subject.keywordAuthor	AI accelerator	-
dc.subject.keywordAuthor	Algorithm-Hardware Co-Design	-
dc.subject.keywordAuthor	LLMs	-
dc.subject.keywordAuthor	Low Power Design	-
dc.subject.keywordAuthor	NLP	-
dc.subject.keywordAuthor	Softmax	-
dc.subject.keywordAuthor	Transformer	-

Files in This Item: There are no files associated with this item.

Appears in Collections: Seoul College of Engineering > Seoul School of Electronic Engineering > 1. Journal Articles

Related Researcher

Chung, Ki Seok: COLLEGE OF ENGINEERING (SCHOOL OF ELECTRONIC ENGINEERING)
