A Low Power Attention and Softmax Accelerator for Large Language Models Inference
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Kim, Jeong-Hyun | - |
| dc.contributor.author | Kim, Chan-Hoon | - |
| dc.contributor.author | Rho, Soo-Min | - |
| dc.contributor.author | Chung, Ki-Seok | - |
| dc.date.accessioned | 2025-02-12T08:00:32Z | - |
| dc.date.available | 2025-02-12T08:00:32Z | - |
| dc.date.issued | 2024-12 | - |
| dc.identifier.uri | https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/206470 | - |
| dc.description.abstract | Transformer-based models, essential for high-performing Large Language Models (LLMs), surpass traditional Deep Neural Networks but require substantial computational resources. Therefore, more efficient transformer algorithms and accelerators are required to reduce the computational cost and power consumption of LLMs. We observed that as the sequence length increases, softmax operations, which are the key operation of the transformer self-attention mechanism, become the major bottleneck. In this paper, we propose Cross-Road Softmax, an optimized algorithm designed for the softmax operation within the attention layer, specifically tailored for inference in LLMs. We evaluated the algorithm in software on 8 Natural Language Processing benchmarks. Furthermore, we design Cross-Road Accel, an accelerator that uses the proposed Cross-Road Softmax to accelerate the softmax function of the self-attention layer. We implement Cross-Road Accel in RTL and synthesize it with Synopsys Design Compiler using the Nangate 15nm open cell library to obtain power and area statistics. In summary, on average, Cross-Road Accel achieves an approximately 3.5× increase in energy efficiency compared to state-of-the-art transformer accelerators. | - |
| dc.format.extent | 4 | - |
| dc.language | English | - |
| dc.language.iso | ENG | - |
| dc.publisher | Institute of Electrical and Electronics Engineers Inc. | - |
| dc.title | A Low Power Attention and Softmax Accelerator for Large Language Models Inference | - |
| dc.type | Article | - |
| dc.identifier.doi | 10.1109/ICCE-Asia63397.2024.10773935 | - |
| dc.identifier.scopusid | 2-s2.0-85214868470 | - |
| dc.identifier.bibliographicCitation | 2024 IEEE International Conference on Consumer Electronics-Asia, ICCE-Asia 2024, pp 1 - 4 | - |
| dc.citation.title | 2024 IEEE International Conference on Consumer Electronics-Asia, ICCE-Asia 2024 | - |
| dc.citation.startPage | 1 | - |
| dc.citation.endPage | 4 | - |
| dc.type.docType | Conference paper | - |
| dc.description.isOpenAccess | N | - |
| dc.description.journalRegisteredClass | scopus | - |
| dc.subject.keywordPlus | Benchmarking | - |
| dc.subject.keywordPlus | Integrated circuit design | - |
| dc.subject.keywordPlus | Multilayer neural networks | - |
| dc.subject.keywordPlus | Printed circuit design | - |
| dc.subject.keywordPlus | Problem oriented languages | - |
| dc.subject.keywordPlus | Program compilers | - |
| dc.subject.keywordPlus | Structural dynamics | - |
| dc.subject.keywordAuthor | AI accelerator | - |
| dc.subject.keywordAuthor | Algorithm-Hardware Co-Design | - |
| dc.subject.keywordAuthor | LLMs | - |
| dc.subject.keywordAuthor | Low Power Design | - |
| dc.subject.keywordAuthor | NLP | - |
| dc.subject.keywordAuthor | Softmax | - |
| dc.subject.keywordAuthor | Transformer | - |
