Lightweight Error Correction for In-Storage Acceleration of Large Language Model Inference
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Jeong, Jinwoo | - |
| dc.contributor.author | Ahn, Byungmin | - |
| dc.contributor.author | Shin, Dongmin | - |
| dc.contributor.author | Choi, Jungwook | - |
| dc.date.accessioned | 2024-11-28T14:31:32Z | - |
| dc.date.available | 2024-11-28T14:31:32Z | - |
| dc.date.issued | 2024-01 | - |
| dc.identifier.issn | 2574-1403 | - |
| dc.identifier.issn | 2767-7699 | - |
| dc.identifier.uri | https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/196965 | - |
| dc.description.abstract | As large language models (LLMs) grow in size, conventional GPU-based LLM inference systems face memory bandwidth and capacity limitations. An LLM inference accelerator using NAND flash storage has been proposed to overcome these challenges. However, this approach requires a significant expansion of flash channels to ensure adequate bandwidth for inference, which in turn raises error correction code (ECC) costs. This paper examines the impact of flash memory errors on LLM inference accuracy and explores the possibility of lightweight ECC by leveraging the inherent error resilience of LLMs. We analyze the impact of 1) masking high-order bit indices of FP32 LLM parameters, 2) clipping, and 3) the dependence of error robustness on parameter type, and show that combining them can reduce ECC bandwidth by up to 9.38%. | - |
| dc.format.extent | 4 | - |
| dc.language | English | - |
| dc.language.iso | ENG | - |
| dc.publisher | Institute of Electrical and Electronics Engineers Inc. | - |
| dc.title | Lightweight Error Correction for In-Storage Acceleration of Large Language Model Inference | - |
| dc.type | Article | - |
| dc.publisher.location | United States | - |
| dc.identifier.doi | 10.1109/ICEIC61013.2024.10457117 | - |
| dc.identifier.scopusid | 2-s2.0-85189238917 | - |
| dc.identifier.bibliographicCitation | 2024 International Conference on Electronics, Information, and Communication, ICEIC 2024, pp 1 - 4 | - |
| dc.citation.title | 2024 International Conference on Electronics, Information, and Communication, ICEIC 2024 | - |
| dc.citation.startPage | 1 | - |
| dc.citation.endPage | 4 | - |
| dc.type.docType | Conference paper | - |
| dc.description.isOpenAccess | N | - |
| dc.description.journalRegisteredClass | scopus | - |
| dc.subject.keywordPlus | Error correction codes | - |
| dc.subject.keywordPlus | Errors correction | - |
| dc.subject.keywordPlus | Inference systems | - |
| dc.subject.keywordPlus | Language model | - |
| dc.subject.keywordPlus | Large language model | - |
| dc.subject.keywordPlus | Memory bandwidths | - |
| dc.subject.keywordPlus | Memory capacity | - |
| dc.subject.keywordPlus | Model inference | - |
| dc.subject.keywordPlus | NAND Flash | - |
| dc.subject.keywordPlus | NAND flash error | - |
| dc.subject.keywordAuthor | error correction code | - |
| dc.subject.keywordAuthor | large language model | - |
| dc.subject.keywordAuthor | NAND flash errors | - |
| dc.identifier.url | https://ieeexplore.ieee.org/document/10457117 | - |
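
The abstract in the record above refers to the high-order bits of FP32 parameters and to clipping. As a rough illustration of why those two ideas matter, the following sketch flips single bits in the IEEE 754 single-precision encoding of one weight and then clamps the result. It is not the paper's method: the helper names (`float_to_bits`, `bits_to_float`, `flip_bit`, `clip`), the sample weight `0.5`, and the clip limit `1.0` are all assumptions chosen for the example.

```python
import struct


def float_to_bits(x: float) -> int:
    """Return the IEEE 754 single-precision bit pattern of x as an unsigned int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_float(b: int) -> float:
    """Interpret a 32-bit unsigned int as an IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<I", b & 0xFFFFFFFF))[0]


def flip_bit(x: float, index: int) -> float:
    """Flip bit `index` of the FP32 encoding of x (0 = mantissa LSB, 31 = sign)."""
    return bits_to_float(float_to_bits(x) ^ (1 << index))


def clip(x: float, limit: float) -> float:
    """Clamp x to [-limit, limit]. Mapping NaN to 0.0 is an assumed policy."""
    if x != x:  # NaN is the only value not equal to itself
        return 0.0
    return max(-limit, min(limit, x))


weight = 0.5                     # hypothetical stored parameter
low = flip_bit(weight, 0)        # error in the lowest mantissa bit
high = flip_bit(weight, 30)      # error in the highest exponent bit
clipped = clip(high, 1.0)        # assumed clip limit of 1.0

print(f"original        : {weight!r}")
print(f"bit 0 flipped   : {low!r}")
print(f"bit 30 flipped  : {high!r}")
print(f"bit 30 + clipped: {clipped!r}")
```

In this example a flip in bit 0 changes the weight by 2^-24, while a flip in bit 30 turns it into 2^127. Clamping pulls the second value back to the assumed limit, which gives some intuition for why a scheme might protect only certain bit positions and bound the values that are read back.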
Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.
