Lightweight Error Correction for In-Storage Acceleration of Large Language Model Inference

Authors
Jeong, Jinwoo; Ahn, Byungmin; Shin, Dongmin; Choi, Jungwook
Issue Date
Jan-2024
Publisher
Institute of Electrical and Electronics Engineers Inc.
Keywords
error correction code; large language model; NAND flash errors
Citation
2024 International Conference on Electronics, Information, and Communication, ICEIC 2024, pp 1 - 4
Pages
4
Indexed
SCOPUS
Journal Title
2024 International Conference on Electronics, Information, and Communication, ICEIC 2024
Start Page
1
End Page
4
URI
https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/196965
DOI
10.1109/ICEIC61013.2024.10457117
ISSN
2574-1403
2767-7699
Abstract
As large language models (LLMs) grow in size, conventional GPU-based LLM inference systems face memory bandwidth and capacity limitations. An LLM inference accelerator using NAND flash storage has been proposed to overcome these challenges. However, such an accelerator requires a significant expansion of flash channels to ensure adequate bandwidth for inference, which in turn increases error correction code (ECC) costs. This paper examines the impact of flash memory errors on LLM inference accuracy and explores the possibility of lightweight ECC that leverages the inherent error resilience of LLMs. We analyze the impact of 1) masking high-order bit indices of FP32 LLM parameters, 2) clipping, and 3) the dependence of error robustness on parameter type, and show that a combination of these techniques can reduce ECC bandwidth by up to 9.38%.
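The following minimal Python sketch illustrates why the bit position of an error matters for FP32 parameters and how clipping can bound the damage. It is not the paper's implementation: the helper names (float_to_bits, bits_to_float, flip_bit, clip), the example weight 0.5, and the clipping limit 1.0 are all assumptions chosen for illustration, and the abstract does not specify how masking or clipping is actually applied.

```python
import struct


def float_to_bits(x: float) -> int:
    """Return the IEEE 754 single-precision bit pattern of x as an integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_float(bits: int) -> float:
    """Interpret a 32-bit integer as an IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


def flip_bit(x: float, index: int) -> float:
    """Simulate a single bit error at position index (0 = LSB, 31 = sign)."""
    return bits_to_float(float_to_bits(x) ^ (1 << index))


def clip(x: float, limit: float) -> float:
    """Clamp x to [-limit, limit]; NaN is mapped to 0.0 (an assumed policy)."""
    if x != x:  # NaN is the only value that is not equal to itself
        return 0.0
    return max(-limit, min(limit, x))


weight = 0.5  # hypothetical FP32 parameter value

# An error in the lowest mantissa bit barely changes the value.
low_error = flip_bit(weight, 0)

# An error in the highest exponent bit (bit 30) changes the magnitude drastically.
high_error = flip_bit(weight, 30)

# Clipping to an assumed valid range bounds the effect of the high-order error.
clipped = clip(high_error, 1.0)

print(low_error, high_error, clipped)
```

In this sketch, an error in a low-order mantissa bit leaves the weight nearly unchanged, while an error in the top exponent bit turns 0.5 into 2^127. This gap is one plausible reading of why high-order bits and clipping receive separate attention in the abstract.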
Appears in Collections
Seoul College of Engineering > Seoul School of Electronic Engineering > 1. Journal Articles
