Detailed Information

Searching Optimal Floating-Point Format for Sub-8-Bit Large Language Model Inference

Full metadata record
DC Field | Value | Language
dc.contributor.author | Hwang, Youngdeok | -
dc.contributor.author | Lee, Janghwan | -
dc.contributor.author | Park, Jiwoong | -
dc.contributor.author | Lim, Jieun | -
dc.contributor.author | Choi, Jungwook | -
dc.date.accessioned | 2024-11-28T14:31:32Z | -
dc.date.available | 2024-11-28T14:31:32Z | -
dc.date.issued | 2024-01 | -
dc.identifier.issn | 2574-1403 | -
dc.identifier.issn | 2767-7699 | -
dc.identifier.uri | https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/196964 | -
dc.description.abstract | Large Language Models (LLMs) have shown remarkable success in various natural language processing tasks. However, their extensive parameter count leads to significant memory and computational demands. To tackle these challenges, there is growing interest in employing post-training quantization (PTQ) with reduced-precision floating-point (FP) operations. Yet, the optimal FP configuration remains a topic of debate. Existing studies often overlook a thorough analysis of the diverse data distributions found in LLMs and the crucial design choice, denormal. In this paper, we conduct a comprehensive examination of the various data distributions within LLMs and the significance of denormal representation, presenting a mixed-format floating-point framework. Our proposed framework allows for sub-8-bit inference with minimal performance degradation in language modeling and reasoning tasks across a broad spectrum of LLMs. | -
dc.format.extent | 4 | -
dc.language | English | -
dc.language.iso | ENG | -
dc.publisher | Institute of Electrical and Electronics Engineers Inc. | -
dc.title | Searching Optimal Floating-Point Format for Sub-8-Bit Large Language Model Inference | -
dc.type | Article | -
dc.publisher.location | United States | -
dc.identifier.doi | 10.1109/ICEIC61013.2024.10457111 | -
dc.identifier.scopusid | 2-s2.0-85189243662 | -
dc.identifier.bibliographicCitation | 2024 International Conference on Electronics, Information, and Communication, ICEIC 2024, pp 1 - 4 | -
dc.citation.title | 2024 International Conference on Electronics, Information, and Communication, ICEIC 2024 | -
dc.citation.startPage | 1 | -
dc.citation.endPage | 4 | -
dc.type.docType | Conference paper | -
dc.description.isOpenAccess | N | -
dc.description.journalRegisteredClass | scopus | -
dc.subject.keywordAuthor | floating-point | -
dc.subject.keywordAuthor | Large language model | -
dc.subject.keywordAuthor | mixed-format | -
dc.subject.keywordAuthor | post-training quantization | -
dc.identifier.url | https://ieeexplore.ieee.org/document/10457111 | -
Appears in Collections
College of Engineering (Seoul) > School of Electronic Engineering (Seoul) > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Choi, Jung wook
COLLEGE OF ENGINEERING (SCHOOL OF ELECTRONIC ENGINEERING)
