Improving Conversational Abilities of Quantized Large Language Models via Direct Preference Alignment

Lee, Janghwan; Park, Seongmin; Hong, Sukjin; Kim, Minsoo; Chang, Du-Seong; Choi, Jungwook

doi:10.18653/v1/2024.acl-long.612

Full metadata record

DC Field	Value	Language
dc.contributor.author	Lee, Janghwan	-
dc.contributor.author	Park, Seongmin	-
dc.contributor.author	Hong, Sukjin	-
dc.contributor.author	Kim, Minsoo	-
dc.contributor.author	Chang, Du-Seong	-
dc.contributor.author	Choi, Jungwook	-
dc.date.accessioned	2024-11-28T08:36:16Z	-
dc.date.available	2024-11-28T08:36:16Z	-
dc.date.issued	2024-08	-
dc.identifier.issn	0736-587X	-
dc.identifier.uri	https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/195396	-
dc.description.abstract	The rapid advancement of large language models (LLMs) has facilitated their transformation into conversational chatbots that can grasp contextual nuances and generate pertinent sentences, closely mirroring human values through advanced techniques such as instruction tuning and reinforcement learning from human feedback (RLHF). However, the computational efficiency required for LLMs, achieved through techniques like post-training quantization (PTQ), presents challenges such as token-flipping that can impair chatbot performance. In response, we propose a novel preference alignment approach, quantization-aware direct preference optimization (QDPO), that aligns quantized LLMs with their full-precision counterparts, improving conversational abilities. Evaluated on two instruction-tuned LLMs in various languages, QDPO demonstrated superior performance in improving conversational abilities compared to established PTQ and knowledge-distillation fine-tuning techniques, marking a significant step forward in the development of efficient and effective conversational LLMs.	-
dc.format.extent	19	-
dc.language	English	-
dc.language.iso	ENG	-
dc.title	Improving Conversational Abilities of Quantized Large Language Models via Direct Preference Alignment	-
dc.type	Article	-
dc.publisher.location	United Kingdom	-
dc.identifier.doi	10.18653/v1/2024.acl-long.612	-
dc.identifier.scopusid	2-s2.0-85204454910	-
dc.identifier.bibliographicCitation	Association for Computational Linguistics (ACL). Annual Meeting Conference Proceedings, v.1, pp 11346 - 11364	-
dc.citation.title	Association for Computational Linguistics (ACL). Annual Meeting Conference Proceedings	-
dc.citation.volume	1	-
dc.citation.startPage	11346	-
dc.citation.endPage	11364	-
dc.type.docType	Conference paper	-
dc.description.isOpenAccess	N	-
dc.description.journalRegisteredClass	scopus	-
dc.subject.keywordPlus	Adversarial machine learning	-
dc.subject.keywordPlus	Contrastive Learning	-
dc.subject.keywordPlus	Reinforcement learning	-
dc.identifier.url	https://aclanthology.org/2024.acl-long.612/	-

Appears in Collections: Seoul College of Engineering > Seoul School of Electronic Engineering > 1. Journal Articles

Related Researcher

Choi, Jungwook: College of Engineering (School of Electronic Engineering)
