Prompt Tuning for Vision-Language Models via Meta-Training (메타학습을 이용한 시각-언어 모델 프롬프트 튜닝)

김도현; 백성용

doi:10.5909/JBE.2025.30.4.571

Full metadata record

DC Field	Value	Language
dc.contributor.author	김도현	-
dc.contributor.author	백성용	-
dc.date.accessioned	2025-08-12T07:00:11Z	-
dc.date.available	2025-08-12T07:00:11Z	-
dc.date.issued	2025-07	-
dc.identifier.issn	1226-7953	-
dc.identifier.issn	2287-9137	-
dc.identifier.uri	https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/208498	-
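dc.description.abstract	Recently, studies applying large-scale pre-trained vision-language models (VLMs) such as CLIP to various downstream tasks have shown strong performance. In particular, for low-shot image classification, which uses only a small number of samples, optimizing continuous prompt vectors has attracted attention, but it suffers from poor generalization. Recent work addresses this problem by introducing additional model structures or algorithms, at the cost of reduced efficiency. In this paper, to improve generalization efficiently, we propose optimizing the prompt vectors with a meta-learning algorithm that uses multi-modal representations. The proposed method improves accuracy by about 9.6% or more over other models on a measure that considers both seen and new domains, and it supports zero-shot inference with no additional memory or latency.	-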
dc.description.abstract	Recently, applying large pre-trained vision-language models such as CLIP to various downstream tasks has shown strong performance. In low-shot image classification, directly optimizing continuous prompt vectors has emerged as a popular approach, but it generalizes poorly. Recent work introduces additional structures and algorithms to address this problem, at the cost of efficiency. We therefore propose a meta-training framework that uses multi-modal features to optimize the prompt vectors. The proposed method achieves an accuracy improvement of over 9.6% compared to other models when evaluated jointly on seen and unseen domains, and it incurs no additional memory overhead or inference latency for zero-shot inference.	-
dc.format.extent	9	-
dc.language	Korean	-
dc.language.iso	KOR	-
dc.publisher	The Korean Institute of Broadcast and Media Engineers (한국방송∙미디어공학회)	-
dc.title	메타학습을 이용한 시각-언어 모델 프롬프트 튜닝	-
dc.title.alternative	Prompt Tuning for Vision-Language Models via Meta-Training	-
dc.type	Article	-
dc.publisher.location	Republic of Korea	-
dc.identifier.doi	10.5909/JBE.2025.30.4.571	-
dc.identifier.bibliographicCitation	Journal of Broadcast Engineering (방송공학회 논문지), v.30, no.4, pp. 571-579	-
dc.citation.title	Journal of Broadcast Engineering (방송공학회 논문지)	-
dc.citation.volume	30	-
dc.citation.number	4	-
dc.citation.startPage	571	-
dc.citation.endPage	579	-
dc.type.docType	Y	-
dc.identifier.kciid	ART003228382	-
dc.description.isOpenAccess	N	-
dc.description.journalRegisteredClass	kci	-
dc.subject.keywordAuthor	Vision-Language Models	-
dc.subject.keywordAuthor	Prompt Tuning	-
dc.subject.keywordAuthor	Meta-Learning	-
dc.subject.keywordAuthor	Low-shot Image Classification	-
dc.identifier.url	https://ksbe-jbe.org/_common/do.php?a=full&b=13&bidx=4152&aidx=45822	-

Appears in Collections: Seoul College of Engineering > ETC > 1. Journal Articles

Related Researcher

Baik, Sungyong: College of Engineering (Department of Intelligence Computing)
