Integration of Global and Local Representations for Fine-Grained Cross-Modal Alignment

Jin, Seungwan; Choi, Hoyoung; Noh, Taehyung; Han, Kyungsik

doi:10.1007/978-3-031-73010-8_4

Full metadata record

DC Field	Value	Language
dc.contributor.author	Jin, Seungwan	-
dc.contributor.author	Choi, Hoyoung	-
dc.contributor.author	Noh, Taehyung	-
dc.contributor.author	Han, Kyungsik	-
dc.date.accessioned	2024-12-04T05:00:14Z	-
dc.date.available	2024-12-04T05:00:14Z	-
dc.date.issued	2024-11	-
dc.identifier.issn	0302-9743	-
dc.identifier.issn	1611-3349	-
dc.identifier.uri	https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/199806	-
dc.description.abstract	Fashion is one of the representative domains of fine-grained Vision-Language Pre-training (VLP) involving a large number of images and text. Previous fashion VLP research has proposed various pre-training tasks to account for fine-grained details in multimodal fusion. However, fashion VLP research has not yet addressed the need to focus on (1) uni-modal embeddings that reflect fine-grained features and (2) hard negative samples to improve the performance of fine-grained V+L retrieval tasks. In this paper, we propose Fashion-FINE (Fashion VLP with Fine-grained Cross-modal Alignment using the INtegrated representations of global and local patch Embeddings), which consists of three key modules. First, a modality-agnostic adapter (MAA) learns uni-modal integrated representations and reflects fine-grained details contained in local patches. Second, hard negative mining with focal loss (HNM-F) performs cross-modal alignment using the integrated representations, focusing on hard negatives to boost the learning of fine-grained cross-modal alignment. Third, comprehensive cross-modal alignment (C-CmA) extracts low- and high-level fashion information from the text and learns the semantic alignment to encourage disentangled embedding of the integrated image representations. Fashion-FINE achieved state-of-the-art performance on two representative public benchmarks (i.e., FashionGen and FashionIQ) in three representative V+L retrieval tasks, demonstrating its effectiveness in learning fine-grained features.	-
dc.format.extent	18	-
dc.language	English	-
dc.language.iso	ENG	-
dc.publisher	Springer Verlag	-
dc.title	Integration of Global and Local Representations for Fine-Grained Cross-Modal Alignment	-
dc.type	Article	-
dc.publisher.location	United States	-
dc.identifier.doi	10.1007/978-3-031-73010-8_4	-
dc.identifier.scopusid	2-s2.0-85210322367	-
dc.identifier.wosid	001416938600004	-
dc.identifier.bibliographicCitation	Lecture Notes in Computer Science, v.15141, pp 53 - 70	-
dc.citation.title	Lecture Notes in Computer Science	-
dc.citation.volume	15141	-
dc.citation.startPage	53	-
dc.citation.endPage	70	-
dc.type.docType	Proceedings Paper	-
dc.description.isOpenAccess	N	-
dc.description.journalRegisteredClass	scopus	-
dc.relation.journalResearchArea	Computer Science	-
dc.relation.journalWebOfScienceCategory	Computer Science, Artificial Intelligence	-
dc.relation.journalWebOfScienceCategory	Computer Science, Interdisciplinary Applications	-
dc.relation.journalWebOfScienceCategory	Computer Science, Theory & Methods	-
dc.subject.keywordPlus	Benchmarking	-
dc.subject.keywordPlus	Embeddings	-
dc.subject.keywordPlus	Image coding	-
dc.subject.keywordPlus	Image representation	-
dc.subject.keywordPlus	Visual languages	-
dc.subject.keywordAuthor	Fashion	-
dc.subject.keywordAuthor	Fine-grained Representation Learning	-
dc.subject.keywordAuthor	Vision-Language Pre-training	-
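
The abstract describes hard negative mining with focal loss (HNM-F) only at a high level. As a rough illustration of the general idea, and not the authors' actual formulation, the sketch below applies a focal weight to a standard image-to-text contrastive loss, so that pairs whose positive is already well separated contribute little and pairs with a confusable (hard) negative dominate. The function names, the temperature, the gamma value, and the toy similarity matrix are all assumptions made for this example.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def focal_contrastive_loss(sim, temperature=0.07, gamma=2.0):
    """Focal-weighted image-to-text contrastive loss (illustrative only).

    sim[i][j] is the similarity between image i and text j; matched
    pairs are assumed to lie on the diagonal. With gamma=0 this reduces
    to the ordinary softmax cross-entropy (InfoNCE-style) loss.
    """
    n = len(sim)
    total = 0.0
    for i in range(n):
        probs = softmax([s / temperature for s in sim[i]])
        p_pos = probs[i]
        # The focal factor (1 - p_pos)^gamma shrinks the loss of easy pairs.
        total += -((1.0 - p_pos) ** gamma) * math.log(p_pos)
    return total / n


def hardest_negative(sim, i):
    """Index of the non-matching text most similar to image i."""
    candidates = [(s, j) for j, s in enumerate(sim[i]) if j != i]
    return max(candidates)[1]


# Toy similarity matrix (hypothetical values): image 1 has a hard
# negative, because text 2 is almost as similar as its matched text 1.
sim = [
    [0.90, 0.10, 0.20],
    [0.20, 0.80, 0.75],
    [0.10, 0.30, 0.90],
]

plain = focal_contrastive_loss(sim, gamma=0.0)
focal = focal_contrastive_loss(sim, gamma=2.0)
print("hardest negative for image 1:", hardest_negative(sim, 1))
print("plain loss:", plain, "focal loss:", focal)
```

In this toy setup the focal variant is never larger than the plain loss, because every per-pair term is scaled by a factor of at most one; the relative contribution of the pair with the hard negative grows as gamma increases.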

Files in This Item: There are no files associated with this item.

Appears in Collections: Seoul College of Engineering > ETC > 1. Journal Articles

Related Researcher

Han, Kyungsik: College of Engineering (Department of Intelligence Computing)
