Integration of Global and Local Representations for Fine-Grained Cross-Modal Alignment
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Jin, Seungwan | - |
| dc.contributor.author | Choi, Hoyoung | - |
| dc.contributor.author | Noh, Taehyung | - |
| dc.contributor.author | Han, Kyungsik | - |
| dc.date.accessioned | 2024-12-04T05:00:14Z | - |
| dc.date.available | 2024-12-04T05:00:14Z | - |
| dc.date.issued | 2024-11 | - |
| dc.identifier.issn | 0302-9743 | - |
| dc.identifier.issn | 1611-3349 | - |
| dc.identifier.uri | https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/199806 | - |
| dc.description.abstract | Fashion is a representative domain of fine-grained Vision-Language Pre-training (VLP), involving a large number of images and texts. Previous fashion VLP research has proposed various pre-training tasks to account for fine-grained details in multimodal fusion. However, this research has not yet addressed the need to focus on (1) uni-modal embeddings that reflect fine-grained features and (2) hard negative samples that improve performance on fine-grained V+L retrieval tasks. In this paper, we propose Fashion-FINE (Fashion VLP with Fine-grained Cross-modal Alignment using the INtegrated representations of global and local patch Embeddings), which consists of three key modules. First, a modality-agnostic adapter (MAA) learns uni-modal integrated representations that reflect the fine-grained details contained in local patches. Second, hard negative mining with focal loss (HNM-F) performs cross-modal alignment using the integrated representations, focusing on hard negatives to boost the learning of fine-grained cross-modal alignment. Third, comprehensive cross-modal alignment (C-CmA) extracts low- and high-level fashion information from the text and learns semantic alignment to encourage disentangled embeddings of the integrated image representations. Fashion-FINE achieved state-of-the-art performance on two representative public benchmarks (FashionGen and FashionIQ) across three V+L retrieval tasks, demonstrating its effectiveness in learning fine-grained features. | - |
| dc.format.extent | 18 | - |
| dc.language | English | - |
| dc.language.iso | ENG | - |
| dc.publisher | Springer Verlag | - |
| dc.title | Integration of Global and Local Representations for Fine-Grained Cross-Modal Alignment | - |
| dc.type | Article | - |
| dc.publisher.location | United States | - |
| dc.identifier.doi | 10.1007/978-3-031-73010-8_4 | - |
| dc.identifier.scopusid | 2-s2.0-85210322367 | - |
| dc.identifier.wosid | 001416938600004 | - |
| dc.identifier.bibliographicCitation | Lecture Notes in Computer Science, v.15141, pp. 53-70 | - |
| dc.citation.title | Lecture Notes in Computer Science | - |
| dc.citation.volume | 15141 | - |
| dc.citation.startPage | 53 | - |
| dc.citation.endPage | 70 | - |
| dc.type.docType | Proceedings Paper | - |
| dc.description.isOpenAccess | N | - |
| dc.description.journalRegisteredClass | scopus | - |
| dc.relation.journalResearchArea | Computer Science | - |
| dc.relation.journalWebOfScienceCategory | Computer Science, Artificial Intelligence | - |
| dc.relation.journalWebOfScienceCategory | Computer Science, Interdisciplinary Applications | - |
| dc.relation.journalWebOfScienceCategory | Computer Science, Theory & Methods | - |
| dc.subject.keywordPlus | Benchmarking | - |
| dc.subject.keywordPlus | Embeddings | - |
| dc.subject.keywordPlus | Image coding | - |
| dc.subject.keywordPlus | Image representation | - |
| dc.subject.keywordPlus | Visual languages | - |
| dc.subject.keywordAuthor | Fashion | - |
| dc.subject.keywordAuthor | Fine-grained Representation Learning | - |
| dc.subject.keywordAuthor | Vision-Language Pre-training | - |
Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.
Certain data included herein are derived from the © Web of Science of Clarivate Analytics. All rights reserved.
You may not copy or re-distribute this material in whole or in part without the prior written consent of Clarivate Analytics.
