SAM: cross-modal semantic alignments module for image-text retrieval

Authors
Park, Pilseo; Jang, Soojin; Cho, Yunsung; Kim, Youngbin
Issue Date
Jan-2024
Publisher
Springer
Keywords
Cross-modal; Graph neural networks; Image-text retrieval; Vision-language
Citation
Multimedia Tools and Applications, v.83, no.4, pp. 12363-12377
Pages
15
Journal Title
Multimedia Tools and Applications
Volume
83
Number
4
Start Page
12363
End Page
12377
URI
https://scholarworks.bwise.kr/cau/handle/2019.sw.cau/67542
DOI
10.1007/s11042-023-15798-9
ISSN
1380-7501
1573-7721
Abstract
Cross-modal image-text retrieval has gained increasing attention due to its ability to combine computer vision with natural language processing. Previous approaches extracted image and text features and concatenated them as input to a transformer-based retrieval network. However, these approaches align the image and text modalities only implicitly, since the self-attention mechanism computes attention coefficients over all input features. In this paper, we propose a cross-modal Semantic Alignments Module (SAM) that establishes an explicit alignment by strengthening the inter-modal relationship. First, visual and textual representations are extracted from an image-text pair. Second, we construct a bipartite graph in which the image regions and the words in the sentence are nodes and the relationships between them are edges. The proposed SAM then lets the model compute attention coefficients based on the edges of this graph, which helps align the two modalities explicitly. Finally, a binary classifier determines whether the given image-text pair is aligned. We report extensive experiments on the MS-COCO and Flickr30K test sets, showing that SAM can capture the joint representation between the two modalities and can be applied to existing retrieval networks. © 2023, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
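The idea of computing attention only along the edges of a region-word bipartite graph can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not say how edges are constructed or how attention is scored, so the boolean edge matrix, the scaled dot-product scoring, and the function name `bipartite_attention` are all assumptions made for illustration.

```python
import numpy as np

def bipartite_attention(regions, words, edges):
    """Hypothetical sketch: each image region attends only to the words
    it shares an edge with in the bipartite graph.

    regions: (R, d) float array of region features
    words:   (W, d) float array of word features
    edges:   (R, W) boolean adjacency matrix (assumed to be given)
    Returns the (R, d) aggregated word features and the (R, W) attention.
    """
    d = regions.shape[1]
    # Assumed scoring function: scaled dot product between regions and words.
    scores = regions @ words.T / np.sqrt(d)
    # Non-edges get -inf so they receive zero weight after the softmax.
    scores = np.where(edges, scores, -np.inf)
    # Regions with no edges would produce inf - inf; use 0 as their row max.
    has_edge = edges.any(axis=1, keepdims=True)
    row_max = np.where(has_edge, scores.max(axis=1, keepdims=True), 0.0)
    weights = np.exp(scores - row_max)
    denom = weights.sum(axis=1, keepdims=True)
    # Isolated regions keep an all-zero attention row.
    attn = np.divide(weights, denom, out=np.zeros_like(weights), where=denom > 0)
    return attn @ words, attn

# Toy example: 3 regions, 4 words, feature size 8; region 2 has no edges.
rng = np.random.default_rng(0)
regions = rng.normal(size=(3, 8))
words = rng.normal(size=(4, 8))
edges = np.array([[True, False, True, False],
                  [False, True, True, True],
                  [False, False, False, False]])
aligned, attn = bipartite_attention(regions, words, edges)
```

Compared with plain self-attention over the concatenated features, the mask restricts each region to its connected words, which is one plausible reading of the "explicit alignment" described above.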
Appears in Collections
Graduate School of Advanced Imaging Sciences, Multimedia and Film > Department of Imaging Science and Arts > 1. Journal Articles