SAM: cross-modal semantic alignments module for image-text retrieval
- Authors
- Park, Pilseo; Jang, Soojin; Cho, Yunsung; Kim, Youngbin
- Issue Date
- Jan-2024
- Publisher
- Springer
- Keywords
- Cross-modal; Graph neural networks; Image-text retrieval; Vision-language
- Citation
- Multimedia Tools and Applications, v.83, no.4, pp 12363 - 12377
- Pages
- 15
- Journal Title
- Multimedia Tools and Applications
- Volume
- 83
- Number
- 4
- Start Page
- 12363
- End Page
- 12377
- URI
- https://scholarworks.bwise.kr/cau/handle/2019.sw.cau/67542
- DOI
- 10.1007/s11042-023-15798-9
- ISSN
- 1380-7501
- 1573-7721
- Abstract
- Cross-modal image-text retrieval has gained increasing attention because it combines computer vision with natural language processing. Previous approaches extracted image and text features and concatenated them as input to a transformer-based retrieval network. However, these approaches align the image and text modalities only implicitly, because the self-attention mechanism computes attention coefficients over all input features. In this paper, we propose the cross-modal Semantic Alignments Module (SAM), which establishes an explicit alignment by strengthening inter-modal relationships. First, visual and textual representations are extracted from an image-text pair. Second, we construct a bipartite graph in which the image regions and the words of the sentence are nodes and the relationships between them are edges. SAM then computes attention coefficients based on the edges of this graph, which explicitly aligns the two modalities. Finally, a binary classifier determines whether the given image-text pair is aligned. Extensive experiments on the MS-COCO and Flickr30K test sets show that SAM captures the joint representation of the two modalities and can be applied to existing retrieval networks. © 2023, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
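The pipeline described in the abstract (a bipartite region-word graph, attention restricted to graph edges, then a binary match classifier) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions and is not the authors' implementation. Several choices are hypothetical: the edge rule (cosine similarity above a threshold), the residual update without learned projections, the mean pooling, the untrained logistic classifier, and the helper names `build_bipartite_edges`, `edge_attention` and `match_probability`. Random arrays stand in for real region and word features.

```python
import numpy as np


def build_bipartite_edges(regions, words, threshold=0.0):
    """Hypothetical edge rule: connect region i and word j if their cosine similarity > threshold."""
    r = regions / np.linalg.norm(regions, axis=1, keepdims=True)
    w = words / np.linalg.norm(words, axis=1, keepdims=True)
    return (r @ w.T) > threshold  # boolean adjacency, shape (n_regions, n_words)


def masked_softmax(scores, mask, axis):
    """Softmax over entries where mask is True; rows with no edges get all-zero weights."""
    filled = np.where(mask, scores, -1e9)
    filled = filled - filled.max(axis=axis, keepdims=True)  # numerical stability
    weights = np.exp(filled) * mask  # zero out non-edges explicitly
    denom = weights.sum(axis=axis, keepdims=True)
    return np.divide(weights, denom, out=np.zeros_like(weights), where=denom > 0)


def edge_attention(regions, words, edges):
    """Cross-modal attention in which each node attends only to its graph neighbours."""
    scores = regions @ words.T / np.sqrt(regions.shape[1])  # scaled dot product, (R, W)
    a_rw = masked_softmax(scores, edges, axis=1)      # regions attend to connected words
    a_wr = masked_softmax(scores.T, edges.T, axis=1)  # words attend to connected regions
    # Assumed residual update; a trained module would use learned Q/K/V projections.
    return regions + a_rw @ words, words + a_wr @ regions


def match_probability(regions_out, words_out, weight, bias):
    """Hypothetical binary classifier: logistic regression on mean-pooled features."""
    joint = np.concatenate([regions_out.mean(axis=0), words_out.mean(axis=0)])
    return 1.0 / (1.0 + np.exp(-(joint @ weight + bias)))


# Toy usage with random stand-in features (36 regions, 12 words, 64-dim).
rng = np.random.default_rng(0)
regions = rng.normal(size=(36, 64))
words = rng.normal(size=(12, 64))
edges = build_bipartite_edges(regions, words, threshold=0.1)
r_out, w_out = edge_attention(regions, words, edges)
p = match_probability(r_out, w_out, rng.normal(size=128) * 0.01, 0.0)
print(f"match probability: {p:.3f}")
```

The contrast with the concatenation baseline lies in `masked_softmax`. Plain self-attention over concatenated features would give every region-word pair a nonzero weight. Here, pairs without an edge receive exactly zero weight, so the graph structure determines which cross-modal alignments can form.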
- Appears in Collections
- Graduate School of Advanced Imaging Sciences, Multimedia and Film > Department of Imaging Science and Arts > 1. Journal Articles