Multimedia analysis of robustly optimized multimodal transformer based on vision and language co-learning
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Yoon, JunHo | - |
dc.contributor.author | Choi, GyuHo | - |
dc.contributor.author | Choi, Chang | - |
dc.date.accessioned | 2023-09-21T02:40:35Z | - |
dc.date.available | 2023-09-21T02:40:35Z | - |
dc.date.created | 2023-09-21 | - |
dc.date.issued | 2023-12 | - |
dc.identifier.issn | 1566-2535 | - |
dc.identifier.uri | https://scholarworks.bwise.kr/gachon/handle/2020.sw.gachon/89120 | - |
dc.description.abstract | Recently, multimodal learning that uses information from all modalities has been studied to detect disinformation in multimedia. Existing multimodal learning methods include score-level fusion approaches that combine different models, and feature-level fusion methods that combine embedding vectors to integrate data of different dimensions. Because late-level fusion combines outputs only after each modality is processed independently, overall performance is limited by the recognition performance of the individual unimodal models. In addition, feature-level fusion is constrained by the requirement that the data across modalities be matched. In this study, we propose a classification system using a RoBERTa-based multimodal fusion transformer (RoBERTaMFT) that applies a co-learning method to overcome the recognition-performance limitations of multimodal learning as well as the data imbalance among modalities. RoBERTaMFT consists of image feature extraction, co-learning that reconstructs image features from text embeddings, and a late-level fusion step applied to the final classification (see the sketch following this table). As experimental results on the CrisisMMD dataset indicate, RoBERTaMFT achieved an accuracy 21.2% higher and an F1-score 0.414 higher than those of unimodal learning, and an accuracy 11.7% higher and an F1-score 0.268 higher than those of existing multimodal learning. | - |
dc.language | English | - |
dc.language.iso | en | - |
dc.publisher | ELSEVIER | - |
dc.relation.isPartOf | INFORMATION FUSION | - |
dc.title | Multimedia analysis of robustly optimized multimodal transformer based on vision and language co-learning | - |
dc.type | Article | - |
dc.type.rims | ART | - |
dc.description.journalClass | 1 | - |
dc.identifier.wosid | 001055953800001 | - |
dc.identifier.doi | 10.1016/j.inffus.2023.101922 | - |
dc.identifier.bibliographicCitation | INFORMATION FUSION, v.100 | - |
dc.description.isOpenAccess | N | - |
dc.identifier.scopusid | 2-s2.0-85165528887 | - |
dc.citation.title | INFORMATION FUSION | - |
dc.citation.volume | 100 | - |
dc.contributor.affiliatedAuthor | Yoon, JunHo | - |
dc.contributor.affiliatedAuthor | Choi, Chang | - |
dc.type.docType | Article | - |
dc.subject.keywordAuthor | Multi-modal | - |
dc.subject.keywordAuthor | Multimedia | - |
dc.subject.keywordAuthor | Natural disasters | - |
dc.subject.keywordAuthor | Classification | - |
dc.subject.keywordPlus | FEATURE FUSION | - |
dc.relation.journalResearchArea | Computer Science | - |
dc.relation.journalWebOfScienceCategory | Computer Science, Artificial Intelligence | - |
dc.relation.journalWebOfScienceCategory | Computer Science, Theory & Methods | - |
dc.description.journalRegisteredClass | scie | - |
dc.description.journalRegisteredClass | scopus | - |
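The abstract above outlines a three-stage pipeline: image feature extraction, co-learning that reconstructs image features from text embeddings, and late-level fusion for the final classification. The following is a minimal sketch of that structure in PyTorch, assuming a ResNet-50 image backbone, a pretrained `roberta-base` text encoder, an MSE reconstruction loss, and averaged score fusion. Every module name, dimension, and loss weight here is a hypothetical illustration of the described architecture, not the authors' implementation.

```python
# Hypothetical sketch of the RoBERTaMFT pipeline described in the abstract:
# (1) image feature extraction, (2) co-learning that reconstructs image
# features from text embeddings, (3) late-level fusion for classification.
# All names, dimensions, and weights are illustrative assumptions.
import torch
import torch.nn as nn
from torchvision.models import resnet50
from transformers import RobertaModel

class RoBERTaMFTSketch(nn.Module):
    def __init__(self, num_classes: int, img_dim: int = 2048, txt_dim: int = 768):
        super().__init__()
        # (1) Image feature extractor: ResNet-50 backbone without its classifier head.
        backbone = resnet50(weights="IMAGENET1K_V2")
        self.image_encoder = nn.Sequential(*list(backbone.children())[:-1])
        # Text encoder: pretrained RoBERTa; the first token's hidden state
        # serves as a pooled sentence embedding.
        self.text_encoder = RobertaModel.from_pretrained("roberta-base")
        # (2) Co-learning head: reconstruct image features from text embeddings,
        # coupling the two modalities during training.
        self.reconstructor = nn.Sequential(
            nn.Linear(txt_dim, img_dim), nn.ReLU(), nn.Linear(img_dim, img_dim)
        )
        # Per-modality classifiers whose scores are fused at a late level.
        self.image_classifier = nn.Linear(img_dim, num_classes)
        self.text_classifier = nn.Linear(txt_dim, num_classes)

    def forward(self, images, input_ids, attention_mask):
        img_feat = self.image_encoder(images).flatten(1)           # (B, img_dim)
        txt_feat = self.text_encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state[:, 0]                                   # (B, txt_dim)
        recon = self.reconstructor(txt_feat)                        # (B, img_dim)
        # (3) Late-level fusion: average the per-modality class scores.
        logits = 0.5 * (self.image_classifier(img_feat) + self.text_classifier(txt_feat))
        return logits, img_feat, recon

# Training combines the classification loss with a reconstruction loss that
# ties text embeddings to image features; the 0.5 weight is an assumption.
def loss_fn(logits, labels, img_feat, recon, recon_weight: float = 0.5):
    ce = nn.functional.cross_entropy(logits, labels)
    mse = nn.functional.mse_loss(recon, img_feat.detach())
    return ce + recon_weight * mse
```

Averaging per-modality scores keeps the late-fusion step simple; the reconstruction term is what realizes the co-learning idea, pulling the text encoder toward the image feature space during training rather than leaving each unimodal branch fully independent.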