A Simple Framework for Scene Graph Reasoning with Semantic Understanding of Complex Sentence Structure

Heo, Yoonseok; Kang, Sangwoo

Cited 0 times in Web of Science

Cited 0 times in Scopus

Full metadata record

DC Field	Value	Language
dc.contributor.author	Heo, Yoonseok	-
dc.contributor.author	Kang, Sangwoo	-
dc.date.accessioned	2023-09-23T02:40:29Z	-
dc.date.available	2023-09-23T02:40:29Z	-
dc.date.issued	2023-09	-
dc.identifier.issn	2227-7390	-
dc.identifier.uri	https://scholarworks.bwise.kr/gachon/handle/2020.sw.gachon/89133	-
dc.description.abstract	A rapidly expanding multimedia environment in recent years has led to an explosive increase in demand for multimodal systems that can communicate with humans in various ways. Even though the convergence of vision and language intelligence has achieved remarkable success over the last few years, there is still a caveat: it is unknown whether such models truly understand the semantics of an image. More specifically, how they capture the relationships between objects represented within an image is still regarded as a black box. To test whether such relationships are well understood, this work focuses on the Graph-structured visual Question Answering (GQA) task, which evaluates the understanding of an image by reasoning over a scene graph (a natural-language description of the structural characteristics of the image) together with the image itself. Unlike existing approaches, which rely on an additional encoder for scene graphs, we propose a simple yet effective framework that uses pre-trained multimodal transformers for scene graph reasoning. Inspired by the fact that a scene graph can be regarded as a set of sentences, each describing two objects connected by a relationship, we fuse these sentences into the framework separately from the question. In addition, we propose a multi-task learning method that uses evaluation of the grammatical validity of questions as an auxiliary task to better understand questions with complex structures. This auxiliary task uses the semantic role labels of a question to randomly shuffle its sentence structure. We have conducted extensive experiments to evaluate the effectiveness of the framework in terms of task capabilities, ablation studies, and generalization.	-
dc.language	English	-
dc.language.iso	ENG	-
dc.publisher	MDPI	-
dc.title	A Simple Framework for Scene Graph Reasoning with Semantic Understanding of Complex Sentence Structure	-
dc.type	Article	-
dc.identifier.wosid	001062701300001	-
dc.identifier.doi	10.3390/math11173751	-
dc.identifier.bibliographicCitation	MATHEMATICS, v.11, no.17	-
dc.description.isOpenAccess	Y	-
dc.identifier.scopusid	2-s2.0-85176390006	-
dc.citation.title	MATHEMATICS	-
dc.citation.volume	11	-
dc.citation.number	17	-
dc.type.docType	Article	-
dc.publisher.location	Switzerland	-
dc.subject.keywordAuthor	multimodal deep learning	-
dc.subject.keywordAuthor	scene graph reasoning	-
dc.subject.keywordAuthor	multimodal transformer	-
dc.subject.keywordAuthor	multi-task learning	-
dc.relation.journalResearchArea	Mathematics	-
dc.relation.journalWebOfScienceCategory	Mathematics	-
dc.description.journalRegisteredClass	scie	-
dc.description.journalRegisteredClass	scopus	-
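
The abstract describes two data-preparation ideas: verbalizing each scene-graph edge as a sentence that is fed to the model separately from the question, and shuffling a question's semantic-role spans to produce ungrammatical negatives for an auxiliary grammaticality task. The following Python sketch illustrates both under stated assumptions; it is not the authors' implementation. The function names, the "subject relation object." sentence template, and the two-segment text input are hypothetical choices, and the semantic role spans are assumed to come from an external SRL tagger rather than being computed here.

```python
import random
from typing import List, Tuple

# Hypothetical types: a scene-graph edge and an SRL-labelled span of a question.
Triplet = Tuple[str, str, str]   # (subject, relation, object)
RoleSpan = Tuple[str, str]       # (semantic role label, surface text), assumed given by an SRL tagger


def verbalize_scene_graph(triplets: List[Triplet]) -> List[str]:
    """Render each (subject, relation, object) edge as one short sentence (assumed template)."""
    return [f"{subj} {rel} {obj}." for subj, rel, obj in triplets]


def build_text_inputs(question: str, triplets: List[Triplet]) -> Tuple[str, str]:
    """Keep the question and the verbalized scene graph as two separate text segments."""
    return question, " ".join(verbalize_scene_graph(triplets))


def shuffle_role_spans(spans: List[RoleSpan], rng: random.Random) -> str:
    """Reorder whole role spans (never words inside a span) into a non-identity order."""
    n = len(spans)
    identity = list(range(n))
    if n < 2:
        return " ".join(text for _, text in spans)
    order = identity[:]
    while order == identity:  # reshuffle until the span order actually changes
        rng.shuffle(order)
    return " ".join(spans[i][1] for i in order)


def make_grammaticality_examples(spans: List[RoleSpan], rng: random.Random) -> List[Tuple[str, int]]:
    """Auxiliary-task examples: label 1 = original question, label 0 = role-shuffled version."""
    original = " ".join(text for _, text in spans)
    examples = [(original, 1)]
    shuffled = shuffle_role_spans(spans, rng)
    if shuffled != original:  # duplicate span texts could reproduce the original string
        examples.append((shuffled, 0))
    return examples


if __name__ == "__main__":
    rng = random.Random(0)
    graph = [("man", "holding", "umbrella"), ("umbrella", "above", "man")]
    print(build_text_inputs("What is the man holding?", graph))
    # Illustrative role labels for "What is the man holding"; a real tagger's output may differ.
    spans = [("ARG1", "What"), ("O", "is"), ("ARG0", "the man"), ("V", "holding")]
    for text, label in make_grammaticality_examples(spans, rng):
        print(label, text)
```

Shuffling whole role spans rather than individual words keeps each phrase intact, so a classifier trained on these pairs has to judge the question's overall structure rather than spot scrambled phrases. How the resulting auxiliary loss is weighted against the main GQA objective is not stated in this record.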

Files in This Item: There are no files associated with this item.

Appears in Collections: College of IT Convergence > Department of Software > 1. Journal Articles

Related Researcher

Kang, Sang Woo: College of IT Convergence (Department of Software)

1342, Seongnam-daero, Sujeong-gu, Seongnam-si, Gyeonggi-do, Republic of Korea (13120), 031-750-5114

Certain data included herein are derived from the © Web of Science of Clarivate Analytics. All rights reserved.
You may not copy or re-distribute this material in whole or in part without the prior written consent of Clarivate Analytics.