A New Graph Transformer Algorithm for Leveraging External Knowledge Graph (외부 지식 그래프 결합을 위한 그래프 변환기 알고리즘)

안경환; 김은솔

doi:10.5626/KTCP.2024.30.11.588

Other Titles: A New Graph Transformer Algorithm for Leveraging External Knowledge Graph

Authors: 안경환; 김은솔

Issue Date: Nov-2024

Publisher: Korean Institute of Information Scientists and Engineers (한국정보과학회)

Keywords: external knowledge base; knowledge graph; graph transformer; commonsense reasoning; multi-modal learning

Citation: KIISE Transactions on Computing Practices (정보과학회 컴퓨팅의 실제 논문지), v.30, no.11, pp. 588-593

Pages: 6

Indexed: KCI

Journal Title: KIISE Transactions on Computing Practices (정보과학회 컴퓨팅의 실제 논문지)

Volume: 30

Number: 11

Start Page: 588

End Page: 593

URI: https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/211101

DOI: 10.5626/KTCP.2024.30.11.588

ISSN: 2383-6318; 2383-6326

Abstract: Visual Commonsense Reasoning (VCR) is a more challenging problem than Visual Question Answering (VQA), which requires only visual information such as the attributes of objects in an image and the relationships among them. Beyond the question itself, VCR requires a contextual understanding of the scene and general commonsense knowledge. This paper proposes a knowledge graph construction method and a graph transformer learning algorithm for integrating commonsense knowledge from an external knowledge base. The proposed model retrieves knowledge related to the given modality information from ConceptNet, an external knowledge base, and builds a knowledge graph from it. The knowledge graph is then fed to the graph transformer together with the visual objects and sentence objects, with its nodes and edges treated uniformly as input tokens rather than as distinct element types. To demonstrate the advantage of the proposed model, we run experiments on the VCR dataset and compare its improved performance against existing baseline models.
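
The abstract does not specify how retrieval or tokenization is implemented, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a tiny hand-written list of ConceptNet-style triples (`TOY_TRIPLES`), a naive exact-match retrieval step (`retrieve`), and a hypothetical `Token` record; all of these names are illustrative. It shows how visual objects, sentence words, knowledge graph nodes, and knowledge graph edges could be flattened into one sequence in which nodes and edges are both ordinary tokens.

```python
from dataclasses import dataclass

# Hypothetical, hand-written triples in ConceptNet's (head, relation, tail) style.
# They are illustrative only and were not retrieved from the real knowledge base.
TOY_TRIPLES = [
    ("person", "CapableOf", "drink"),
    ("cup", "UsedFor", "drink"),
    ("cup", "AtLocation", "table"),
    ("dog", "IsA", "animal"),
]


@dataclass(frozen=True)
class Token:
    kind: str          # "visual", "text", "kg_node", or "kg_edge"
    value: str         # surface label of the token
    ends: tuple = ()   # for kg_edge tokens only: (head, tail) node labels


def retrieve(triples, query_terms):
    """Keep triples whose head or tail exactly matches a term from the image or question."""
    terms = set(query_terms)
    return [t for t in triples if t[0] in terms or t[2] in terms]


def build_sequence(visual_objects, words, triples):
    """Flatten all inputs into one token list; graph nodes and edges are both plain tokens."""
    tokens = [Token("visual", v) for v in visual_objects]
    tokens += [Token("text", w) for w in words]
    nodes = []
    for head, _, tail in triples:
        for label in (head, tail):
            if label not in nodes:  # preserve first-seen order, drop repeats
                nodes.append(label)
    tokens += [Token("kg_node", n) for n in nodes]
    tokens += [Token("kg_edge", rel, (head, tail)) for head, rel, tail in triples]
    return tokens


visual_objects = ["person", "cup"]
words = ["why", "is", "the", "person", "holding", "the", "cup"]
kg = retrieve(TOY_TRIPLES, visual_objects + words)
sequence = build_sequence(visual_objects, words, kg)
for tok in sequence:
    print(tok.kind, tok.value, tok.ends)
```

In a real model each token would be mapped to an embedding, with some type or positional signal marking which tokens are edges and which nodes they connect, before being passed to the transformer; how the paper does this is not stated in the abstract.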

Appears in Collections: Seoul College of Engineering > Seoul School of Computer Science > 1. Journal Articles

Related Researcher

Kim, Eun Sol: COLLEGE OF ENGINEERING (SCHOOL OF COMPUTER SCIENCE)
