Efficient Compositional Translation Embedding for Visual Relationship Detection

허유정; 김은솔; 최우석; 온경운; 장병탁

doi:10.5626/JOK.2022.49.7.544

Other Titles: Efficient Compositional Translation Embedding for Visual Relationship Detection

Authors: 허유정; 김은솔; 최우석; 온경운; 장병탁

Issue Date: Jul-2022

Publisher: Korean Institute of Information Scientists and Engineers (한국정보과학회)

Keywords: scene graph generation; visual relationship detection; image caption retrieval; translation embedding

Citation: Journal of KIISE (정보과학회논문지), v.49, no.7, pp. 544-554

Pages: 11

Indexed: KCI

Journal Title: Journal of KIISE (정보과학회논문지)

Volume: 49

Number: 7

Start Page: 544

End Page: 554

URI: https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/194545

DOI: 10.5626/JOK.2022.49.7.544

ISSN: 2383-630X; 2383-6296

Abstract: Scene graphs are widely used to represent high-order visual relationships between the objects in an image. To build scene graphs automatically, we propose an algorithm that detects visual relationships between objects and predicts each relationship as a predicate. Inspired by TransR, a knowledge graph embedding method originally proposed for text-based knowledge graphs, we present CompTransR, which i) defines latent relational subspaces that account for the compositional nature of visual relationships and ii) encodes predicate representations by applying translation constraints between the object representations in each subspace. On three benchmark datasets for scene graph generation (VRD, VG200, and VrR-VG), the proposed model reduces computational complexity while outperforming previous state-of-the-art models on predicate detection. We also show that scene graphs can be applied effectively to image-caption retrieval, a task that requires high-level visual-linguistic reasoning, and that the predicate representations predicted by our algorithm improve retrieval performance.
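
For readers unfamiliar with TransR, the method the abstract cites as its inspiration, the following is a minimal sketch of the standard TransR scoring idea: both entity vectors are projected into a relation-specific subspace, and the relation vector should translate the projected head onto the projected tail. It is not CompTransR itself, whose details this record does not give. The function names, the toy vectors, and the predicates "riding" and "under" are all hypothetical.

```python
import math


def matvec(matrix, vector):
    """Multiply a (k x d) matrix, given as a list of rows, by a length-d vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def transr_score(subject, obj, predicate, projection):
    """Return ||M_r s + r - M_r o||_2; a lower score means a better fit.

    subject, obj: entity vectors in the shared d-dimensional space
    predicate:    relation vector r in the k-dimensional relation subspace
    projection:   relation-specific (k x d) projection matrix M_r
    """
    s_r = matvec(projection, subject)
    o_r = matvec(projection, obj)
    return math.sqrt(sum((s + r - o) ** 2 for s, r, o in zip(s_r, predicate, o_r)))


# Toy example (all values are illustrative assumptions, not from the paper).
subject = [1.0, 0.0, 2.0]          # e.g. a "person" feature vector
obj = [0.0, 1.0, 3.0]              # e.g. a "horse" feature vector
projection = [[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]]     # 2 x 3 projection into the relation subspace
riding = [-1.0, 1.0]               # predicate vector that fits the pair exactly
under = [1.0, -1.0]                # predicate vector that fits poorly

score_riding = transr_score(subject, obj, riding, projection)
score_under = transr_score(subject, obj, under, projection)
```

In this toy setup the better-fitting predicate gets the lower score, so predicate prediction amounts to choosing the candidate with the smallest distance.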

Appears in Collections: College of Engineering (Seoul) > School of Computer Science (Seoul) > 1. Journal Articles

Related Researcher

Kim, Eun Sol: COLLEGE OF ENGINEERING (SCHOOL OF COMPUTER SCIENCE)
