Detailed Information


Image-Text Sentiment Analysis Model Based on Visual Aspect Attention

Authors
Daniel James; Lee, Seung Hyun; Lee, Won Hyung
Issue Date
Dec-2021
Publisher
Korean Society for Computer Game (한국컴퓨터게임학회)
Keywords
Visual aspect attention; LSTM; Multi-modal; Sentiment analysis; Social images
Citation
Journal of the Korean Society for Computer Game (한국컴퓨터게임학회논문지), v.34, no.4, pp. 125-137
Pages
13
Journal Title
Journal of the Korean Society for Computer Game (한국컴퓨터게임학회논문지)
Volume
34
Number
4
Start Page
125
End Page
137
URI
https://scholarworks.bwise.kr/cau/handle/2019.sw.cau/53116
DOI
10.22819/kscg.2021.34.4.013
ISSN
1976-6513
Abstract
Social networks have become an integral part of daily life, and sentiment analysis of social media content helps reveal people's views, attitudes, and emotions on these platforms. Traditional sentiment analysis relies mainly on text, but with the rise of smartphones, information shared online has diversified to include images as well as text. In many cases, images enhance the accompanying text rather than express emotion independently. We propose a novel image-text sentiment analysis model (LSTM-VAA). Rather than taking the image as direct input, the model uses a VGG16 network to extract image features, generates visual aspect attention from them, assigns higher weights to the core sentences of a document, and thereby obtains a document representation based on visual aspect attention. In addition, an LSTM network extracts textual sentiment to produce a text-only document representation. Finally, the two groups of classification results are fused to obtain the final classification label. On the Yelp restaurant-review dataset, our model achieves an accuracy of 62.08%, which is 18.92% higher than BiGRU-mVGG, verifying the effectiveness of using visual information as aspect attention to assist text in sentiment classification. It is also 0.32% higher than the VistaNet model, showing that the LSTM can compensate for VistaNet's limitation that images cannot fully cover the text.
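The architecture described in the abstract, sentence encoding with an LSTM, visual aspect attention driven by VGG16 image features, and late fusion of the two branches' class scores, can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes VGG16 features are precomputed (a 4096-dimensional vector per image), and all layer sizes, the dot-product attention form, and the averaging fusion are hypothetical choices, since the paper's exact configuration is not given here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LSTMVAA(nn.Module):
    """Hypothetical sketch of an LSTM model with visual aspect attention."""

    def __init__(self, vocab_size, embed_dim=100, hidden_dim=128,
                 img_feat_dim=4096, num_classes=5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Sentence encoder: one LSTM over the words of each sentence
        self.word_lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        # Projects precomputed VGG16 features into the sentence space
        self.img_proj = nn.Linear(img_feat_dim, hidden_dim)
        self.text_fc = nn.Linear(hidden_dim, num_classes)  # text-only branch
        self.attn_fc = nn.Linear(hidden_dim, num_classes)  # attention branch

    def forward(self, docs, img_feats):
        # docs: (batch, n_sent, n_words) word ids
        # img_feats: (batch, img_feat_dim) precomputed VGG16 features
        b, s, w = docs.shape
        emb = self.embed(docs.view(b * s, w))
        _, (h, _) = self.word_lstm(emb)        # final hidden state per sentence
        sents = h[-1].view(b, s, -1)           # (batch, n_sent, hidden)

        # Text-only branch: mean-pool the sentence vectors
        text_logits = self.text_fc(sents.mean(dim=1))

        # Visual-aspect-attention branch: the image feature scores each
        # sentence, so image-relevant "core" sentences get higher weight
        q = self.img_proj(img_feats).unsqueeze(1)         # (batch, 1, hidden)
        attn = F.softmax((sents * q).sum(-1), dim=1)      # (batch, n_sent)
        doc_vec = (attn.unsqueeze(-1) * sents).sum(dim=1)
        attn_logits = self.attn_fc(doc_vec)

        # Late fusion of the two branches' classification scores
        return (text_logits + attn_logits) / 2
```

A forward pass with toy data: `LSTMVAA(vocab_size=50)(torch.randint(0, 50, (2, 3, 6)), torch.randn(2, 4096))` yields a `(2, num_classes)` tensor of fused class scores. Keeping the attention branch separate from the text-only branch mirrors the abstract's claim that the image assists, rather than replaces, the textual signal.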
Files in This Item
There are no files associated with this item.
Appears in
Collections
Graduate School of Advanced Imaging Sciences, Multimedia and Film > Department of Imaging Science and Arts > 1. Journal Articles


Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.
