Low-resource YouTube Comment Encoding for Luganda Sentiment Classification
- Other Titles
- Low-resource YouTube comment encoding for Luganda sentiment classification performance
- Authors
- Abdul Male Ssentumbwe; 정유철; 이현아; 김병만
- Issue Date
- May-2020
- Publisher
- Korea Digital Contents Society
- Keywords
- Luganda; Low-resource language; Sentiment Analysis; YouTube Comments; Opinion Mining
- Citation
- Journal of Digital Contents Society, v.21, no.5, pp.951-958
- Journal Title
- Journal of Digital Contents Society
- Volume
- 21
- Number
- 5
- Start Page
- 951
- End Page
- 958
- URI
- https://scholarworks.bwise.kr/kumoh/handle/2020.sw.kumoh/20275
- DOI
- 10.9728/dcs.2020.21.5.951
- ISSN
- 1598-2009
- Abstract
- The recent boom in social network usage has generated multilingual opinion data for low-resource languages. Luganda is one of the major languages of Uganda, yet it is a low-resource language, and Luganda sentiment analysis corpora, especially of YouTube comments, are not readily available. In this paper, we propose a set of assumptions to guide the collection of Luganda comments from opinions posted on Luganda YouTube videos for sentiment analysis. We evaluate the suitability of our cleaned dataset of 158 YouTube comments for sentiment analysis using selected machine learning and deep learning classification algorithms. Given the low-resource setting, the best results are obtained with Gaussian Naive Bayes among the machine learning algorithms (55%) and with a sequential Multilayer Perceptron model among the deep learning algorithms (68.8%), when 10% of the dataset is held out as the test set and Luganda comment segmentation is applied.
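The evaluation setup in the abstract (hold out 10% of the comments as a test set, then fit a Gaussian Naive Bayes classifier) can be sketched as follows. This is a minimal, stdlib-only illustration under stated assumptions, not the authors' pipeline: the abstract does not specify the comment encoding or segmentation, so the whitespace `segment` function and the bag-of-words counts are hypothetical stand-ins, and all function names here are illustrative. The variance smoothing mirrors the common practice of adding a small fraction of the largest feature variance.

```python
import math
import random
from collections import defaultdict


def segment(comment):
    """Hypothetical stand-in for Luganda comment segmentation:
    lowercase and split on whitespace (the paper's method may differ)."""
    return comment.lower().split()


def build_vocab(token_lists):
    """Map every token seen in training to a feature index."""
    vocab = sorted({tok for toks in token_lists for tok in toks})
    return {tok: i for i, tok in enumerate(vocab)}


def vectorize(tokens, vocab):
    """Assumed bag-of-words counts; tokens outside the vocabulary are ignored."""
    vec = [0.0] * len(vocab)
    for tok in tokens:
        if tok in vocab:
            vec[vocab[tok]] += 1.0
    return vec


def train_test_split(items, test_fraction=0.1, seed=0):
    """Shuffle, then hold out `test_fraction` of the items (at least one) for testing."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_test = max(1, round(len(items) * test_fraction))
    return items[n_test:], items[:n_test]


def _variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


class GaussianNB:
    """Gaussian Naive Bayes: each feature is an independent normal per class."""

    def __init__(self, var_smoothing=1e-9):
        self.var_smoothing = var_smoothing

    def fit(self, X, y):
        n_features = len(X[0])
        by_class = defaultdict(list)
        for x, label in zip(X, y):
            by_class[label].append(x)
        # Add a small fraction of the largest overall variance to every variance
        # so features that are constant within a class do not divide by zero.
        largest = max(_variance([x[j] for x in X]) for j in range(n_features))
        eps = self.var_smoothing * largest or self.var_smoothing
        self.classes_ = sorted(by_class)
        self.log_prior_, self.mean_, self.var_ = {}, {}, {}
        for label in self.classes_:
            rows = by_class[label]
            cols = list(zip(*rows))
            self.log_prior_[label] = math.log(len(rows) / len(X))
            self.mean_[label] = [sum(c) / len(c) for c in cols]
            self.var_[label] = [_variance(c) + eps for c in cols]
        return self

    def _log_posterior(self, x, label):
        # Unnormalized log posterior: log prior plus summed Gaussian log densities.
        total = self.log_prior_[label]
        for xj, mu, var in zip(x, self.mean_[label], self.var_[label]):
            total += -0.5 * math.log(2 * math.pi * var) - (xj - mu) ** 2 / (2 * var)
        return total

    def predict(self, X):
        return [max(self.classes_, key=lambda c: self._log_posterior(x, c)) for x in X]


def evaluate(labelled_comments, test_fraction=0.1, seed=0):
    """Hold out a test split, fit GaussianNB on bag-of-words features,
    and return the test accuracy."""
    train, test = train_test_split(labelled_comments, test_fraction, seed)
    vocab = build_vocab(segment(text) for text, _ in train)
    X_train = [vectorize(segment(text), vocab) for text, _ in train]
    y_train = [label for _, label in train]
    X_test = [vectorize(segment(text), vocab) for text, _ in test]
    y_test = [label for _, label in test]
    preds = GaussianNB().fit(X_train, y_train).predict(X_test)
    return sum(p == t for p, t in zip(preds, y_test)) / len(y_test)
```

`evaluate` expects a list of `(comment_text, label)` pairs. With 158 comments and `test_fraction=0.1`, the split holds out 16 comments (10% of 158 is 15.8, rounded) and trains on 142, which shows how small the test set is in this low-resource setting.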
- Appears in Collections
- Department of Computer Engineering > 1. Journal Articles
- Department of Computer Software Engineering > 1. Journal Articles