Content Noise Detection Model Using Deep Learning in Web Forums
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Woo, Jiyoung | - |
dc.contributor.author | Yun, Jaeseok | - |
dc.date.accessioned | 2021-08-11T08:35:53Z | - |
dc.date.available | 2021-08-11T08:35:53Z | - |
dc.date.issued | 2020-06 | - |
dc.identifier.issn | 2071-1050 | - |
dc.identifier.uri | https://scholarworks.bwise.kr/sch/handle/2021.sw.sch/2777 | - |
dc.description.abstract | Spam posts in web forum discussions inconvenience users and lower the value of the web forum as an open source of user opinion. Because the importance of a web post is evaluated in terms of the number of involved authors, such noise distorts analysis results by adding unnecessary data to the opinion analysis. This work proposes an automatic detection model for spam posts in web forums using both conventional machine learning and deep learning. To provide labels for distinguishing normal posts from spam, evaluators were asked to identify spam posts in advance. To construct the machine learning-based model, text features were extracted from posted content using linguistically motivated text mining techniques, and supervised learning was performed to distinguish content noise from normal posts. For the deep learning model, raw text was used both with and without special characters. Two recurrent neural network (RNN) models, the simple RNN and the long short-term memory (LSTM) network, were also compared. Furthermore, the proposed model was applied to two web forums. The experimental results indicate that the deep learning model significantly improves on the accuracy of conventional machine learning with text features. The accuracy of the proposed model using LSTM reaches 98.56%, and the precision and recall of the noise class reach 99% and 99.53%, respectively. | - |
dc.language | English | - |
dc.language.iso | ENG | - |
dc.publisher | MDPI Open Access Publishing | - |
dc.title | Content Noise Detection Model Using Deep Learning in Web Forums | - |
dc.type | Article | - |
dc.publisher.location | Switzerland | - |
dc.identifier.doi | 10.3390/su12125074 | - |
dc.identifier.scopusid | 2-s2.0-85086917903 | - |
dc.identifier.wosid | 000550330900001 | - |
dc.identifier.bibliographicCitation | Sustainability, v.12, no.12 | - |
dc.citation.title | Sustainability | - |
dc.citation.volume | 12 | - |
dc.citation.number | 12 | - |
dc.type.docType | Article | - |
dc.description.isOpenAccess | Y | - |
dc.description.journalRegisteredClass | scie | - |
dc.description.journalRegisteredClass | ssci | - |
dc.description.journalRegisteredClass | scopus | - |
dc.relation.journalResearchArea | Science & Technology - Other Topics | - |
dc.relation.journalResearchArea | Environmental Sciences & Ecology | - |
dc.relation.journalWebOfScienceCategory | Green & Sustainable Science & Technology | - |
dc.relation.journalWebOfScienceCategory | Environmental Sciences | - |
dc.relation.journalWebOfScienceCategory | Environmental Studies | - |
dc.subject.keywordPlus | SPAM DETECTION | - |
dc.subject.keywordAuthor | web forum | - |
dc.subject.keywordAuthor | social media | - |
dc.subject.keywordAuthor | content noise | - |
dc.subject.keywordAuthor | posting quality | - |
dc.subject.keywordAuthor | text mining | - |
dc.subject.keywordAuthor | deep learning | - |
dc.subject.keywordAuthor | machine learning | - |
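
The abstract reports accuracy together with precision and recall for the noise class. As a minimal sketch of how such figures are typically computed from labeled posts (this record does not include the authors' evaluation code, so the function name `noise_class_metrics`, the label encoding, and the toy labels below are all assumptions made for illustration, not data from the article):

```python
from typing import Sequence, Tuple


def noise_class_metrics(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    noise_label: int = 1,  # assumed encoding: 1 = noise (spam), 0 = normal post
) -> Tuple[float, float, float]:
    """Return (accuracy, precision, recall), with precision and recall
    computed for the noise class only."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one labeled post is required")

    tp = fp = fn = correct = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            correct += 1
        if pred == noise_label and truth == noise_label:
            tp += 1  # noise correctly flagged
        elif pred == noise_label:
            fp += 1  # normal post wrongly flagged as noise
        elif truth == noise_label:
            fn += 1  # noise that was missed

    accuracy = correct / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Toy labels invented for illustration only (not results from the article).
toy_true = [1, 1, 1, 0, 0, 0, 1, 0]
toy_pred = [1, 1, 0, 0, 0, 1, 1, 0]
accuracy, precision, recall = noise_class_metrics(toy_true, toy_pred)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

Reporting precision and recall for the noise class separately from overall accuracy matters when spam is a minority of posts, since a model can score high accuracy while still missing much of the noise.
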
Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.