Improving K Nearest Neighbor into String Vector Version for Text Categorization
- Authors
- Jo, Taeho
- Issue Date
- 2019
- Publisher
- IEEE
- Keywords
- String Vector; K Nearest Neighbor; Text Categorization
- Citation
- 2019 21ST INTERNATIONAL CONFERENCE ON ADVANCED COMMUNICATION TECHNOLOGY (ICACT): ICT FOR 4TH INDUSTRIAL REVOLUTION, pp.1091 - 1097
- Journal Title
- 2019 21ST INTERNATIONAL CONFERENCE ON ADVANCED COMMUNICATION TECHNOLOGY (ICACT): ICT FOR 4TH INDUSTRIAL REVOLUTION
- Start Page
- 1091
- End Page
- 1097
- URI
- https://scholarworks.bwise.kr/hongik/handle/2020.sw.hongik/28050
- ISSN
- 1738-9445
- Abstract
- This research is concerned with a string vector based version of KNN (K Nearest Neighbor) as an approach to text categorization. Traditionally, texts have been encoded into numerical vectors in order to use the traditional version of KNN, and this encoding leads to three main problems: huge dimensionality, sparse distribution, and poor transparency. To solve these problems, this research proposes that texts be encoded into string vectors, that a similarity measure between string vectors be defined, and that KNN be modified into a version that takes string vectors as its input. The proposed KNN version is validated empirically by comparing it with the traditional KNN version on three collections: NewsPage.com, Opiniopsis, and 20NewsGroups. The goal of this research is to improve text categorization performance by solving these three problems.
- Appears in Collections
- School of Games > Game Software Major > 1. Journal Articles
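
The abstract describes three ingredients: texts encoded as string vectors, a similarity measure between string vectors, and a KNN that takes string vectors as input. The sketch below illustrates how these pieces could fit together; it is not the paper's implementation. The record does not specify the similarity measure, so this example assumes a position-wise average of word-to-word similarities, with exact word matching as a placeholder. The function names (`word_similarity`, `string_vector_similarity`, `knn_classify`) and the toy data are hypothetical, and the encoding of raw text into string vectors is left out.

```python
from collections import Counter


def word_similarity(a, b):
    # Placeholder assumption: exact match only. The paper's actual
    # word-level similarity is not given in this record.
    return 1.0 if a == b else 0.0


def string_vector_similarity(u, v):
    # Assumed measure: average word similarity over aligned positions.
    if len(u) != len(v):
        raise ValueError("string vectors must have the same length")
    if not u:
        return 0.0
    return sum(word_similarity(a, b) for a, b in zip(u, v)) / len(u)


def knn_classify(query, training, k=3):
    # Rank training examples by similarity to the query (most similar first),
    # then take a majority vote over the labels of the top k.
    ranked = sorted(
        training,
        key=lambda item: string_vector_similarity(query, item[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy, hand-made string vectors (ordered tuples of words) with labels.
training = [
    (("market", "stock", "bank"), "business"),
    (("market", "trade", "bank"), "business"),
    (("game", "team", "score"), "sports"),
    (("game", "player", "score"), "sports"),
]
query = ("market", "stock", "price")
predicted = knn_classify(query, training, k=3)
print(predicted)
```

Because each element of a string vector is a word rather than a number, the vector stays short and readable, which is the transparency benefit the abstract refers to. Swapping `word_similarity` for a semantic measure would change the ranking without touching the voting step.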