Detailed Information

Cited 0 times in Web of Science; cited 48 times in Scopus

The feature selection method based on genetic algorithm for efficient of text clustering and text classification

Authors
Hong, S.-S.; Lee, W.; Han, M.-M.
Issue Date
2015
Publisher
International Center for Scientific Research and Studies
Keywords
Big data; Feature selection; Genetic algorithm; Text clustering; Text mining
Citation
International Journal of Advances in Soft Computing and its Applications, v.7, no.1, pp.22 - 40
Journal Title
International Journal of Advances in Soft Computing and its Applications
Volume
7
Number
1
Start Page
22
End Page
40
URI
https://scholarworks.bwise.kr/gachon/handle/2020.sw.gachon/11004
ISSN
2074-8523
Abstract
Big Data refers to very large volumes of data and encompasses a range of methodologies for data collection, processing, storage, management, and analysis. Because Big Data text mining extracts a large number of features, clustering and classification can incur high computational complexity and yield unreliable analysis results. In particular, a TDM (Term Document Matrix) obtained through text mining represents term-document features but is a sparse matrix. This paper focuses on selecting an optimized set of features from the corpus. A Genetic Algorithm (GA) is used to extract terms (features) according to term importance, which is calculated by the equation derived in the study. The aim of the feature selection method is to lower computational complexity and to increase analytical performance. We designed a new genetic algorithm to extract features in text mining. TF-IDF is used to reflect document-term relationships in feature extraction. Through an iterative process, features are selected until a predetermined number is reached. We conducted clustering experiments on a set of spam mail documents to verify and improve feature selection performance, and found that the proposed FSGA algorithm showed better text clustering and classification performance than using all of the features.
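The abstract does not give the paper's importance equation, chromosome encoding, or genetic operators, so the following is only a minimal sketch of the general idea (a GA that picks a fixed number of terms guided by TF-IDF), not the authors' FSGA. Everything in it is an assumption made for illustration: the function names tfidf_importance and select_features, the importance score (TF-IDF summed over documents), the fitness (sum of importance of the selected terms), and the tournament, union-pool crossover, and single-swap mutation operators.

```python
import math
import random
from collections import Counter


def tfidf_importance(docs):
    """Return (vocab, importance); importance[i] is TF-IDF of term i summed over docs.

    Assumption: this summed score stands in for the paper's importance equation.
    """
    n_docs = len(docs)
    vocab = sorted({term for doc in docs for term in doc})
    doc_freq = Counter(term for doc in docs for term in set(doc))
    score = dict.fromkeys(vocab, 0.0)
    for doc in docs:
        for term, count in Counter(doc).items():
            tf = count / len(doc)
            idf = math.log(n_docs / doc_freq[term])
            score[term] += tf * idf
    return vocab, [score[term] for term in vocab]


def select_features(importance, k, pop_size=30, generations=50,
                    mutation_rate=0.2, seed=0):
    """Hypothetical GA: each chromosome is a list of k distinct term indices."""
    rng = random.Random(seed)
    n = len(importance)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of terms")

    def fitness(chrom):
        # Assumed fitness: total importance of the selected terms.
        return sum(importance[i] for i in chrom)

    def tournament(population):
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    def crossover(a, b):
        # Draw k distinct indices from the union of both parents.
        pool = sorted(set(a) | set(b))
        return rng.sample(pool, k)

    def mutate(chrom):
        # With probability mutation_rate, swap one selected term for an unused one.
        unused = [i for i in range(n) if i not in chrom]
        if unused and rng.random() < mutation_rate:
            chrom[rng.randrange(k)] = rng.choice(unused)
        return chrom

    population = [rng.sample(range(n), k) for _ in range(pop_size)]
    for _ in range(generations):
        best = max(population, key=fitness)
        children = [list(best)]  # elitism: carry the best chromosome forward
        while len(children) < pop_size:
            child = crossover(tournament(population), tournament(population))
            children.append(mutate(child))
        population = children
    return sorted(max(population, key=fitness))


# Toy corpus (made up for illustration, not the paper's spam mail data).
docs = [
    "win free prize now".split(),
    "free prize claim now".split(),
    "meeting agenda for monday".split(),
    "monday project meeting notes".split(),
]
vocab, importance = tfidf_importance(docs)
selected = select_features(importance, k=4)
print([vocab[i] for i in selected])
```

With a fitness this simple, sorting terms by importance would find the same answer directly; a GA becomes worthwhile when fitness depends on the selected set as a whole, for example the clustering quality obtained with those terms, which is closer to the evaluation the abstract describes.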
Appears in Collections
College of IT Convergence > Department of Software > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Han, Myung Mook
IT (Department of Software)