The feature selection method based on genetic algorithm for efficient of text clustering and text classification
- Authors
- Hong, S.-S.; Lee, W.; Han, M.-M.
- Issue Date
- 2015
- Publisher
- International Center for Scientific Research and Studies
- Keywords
- Big data; Feature selection; Genetic algorithm; Text clustering; Text mining
- Citation
- International Journal of Advances in Soft Computing and its Applications, v.7, no.1, pp.22 - 40
- Journal Title
- International Journal of Advances in Soft Computing and its Applications
- Volume
- 7
- Number
- 1
- Start Page
- 22
- End Page
- 40
- URI
- https://scholarworks.bwise.kr/gachon/handle/2020.sw.gachon/11004
- ISSN
- 2074-8523
- Abstract
- Big Data refers to very large volumes of data and encompasses a range of methodologies for data collection, processing, storage, management, and analysis. Because Big Data text mining extracts a large number of features and documents, clustering and classification can suffer from high computational complexity and low reliability of the analysis results. In particular, the TDM (Term Document Matrix) obtained through text mining represents term-document features, but it is a sparse matrix. This paper focuses on selecting an optimized set of features from the corpus. A Genetic Algorithm (GA) is used to extract the desired terms (features) according to term importance, which is calculated by the derived equation. The goal of the feature selection method is to lower computational complexity and to increase analytical performance. We designed a new genetic algorithm to extract features in text mining. TF-IDF is used to reflect document-term relationships in feature extraction. Through an iterative process, features are selected until a predetermined number is reached. We conducted clustering experiments on a set of spam mail documents to verify and improve feature selection performance. We found that the proposed FSGA algorithm showed better text clustering and classification performance than using all of the features.
- Appears in Collections
- College of IT Convergence > Department of Software > 1. Journal Articles
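The abstract describes selecting a predetermined number of terms with a genetic algorithm guided by TF-IDF-based term importance. The following is a minimal Python sketch of that general idea, not the authors' FSGA. The abstract does not give the importance equation or the genetic operators, so the fitness function (sum of TF-IDF weights of the selected terms), the union-based crossover, the single-gene mutation, and all names (`tfidf_matrix`, `term_importance`, `select_features`) are assumptions made purely for illustration.

```python
import math
import random


def tfidf_matrix(docs):
    """Build a dense TF-IDF term-document matrix from non-empty whitespace-tokenized docs."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({term for doc in tokenized for term in doc})
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = {t: sum(1 for doc in tokenized if t in doc) for t in vocab}
    matrix = []
    for doc in tokenized:
        row = [(doc.count(t) / len(doc)) * math.log(n_docs / df[t]) for t in vocab]
        matrix.append(row)
    return vocab, matrix


def term_importance(matrix):
    # ASSUMPTION: importance of a term = its TF-IDF weight summed over all documents.
    return [sum(column) for column in zip(*matrix)]


def select_features(importance, k, pop_size=30, generations=50,
                    mutation_rate=0.2, seed=0):
    """Return k distinct term indices chosen by a simple genetic algorithm."""
    rng = random.Random(seed)
    n = len(importance)

    def fitness(chromosome):
        # ASSUMPTION: fitness = total importance of the selected terms.
        return sum(importance[i] for i in chromosome)

    # Each chromosome is a tuple of k distinct term indices.
    population = [tuple(rng.sample(range(n), k)) for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the fitter half of the population as parents (elitist selection).
        population.sort(key=fitness, reverse=True)
        parents = population[:pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Crossover: draw k indices from the union of both parents' genes.
            child = rng.sample(list(set(a) | set(b)), k)
            # Mutation: swap one selected term for a currently unselected one.
            if rng.random() < mutation_rate:
                unused = [i for i in range(n) if i not in child]
                if unused:
                    child[rng.randrange(k)] = rng.choice(unused)
            children.append(tuple(child))
        population = parents + children
    return sorted(max(population, key=fitness))


# Toy corpus, invented for illustration only.
docs = [
    "win cash prize now",
    "cheap cash loans now",
    "meeting agenda for monday",
    "project meeting notes",
]
vocab, matrix = tfidf_matrix(docs)
selected = select_features(term_importance(matrix), k=3)
print([vocab[i] for i in selected])
```

Restricting the TDM to the columns in `selected` before clustering or classification is what reduces the dimensionality. A real evaluation would score candidate feature sets with whatever importance equation the paper defines rather than the plain TF-IDF sum assumed here.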