Detailed Information

A Study on the Features Selection Algorithm Based on the Measurement Method of the Distance Between Normal Distributions for Classification in Machine Learning

Full metadata record
DC Field: Value
dc.contributor.author: Shin, Byungju
dc.contributor.author: Kim, Minwoo
dc.contributor.author: Wang, Bohyun
dc.contributor.author: Lim, Joon S.
dc.date.accessioned: 2022-05-25T08:40:07Z
dc.date.available: 2022-05-25T08:40:07Z
dc.date.created: 2022-05-25
dc.date.issued: 2022-04
dc.identifier.issn: 1330-3651
dc.identifier.uri: https://scholarworks.bwise.kr/gachon/handle/2020.sw.gachon/84431
dc.description.abstract: Feature selection is an important technique that simplifies machine learning models so they are easier to understand, shortens training time, and reduces over-fitting or under-fitting. This paper presents a feature selection algorithm based on measuring the similarity between the sampled feature values of each classification variable (class). It rests on the premise that the lower the similarity, the more useful the feature is for separating classes. Similarity is measured using the confidence intervals of normal distributions: the more the confidence intervals overlap, the higher the similarity, and the less they overlap, the lower the similarity, so low similarity can serve as a criterion for classification. We propose an equation that applies this method. To confirm the usefulness of the equation, a colorectal cancer dataset with about 2000 genes was used, and comparative experiments were performed against other feature selection algorithms: the Gini index (10 features), mRMR (10 features), and the relational matrix algorithm (7 features). Artificial neural networks were used as the machine learning algorithm, and the comparison was verified with leave-one-out cross-validation. With 10 selected features, the proposed method reached 88.71%, better than the Gini index (85.487%), mRMR (87.09%), and the relational matrix algorithm (87.09%). In addition, experiments on multi-class classification problems were conducted on the iris, wine, glass, music emotions, seeds, and Japanese vowels datasets. For wine, the accuracy was 98.8% when all features were used and rose to 99.4% after six features were removed. For music emotions, the accuracy was 51.7% when all 54 features were used and improved to 61.3% after 20 features were removed. For seeds, reducing the number of features from 7 to 5 slightly improved the accuracy from 93.3% to 93.8%. For iris, glass, and Japanese vowels, the accuracy did not increase when features were removed. Therefore, it can be concluded that features can be selected easily and effectively in multi-class classification problems using the method proposed in this paper.
dc.language: English
dc.language.iso: en
dc.publisher: UNIV OSIJEK, TECH FAC
dc.relation.isPartOf: TEHNICKI VJESNIK-TECHNICAL GAZETTE
dc.title: A Study on the Features Selection Algorithm Based on the Measurement Method of the Distance Between Normal Distributions for Classification in Machine Learning
dc.type: Article
dc.type.rims: ART
dc.description.journalClass: 1
dc.identifier.wosid: 000790165900017
dc.identifier.doi: 10.17559/TV-20211102113116
dc.identifier.bibliographicCitation: TEHNICKI VJESNIK-TECHNICAL GAZETTE, v.29, no.3, pp.852 - 860
dc.description.isOpenAccess: Y
dc.identifier.scopusid: 2-s2.0-85129367179
dc.citation.endPage: 860
dc.citation.startPage: 852
dc.citation.title: TEHNICKI VJESNIK-TECHNICAL GAZETTE
dc.citation.volume: 29
dc.citation.number: 3
dc.contributor.affiliatedAuthor: Wang, Bohyun
dc.contributor.affiliatedAuthor: Lim, Joon S.
dc.type.docType: Article
dc.subject.keywordAuthor: classification
dc.subject.keywordAuthor: distance
dc.subject.keywordAuthor: feature selection
dc.subject.keywordAuthor: Gaussian distribution
dc.subject.keywordAuthor: similarity
dc.subject.keywordPlus: BREAST-CANCER DIAGNOSIS
dc.subject.keywordPlus: PROGNOSIS
dc.relation.journalResearchArea: Engineering
dc.relation.journalWebOfScienceCategory: Engineering, Multidisciplinary
dc.description.journalRegisteredClass: scie
dc.description.journalRegisteredClass: scopus
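
The idea described in the abstract (rank features by how little the per-class normal-distribution intervals overlap) might be sketched roughly as follows. This record does not reproduce the paper's equation, so everything here is an assumption for illustration: the interval form `mean ± z * std`, the overlap-over-span normalization, the sum over class pairs, and the names `interval_overlap`, `overlap_scores`, and `select_features` are all hypothetical and not taken from the paper.

```python
import numpy as np


def interval_overlap(lo_a, hi_a, lo_b, hi_b):
    """Length of the overlap between two closed intervals (0.0 if disjoint)."""
    return max(0.0, min(hi_a, hi_b) - max(lo_a, lo_b))


def overlap_scores(X, y, z=1.96):
    """Score each feature by how much its per-class intervals overlap.

    Assumed scoring (not the paper's equation): for every pair of classes,
    take overlap length divided by the total span of the two intervals,
    and sum over pairs. A lower score means lower similarity between classes.
    Each class needs at least two samples so the sample std is defined.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        # Assumed interval per class: mean +/- z * sample standard deviation.
        bounds = []
        for c in classes:
            values = X[y == c, j]
            m, s = values.mean(), values.std(ddof=1)
            bounds.append((m - z * s, m + z * s))
        # Accumulate the normalized overlap over all class pairs.
        for a in range(len(bounds)):
            for b in range(a + 1, len(bounds)):
                lo_a, hi_a = bounds[a]
                lo_b, hi_b = bounds[b]
                span = max(hi_a, hi_b) - min(lo_a, lo_b)
                overlap = interval_overlap(lo_a, hi_a, lo_b, hi_b)
                # Two zero-width intervals at the same point count as full overlap.
                scores[j] += overlap / span if span > 0 else 1.0
    return scores


def select_features(X, y, k, z=1.96):
    """Indices of the k features with the least interval overlap."""
    return np.argsort(overlap_scores(X, y, z), kind="stable")[:k]
```

A possible usage: `select_features(X, y, k=10)` returns the column indices of the ten lowest-overlap features, which could then be passed to a classifier and evaluated with leave-one-out cross-validation as in the abstract. Normalizing by the span keeps each pairwise term between 0 and 1 regardless of the feature's scale. That is a design choice of this sketch, not something stated in the record.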
Appears in Collections
College of IT Convergence > Department of Computer Engineering > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Lim, Joon Shik
College of IT Convergence (School of Computer Engineering, Computer Engineering Major)