An improved SVM-T-RFE based on intensity-dependent normalization for feature selection in gene expression of big-data

Authors
Kim, C.; Kim, Hyeyoung
Issue Date
2018
Publisher
SPRINGER
Keywords
Support Vector Machine Recursive Feature Elimination (SVM-RFE); Intensity-dependent normalization (M vs A plot method); T-Statistics; RNA-seq gene expression; Big-Data
Citation
Lecture Notes in Electrical Engineering, v.449, pp.44 - 51
Journal Title
Lecture Notes in Electrical Engineering
Volume
449
Start Page
44
End Page
51
URI
https://scholarworks.bwise.kr/hongik/handle/2020.sw.hongik/13226
DOI
10.1007/978-981-10-6451-7_6
ISSN
1876-1100
Abstract
Thanks to the Next-Generation Sequencing (NGS) revolution, high-throughput RNA sequencing (RNA-seq) has become a highly sensitive and accurate method of measuring gene expression. Because RNA-seq generates a huge amount of data, researchers have struggled with a lack of computational methods able to exploit RNA-seq Big-Data. In most cases, existing methods have not provided an adequate feature scaling scheme for RNA-seq Big-Data. RNA-seq therefore encourages computational biologists to identify both novel and well-known features, and it has led to wider adoption of earlier methods as well as the development of new, scalable data analysis methods. It has also drawn attention to deep learning methods that are scalable and can be adapted to select highly correlated genes, under certain assumptions, for classification and prediction. However, some assumptions of those methods do not always hold, and the methods have been considered unstable for large-scale gene expression profiling. We therefore propose an improved feature selection technique that extends the well-known support vector machine recursive feature elimination (SVM-RFE) with T-Statistics and is based on Intensity-dependent normalization, which uses the log differential expression ratio (M vs A plot) to improve scalability. In each iteration of SVM-RFE, the less dominant feature set with respect to relevance and redundancy is excluded from the current set of features. In the proposed algorithm, the most relevant and least redundant features are included in the final feature set, achieving comparable accuracy with small subsets of Big-Data such as NCBI-GEO. The proposed algorithm is compared with the existing one on several known data sets. The comparison finds that the proposed algorithm is more convenient and faster than the previous one, because it uses only functions available in R packages, and that it reduces running time on Big-Data.
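The two preprocessing ingredients named in the abstract, the M vs A transform and a per-gene T-Statistic, can be illustrated with a minimal Python sketch. This is not the authors' R implementation: the function names are hypothetical, the binned-median centring is a crude stand-in assumed here in place of the loess fit usually used for intensity-dependent normalization, the Welch form of the t-statistic is an assumption, and the SVM weight computation and recursive elimination loop are omitted.

```python
import math
from statistics import mean, median, variance


def ma_transform(r, g):
    """Return (M, A) for two positive intensities: M is the log ratio, A the mean log intensity."""
    m = math.log2(r) - math.log2(g)
    a = 0.5 * (math.log2(r) + math.log2(g))
    return m, a


def normalize_ma(m_vals, a_vals, n_bins=4):
    """Subtract the median M within each A-ordered bin (assumed stand-in for a loess fit)."""
    if not m_vals:
        return []
    order = sorted(range(len(a_vals)), key=lambda i: a_vals[i])
    size = math.ceil(len(order) / n_bins)
    out = [0.0] * len(m_vals)
    for start in range(0, len(order), size):
        idx = order[start:start + size]
        centre = median(m_vals[i] for i in idx)
        for i in idx:
            out[i] = m_vals[i] - centre
    return out


def welch_t(x, y):
    """Two-sample Welch t-statistic; each group needs at least two values."""
    se = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(x) - mean(y)) / se


def rank_genes_by_t(expr, labels):
    """Rank genes by |t| between class 1 and class 0 samples, most discriminative first.

    expr maps a gene name to its per-sample values; labels holds 0 or 1 per sample.
    """
    scores = {}
    for gene, values in expr.items():
        pos = [v for v, lab in zip(values, labels) if lab == 1]
        neg = [v for v, lab in zip(values, labels) if lab == 0]
        scores[gene] = abs(welch_t(pos, neg))
    return sorted(scores, key=scores.get, reverse=True)
```

In an SVM-RFE style procedure, a ranking like this would typically be combined with the SVM weight of each feature before the lowest-ranked features are dropped in each iteration; how the two scores are weighted is not stated in the abstract.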
Appears in Collections
School of Games > Game Software Major > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Kim, Hye Young
Game (Major in Game Software)