

Identification of Secondary Breast Cancer in Vital Organs through the Integration of Machine Learning and Microarrays (open access)

Authors
Riaz, Faisal; Abid, Fazeel; Din, Ikram Ud; Kim, Byung-Seo; Almogren, Ahmad; Ul Durar, Shajara
Issue Date
2-Jun-2022
Publisher
MDPI
Keywords
metastasis; microarray; gene expression omnibus; decision trees; random forest; K-nearest neighbours; support vector machine; K-means SMOTE
Citation
ELECTRONICS, v.11, no.12
Journal Title
ELECTRONICS
Volume
11
Number
12
URI
https://scholarworks.bwise.kr/hongik/handle/2020.sw.hongik/30085
DOI
10.3390/electronics11121879
ISSN
2079-9292
Abstract
Breast cancer, the most prevalent malignancy in women, involves genetic and environmental factors that contribute to its pathogenesis and progression. Breast cancer commonly metastasizes to the bones, liver, brain, and lungs, and metastasis is the main cause of death in patients. Furthermore, feature selection and classification are central to microarray data analysis, which is highly time-consuming. To address these issues, this research integrates machine learning and microarrays to identify secondary breast cancer in vital organs. First, missing values are imputed using K-nearest neighbors, and recursive feature elimination with cross-validation (RFECV) is improved by using the random forest method. Second, class imbalance is handled with the K-means synthetic minority oversampling technique (K-means SMOTE), which balances the minority class while avoiding the generation of noise. We identified the 16 most important Entrez Gene IDs for predicting metastatic sites in the bones, brain, liver, and lungs. Extensive experiments were conducted on the NCBI Gene Expression Omnibus (GEO) datasets GSE14020 and GSE54323. The proposed methods handle class imbalance, avoid introducing noise, and reduce computation time. Reliable results were obtained with four classification models: decision tree, K-nearest neighbors, random forest, and support vector machine. Results are reported in terms of confusion matrices, accuracy, ROC-AUC, PR-AUC, and F1-score.
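The abstract names K-means SMOTE as the class-balancing step. The following is a minimal, dependency-free sketch of that general technique, not the authors' code: it clusters all samples with k-means, keeps only clusters in which the minority class makes up at least a threshold share, weights the kept clusters by sparsity (mean pairwise minority distance raised to the number of features, divided by the minority count, in the spirit of the original K-means SMOTE proposal), and then creates synthetic minority samples by SMOTE-style interpolation between a minority sample and one of its nearest minority neighbours in the same cluster. The helper names `kmeans` and `kmeans_smote` and the parameters `imbalance_threshold` and `k_neighbors` are hypothetical; a real pipeline would more likely use a library implementation such as imbalanced-learn's `KMeansSMOTE`.

```python
import math
import random
from collections import Counter


def kmeans(points, k, rng, n_iter=100):
    """Plain Lloyd's k-means; returns a cluster label for every point."""
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels


def kmeans_smote(X, y, minority, k=2, imbalance_threshold=0.5,
                 k_neighbors=2, seed=0):
    """Hypothetical helper: oversample `minority` until it matches the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    n_needed = max(counts.values()) - counts[minority]
    if n_needed <= 0:
        return list(X), list(y)

    # 1. Cluster the whole input space.
    labels = kmeans(X, k, rng)

    # 2. Filter: keep clusters where the minority share reaches the threshold,
    #    weighted by sparsity (mean minority distance ** n_features / count).
    members, weights = {}, {}
    for c in range(k):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        mins = [X[i] for i in idx if y[i] == minority]
        if len(mins) < 2 or len(mins) / len(idx) < imbalance_threshold:
            continue
        dists = [math.dist(a, b) for i, a in enumerate(mins) for b in mins[i + 1:]]
        mean_dist = sum(dists) / len(dists)
        members[c] = mins
        weights[c] = mean_dist ** len(X[0]) / len(mins)
    if not members:
        raise ValueError("no cluster passed the filter; try another k or threshold")

    clusters = list(members)
    w = [weights[c] for c in clusters]
    if sum(w) == 0:
        w = [1.0] * len(clusters)  # all minority points coincide: sample uniformly

    # 3. SMOTE inside the chosen clusters: interpolate between a minority
    #    sample and one of its nearest minority neighbours in the same cluster.
    X_res, y_res = list(X), list(y)
    for _ in range(n_needed):
        pts = members[rng.choices(clusters, weights=w)[0]]
        i = rng.randrange(len(pts))
        a = pts[i]
        others = [p for j, p in enumerate(pts) if j != i]
        b = rng.choice(sorted(others, key=lambda p: math.dist(a, p))[:k_neighbors])
        gap = rng.random()
        X_res.append(tuple(ai + gap * (bi - ai) for ai, bi in zip(a, b)))
        y_res.append(minority)
    return X_res, y_res
```

For example, with eight class-0 points near the origin and four class-1 points near (10, 10), `kmeans_smote(X, y, minority=1, k=2)` appends four synthetic class-1 points that lie between the original class-1 samples, leaving both classes with eight samples. For several minority classes (such as multiple metastatic sites), the helper could be called once per class. Oversampling of this kind is normally applied only to the training split so that synthetic samples do not leak into the evaluation metrics.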


Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher


Kim, Byung Seo
Graduate School (Software and Communications Engineering)
