Detailed Information


Enhancing Machine Learning Models Through PCA, SMOTE-ENN, and Stochastic Weighted Averaging

Full metadata record
DC Field | Value | Language
dc.contributor.author | Han, Youngjin | -
dc.contributor.author | Joe, Inwhee | -
dc.date.accessioned | 2024-11-28T19:00:57Z | -
dc.date.available | 2024-11-28T19:00:57Z | -
dc.date.issued | 2024-11 | -
dc.identifier.issn | 2076-3417 | -
dc.identifier.uri | https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/198094 | -
dc.description.abstract | Predicting survival outcomes in critical accidents has been a focal point in machine learning research. This study addresses several limitations of existing methods, including insufficient management of data imbalance, lack of emphasis on hyperparameter tuning, and proneness to overfitting. Many existing models struggle to generalize effectively on imbalanced datasets or depend on default hyperparameter settings, resulting in biased predictions. By integrating Principal Component Analysis (PCA), hyperparameter optimization, and resampling methods, as well as combining Edited Nearest Neighbors (ENN) with the Synthetic Minority Oversampling Technique (SMOTE), the model significantly improves predictive accuracy and model generalization. An ensemble model combining seven machine learning algorithms (Logistic Regression, Support Vector Machine, KNN, Random Forest, XGBoost, LightGBM, and CatBoost) was applied to predict survival outcomes. Stochastic Weighted Averaging (SWA) was applied to mitigate overfitting and enhance generalization. The accuracy increased from 91.97% to 94.89% after SWA was applied in this specific scenario. The combination of PCA-based dimensionality reduction, hyperparameter tuning, and resampling techniques (ENN + SMOTE) ensured the model handled data imbalance and optimized predictive accuracy. The final model demonstrated excellent performance, with Area Under the Curve (AUC) and Average Precision (AP) values both reaching 0.98, indicating high accuracy and precision. These improvements were validated using the Titanic dataset in a binary classification problem of predicting passenger survival. The results emphasize that ensemble learning, enhanced by SWA, offers a powerful framework for handling imbalanced and complex datasets, providing significant advancements in predictive modeling accuracy. This study provides insights into how machine learning techniques can be effectively combined to solve classification challenges in real-world scenarios. | -
dc.format.extent | 22 | -
dc.language | English | -
dc.language.iso | ENG | -
dc.publisher | MDPI | -
dc.title | Enhancing Machine Learning Models Through PCA, SMOTE-ENN, and Stochastic Weighted Averaging | -
dc.type | Article | -
dc.publisher.location | Switzerland | -
dc.identifier.doi | 10.3390/app14219772 | -
dc.identifier.scopusid | 2-s2.0-85208498996 | -
dc.identifier.wosid | 001351059900001 | -
dc.identifier.bibliographicCitation | Applied Sciences-Basel, v.14, no.21, pp 1 - 22 | -
dc.citation.title | Applied Sciences-Basel | -
dc.citation.volume | 14 | -
dc.citation.number | 21 | -
dc.citation.startPage | 1 | -
dc.citation.endPage | 22 | -
dc.type.docType | Article | -
dc.description.isOpenAccess | Y | -
dc.description.journalRegisteredClass | scie | -
dc.description.journalRegisteredClass | scopus | -
dc.relation.journalResearchArea | Chemistry | -
dc.relation.journalResearchArea | Engineering | -
dc.relation.journalResearchArea | Materials Science | -
dc.relation.journalResearchArea | Physics | -
dc.relation.journalWebOfScienceCategory | Chemistry, Multidisciplinary | -
dc.relation.journalWebOfScienceCategory | Engineering, Multidisciplinary | -
dc.relation.journalWebOfScienceCategory | Materials Science, Multidisciplinary | -
dc.relation.journalWebOfScienceCategory | Physics, Applied | -
dc.subject.keywordPlus | Adversarial machine learning | -
dc.subject.keywordPlus | Contrastive Learning | -
dc.subject.keywordPlus | Decision trees | -
dc.subject.keywordPlus | Logistic regression | -
dc.subject.keywordPlus | Network security | -
dc.subject.keywordPlus | Prediction models | -
dc.subject.keywordPlus | Random forests | -
dc.subject.keywordPlus | Support vector regression | -
dc.subject.keywordAuthor | survival prediction | -
dc.subject.keywordAuthor | hyperparameter optimization | -
dc.subject.keywordAuthor | ensemble methods | -
dc.subject.keywordAuthor | data imbalance | -
dc.subject.keywordAuthor | PCA | -
dc.subject.keywordAuthor | Bayesian optimization | -
dc.subject.keywordAuthor | voting | -
dc.subject.keywordAuthor | stacking | -
dc.subject.keywordAuthor | SWA | -
dc.identifier.url | https://www.mdpi.com/2076-3417/14/21/9772 | -
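
The abstract names SMOTE combined with ENN as the resampling step, but this record gives no implementation details. As a rough illustration only, the standard-library Python sketch below shows the general idea: SMOTE creates new minority samples by interpolating between existing minority neighbours, and ENN then drops samples whose label disagrees with most of their nearest neighbours. The function names (k_nearest, smote, enn, smote_enn), the choice k=3, and the majority-vote ENN rule are assumptions made for this sketch and are not taken from the paper.

```python
import math
import random
from collections import Counter


def k_nearest(point, points, k, skip_index=None):
    """Return indices of the k points closest to `point` (Euclidean distance)."""
    order = sorted(
        (i for i in range(len(points)) if i != skip_index),
        key=lambda i: math.dist(point, points[i]),
    )
    return order[:k]


def smote(X, y, minority, k=3, rng=None):
    """Add interpolated `minority` samples until it matches the largest class.

    Assumes at least two minority samples exist (hypothetical helper).
    """
    rng = rng or random.Random(0)
    minority_pts = [x for x, label in zip(X, y) if label == minority]
    n_needed = max(Counter(y).values()) - len(minority_pts)
    X_new, y_new = list(X), list(y)
    for _ in range(n_needed):
        i = rng.randrange(len(minority_pts))
        j = rng.choice(k_nearest(minority_pts[i], minority_pts, k, skip_index=i))
        gap = rng.random()  # position along the segment between the two points
        synthetic = tuple(
            a + gap * (b - a) for a, b in zip(minority_pts[i], minority_pts[j])
        )
        X_new.append(synthetic)
        y_new.append(minority)
    return X_new, y_new


def enn(X, y, k=3):
    """Drop samples whose label differs from the majority of their k neighbours."""
    keep = []
    for i, (x, label) in enumerate(zip(X, y)):
        votes = Counter(y[j] for j in k_nearest(x, X, k, skip_index=i))
        if votes.most_common(1)[0][0] == label:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


def smote_enn(X, y, minority, k=3, seed=0):
    """Oversample with SMOTE first, then clean the result with ENN."""
    X_over, y_over = smote(X, y, minority, k=k, rng=random.Random(seed))
    return enn(X_over, y_over, k=k)
```

Calling smote_enn(X, y, minority=1) on a list of feature tuples X and labels y returns the resampled pair. In practice a maintained implementation such as the SMOTEENN class in the imbalanced-learn package would normally be preferred over a hand-written version like this one.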
Appears in Collections
College of Engineering (Seoul) > School of Computer Software (Seoul) > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.
