

Enhancing Machine Learning Models Through PCA, SMOTE-ENN, and Stochastic Weighted Averaging (open access)

Authors
Han, Youngjin; Joe, Inwhee
Issue Date
Nov-2024
Publisher
MDPI
Keywords
survival prediction; hyperparameter optimization; ensemble methods; data imbalance; PCA; Bayesian optimization; voting; stacking; SWA
Citation
Applied Sciences-basel, v.14, no.21, pp 1 - 22
Pages
22
Indexed
SCIE
SCOPUS
Journal Title
Applied Sciences-basel
Volume
14
Number
21
Start Page
1
End Page
22
URI
https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/198094
DOI
10.3390/app14219772
ISSN
2076-3417
Abstract
Predicting survival outcomes in critical accidents has been a focal point of machine learning research. This study addresses several limitations of existing methods: inadequate handling of data imbalance, little attention to hyperparameter tuning, and a tendency to overfit. Many existing models generalize poorly on imbalanced datasets or rely on default hyperparameter settings, which biases their predictions. The proposed model integrates Principal Component Analysis (PCA), hyperparameter optimization, and a resampling method that combines the Synthetic Minority Oversampling Technique (SMOTE) with Edited Nearest Neighbors (ENN). This combination improves both predictive accuracy and generalization. An ensemble of seven machine learning algorithms (Logistic Regression, Support Vector Machine, K-Nearest Neighbors (KNN), Random Forest, XGBoost, LightGBM, and CatBoost) was used to predict survival outcomes, and Stochastic Weighted Averaging (SWA) was applied to mitigate overfitting and improve generalization. In this setting, applying SWA raised accuracy from 91.97% to 94.89%. Together, PCA-based dimensionality reduction, hyperparameter tuning, and SMOTE-ENN resampling allowed the model to handle the class imbalance and improve predictive accuracy. The final model achieved an Area Under the Curve (AUC) and an Average Precision (AP) of 0.98 each, indicating strong class separation and high precision. These improvements were validated on the Titanic dataset, framed as a binary classification task of predicting passenger survival. The results suggest that ensemble learning combined with SWA offers an effective framework for imbalanced and complex datasets and can substantially improve predictive accuracy. More broadly, the study shows how machine learning techniques can be combined to address classification problems in real-world settings.
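The abstract names three mechanisms (PCA, SMOTE-ENN resampling, and SWA) without spelling them out. What follows is a minimal NumPy-only sketch of each. It rests on several assumptions. The target is binary with labels 0/1. SMOTE is applied before ENN, which is the order common SMOTE-ENN implementations use. Wilson's ENN rule is applied to every class. SWA is shown in its usual form: a running average of weight snapshots taken along a stochastic gradient descent trajectory, here for a single logistic regression. The abstract does not describe how the paper applies SWA to its seven-model ensemble, so this is an assumption. Hyperparameter optimization and the voting/stacking ensemble are omitted. All function names (`pca_fit_transform`, `knn_indices`, `smote`, `enn`, `sgd_logreg_swa`, `predict`) and the synthetic data are hypothetical illustrations. They are not the authors' code and do not use the Titanic dataset.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible


def pca_fit_transform(X, n_components):
    """Project X onto its top principal components via SVD of the centered data."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def knn_indices(X, k):
    """Indices of the k nearest neighbours of each row, excluding the row itself."""
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d, np.inf)
    return np.argsort(d, axis=1)[:, :k]


def smote(X, y, minority, k=5):
    """SMOTE: add synthetic minority points on segments between minority neighbours
    until both classes have the same size (assumption: binary labels)."""
    Xm = X[y == minority]
    n_new = int((y != minority).sum()) - len(Xm)
    if n_new <= 0:
        return X, y
    nn = knn_indices(Xm, min(k, len(Xm) - 1))
    base = rng.integers(0, len(Xm), n_new)                   # random minority seed points
    neigh = nn[base, rng.integers(0, nn.shape[1], n_new)]    # one random neighbour each
    gap = rng.random((n_new, 1))                             # interpolation factor in [0, 1)
    X_new = Xm[base] + gap * (Xm[neigh] - Xm[base])
    return np.vstack([X, X_new]), np.concatenate([y, np.full(n_new, minority)])


def enn(X, y, k=3):
    """ENN (Wilson editing): drop samples whose label disagrees with the majority
    label of their k nearest neighbours. Assumption: 0/1 labels and odd k."""
    nn = knn_indices(X, k)
    frac_pos = y[nn].mean(axis=1)          # share of neighbours labelled 1
    keep = (frac_pos > 0.5) == (y == 1)    # neighbour majority matches own label
    return X[keep], y[keep]


def sgd_logreg_swa(X, y, epochs=30, lr=0.1, swa_start=15):
    """Logistic regression fit by plain SGD. Returns the last iterate and the SWA
    weights: a running mean of end-of-epoch snapshots from epoch swa_start on."""
    Xb = np.hstack([X, np.ones((len(X), 1))])   # append a bias column
    w = np.zeros(Xb.shape[1])
    w_swa, n_avg = np.zeros_like(w), 0
    for epoch in range(epochs):
        for i in rng.permutation(len(Xb)):
            z = np.clip(Xb[i] @ w, -30.0, 30.0)  # clip to avoid exp overflow
            p = 1.0 / (1.0 + np.exp(-z))
            w -= lr * (p - y[i]) * Xb[i]         # gradient step on the log loss
        if epoch >= swa_start:
            n_avg += 1
            w_swa += (w - w_swa) / n_avg         # incremental mean of snapshots
    return w, w_swa


def predict(w, X):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return (Xb @ w > 0).astype(int)


# Hypothetical imbalanced data (not the Titanic set): 180 negatives, 20 positives.
X = np.vstack([rng.normal(0.0, 1.0, (180, 6)), rng.normal(1.5, 1.0, (20, 6))])
y = np.concatenate([np.zeros(180, dtype=int), np.ones(20, dtype=int)])

Z = pca_fit_transform(X, n_components=3)
Z_res, y_res = enn(*smote(Z, y, minority=1))
w_last, w_swa = sgd_logreg_swa(Z_res, y_res)
print("class counts after SMOTE-ENN:", np.bincount(y_res))
print("train accuracy, last iterate:", (predict(w_last, Z_res) == y_res).mean())
print("train accuracy, SWA weights: ", (predict(w_swa, Z_res) == y_res).mean())
```

In a real evaluation, PCA and resampling would be fit on the training split only, so that synthetic points never leak into the test data. Maintained implementations exist as `SMOTEENN` in imbalanced-learn and `torch.optim.swa_utils.AveragedModel` in PyTorch.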
Appears in Collections
College of Engineering (Seoul) > School of Computer Software (Seoul) > 1. Journal Articles

qrcode

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.
