CTGAN-MOS: Conditional Generative Adversarial Network Based Minority-Class-Augmented Oversampling Scheme for Imbalanced Problems (open access)

Authors
Majeed, Abdul; Hwang, Seong Oun
Issue Date
Aug-2023
Publisher
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
Keywords
Imbalanced problem; data augmentation; machine learning; classifiers; noise; majority class; minority class; model training; samples; intelligent fusion; data truthfulness; data engineering
Citation
IEEE ACCESS, v.11, pp.85878 - 85899
Journal Title
IEEE ACCESS
Volume
11
Start Page
85878
End Page
85899
URI
https://scholarworks.bwise.kr/gachon/handle/2020.sw.gachon/89064
DOI
10.1109/ACCESS.2023.3303509
ISSN
2169-3536
Abstract
This paper proposes a novel data augmentation scheme, the conditional generative adversarial network minority-class-augmented oversampling scheme (CTGAN-MOS), for solving class imbalance problems. Our methodology comprises six key steps: data engineering using sophisticated pre-processing techniques; identifying the types of vulnerabilities present in the data; curating good-quality synthetic data using the CTGAN model; intelligently fusing real and synthetic data; removing noise from the augmented data using a coin-throwing algorithm; and building classifiers with the high-quality augmented data. Our scheme maintains higher structural similarity (data truthfulness) between the original and the resampled data by intelligently adding high-quality samples only to the minority class, whereas some augmentation techniques add records to the majority class, leading to poor-quality resampled data. Our scheme also removes noisy samples from the data, a step that has remained unexplored in CTGAN-based data augmentation. Furthermore, it augments data by adding fewer records than existing schemes while offering comparable performance. Experiments are conducted on benchmark datasets to demonstrate the feasibility of the proposed CTGAN-MOS in realistic scenarios. The results show that CTGAN-MOS improves over existing state-of-the-art (SOTA) techniques in terms of accuracy, recall, precision, F1 score, and G-mean score. Specifically, CTGAN-MOS yields accuracy values of 100% and 99.83% on two datasets, which are higher than those of recent SOTA techniques. On average, it improves the G-mean score by 22.58% and 29.47% on two different datasets, and it adds 8.26% and 26.01% fewer records than the existing SOTA methods on the two datasets. Lastly, our scheme yields highly balanced confusion matrices compared to recent SOTA data augmentation techniques.
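The abstract reports G-mean alongside accuracy, precision, recall, and F1 score. As a minimal sketch, assuming the common binary definition of G-mean as the square root of recall times specificity (the paper's exact formulation is not stated in this record), these metrics can be computed from confusion-matrix counts as follows. The function name `binary_metrics` and the example counts are hypothetical illustrations, not values from the paper.

```python
import math


def binary_metrics(tp, fn, fp, tn):
    """Compute common metrics from binary confusion-matrix counts.

    tp/fn count minority-class (positive) records, fp/tn count
    majority-class (negative) records. Undefined ratios fall back to 0.0.
    """
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # sensitivity
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # Assumed definition: geometric mean of recall and specificity.
    g_mean = math.sqrt(recall * specificity)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "g_mean": g_mean,
    }


if __name__ == "__main__":
    # Hypothetical imbalanced test set: 10 minority, 990 majority records.
    # A classifier that always predicts the majority class looks accurate
    # but never detects a minority record.
    always_majority = binary_metrics(tp=0, fn=10, fp=0, tn=990)
    print(always_majority["accuracy"], always_majority["g_mean"])

    # A classifier that detects 8 of the 10 minority records at the cost
    # of 90 false alarms.
    detector = binary_metrics(tp=8, fn=2, fp=90, tn=900)
    print(detector["accuracy"], detector["g_mean"])
```

In the first hypothetical case accuracy is high while G-mean is zero, which illustrates why accuracy alone can mislead on imbalanced data and why a class-balanced measure such as G-mean is reported as well.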
Files in This Item
There are no files associated with this item.
Appears in Collections
College of IT Convergence > Department of Computer Engineering > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

MAJEED, ABDUL
College of IT Convergence (School of Computer Engineering, Computer Engineering Major)