CTGAN-MOS: Conditional Generative Adversarial Network Based Minority-Class-Augmented Oversampling Scheme for Imbalanced Problems

Majeed, Abdul; Hwang, Seong Oun

Open access

Authors: Majeed, Abdul; Hwang, Seong Oun

Issue Date: Aug-2023

Publisher: IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

Keywords: Imbalanced problem; data augmentation; machine learning; classifiers; noise; majority class; minority class; model training; samples; intelligent fusion; data truthfulness; data engineering

Citation: IEEE ACCESS, v.11, pp.85878 - 85899

Journal Title: IEEE ACCESS

Volume: 11

Start Page: 85878

End Page: 85899

URI: https://scholarworks.bwise.kr/gachon/handle/2020.sw.gachon/89064

DOI: 10.1109/ACCESS.2023.3303509

ISSN: 2169-3536

Abstract: This paper proposes a novel data augmentation scheme, the conditional generative adversarial network minority-class-augmented oversampling scheme (CTGAN-MOS), for solving class imbalance problems. The methodology comprises six key steps: data engineering using sophisticated pre-processing techniques, identifying the types of vulnerabilities present in the data, curating good-quality synthetic data using the CTGAN model, intelligently fusing real and synthetic data, removing noise from the augmented data using a coin-throwing algorithm, and building classifiers with the high-quality augmented data. The scheme maintains higher structural similarity (data truthfulness) between the original and the resampled data by adding high-quality samples only to the minority class, whereas some augmentation techniques add records to the majority class, leading to poor-quality resampled data. It also removes noisy samples from the data, a step that has remained unexplored in CTGAN-based data augmentation. Furthermore, it augments data by adding fewer records than existing schemes while offering comparable performance. Experiments on benchmark datasets demonstrate the feasibility of CTGAN-MOS in realistic scenarios. The results show improvements over existing state-of-the-art (SOTA) techniques in terms of accuracy, recall, precision, F1 score, and G-mean score. Specifically, CTGAN-MOS yields accuracy values of 100% and 99.83% on two datasets, which are higher than those of recent SOTA techniques. On average, it improves the G-mean score by 22.58% and 29.47% on two different datasets, and it adds 8.26% and 26.01% fewer records than the existing SOTA methods on the two datasets. Lastly, the scheme yields highly balanced confusion matrices compared to recent SOTA data augmentation techniques.
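The abstract reports G-mean alongside accuracy. A minimal sketch of why that matters on imbalanced data follows, assuming the common binary definition of G-mean as the geometric mean of sensitivity and specificity (the record does not state which definition the paper uses). The function names and the confusion-matrix counts are hypothetical and chosen only for illustration; they are not results from the paper.

```python
import math


def accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Fraction of all samples that were classified correctly."""
    return (tp + tn) / (tp + fn + tn + fp)


def g_mean(tp: int, fn: int, tn: int, fp: int) -> float:
    """Geometric mean of sensitivity (minority recall) and specificity.

    Assumed definition: sqrt(TPR * TNR). Returns 0.0 if a class has no samples.
    """
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


# Hypothetical test set: 950 majority samples, 50 minority samples.
# "biased" mostly predicts the majority class; "balanced" recovers the minority.
biased = dict(tp=5, fn=45, tn=945, fp=5)
balanced = dict(tp=45, fn=5, tn=900, fp=50)

for name, cm in (("biased", biased), ("balanced", balanced)):
    print(f"{name}: accuracy={accuracy(**cm):.3f}, g_mean={g_mean(**cm):.3f}")
```

With these made-up counts, both classifiers have an accuracy near 0.95, but the G-mean is about 0.32 for the biased one and about 0.92 for the balanced one. This is the kind of gap a minority-class oversampling scheme aims to close.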

Appears in Collections: College of IT Convergence > Department of Computer Engineering > 1. Journal Articles

Related Researcher

MAJEED, ABDUL: College of IT Convergence (School of Computer Engineering, Computer Engineering major)
