

Less is More: An Empirical Study of Undersampling Techniques for Technical Debt Prediction

Authors
Lee, Gichan; Lee, Scott Uk-Jin
Issue Date
Sep-2025
Publisher
Springer Science and Business Media Deutschland GmbH
Keywords
Class Imbalance; Technical Debt; Undersampling
Citation
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), v.14784 LNCS, pp 146 - 156
Pages
11
Indexed
SCOPUS
Journal Title
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume
14784 LNCS
Start Page
146
End Page
156
URI
https://scholarworks.bwise.kr/erica/handle/2021.sw.erica/120724
DOI
10.1007/978-3-031-66456-4_8
ISSN
0302-9743
1611-3349
Abstract
Technical Debt (TD) prediction is crucial to preventing software quality degradation and rising maintenance costs. Recent Machine Learning (ML) approaches have shown promising results in TD prediction, but imbalanced TD datasets can degrade ML model performance. Although previous TD studies have investigated various oversampling techniques, which generate minority class instances to mitigate the imbalance, the potential of undersampling techniques has not yet been thoroughly explored because of concerns about information loss. To address this gap, we investigate the impact of undersampling on TD model performance using 17,797 classes from 25 Java open-source projects. We compare the performance of models trained with different undersampling techniques and evaluate the impact of combining them with widely used oversampling techniques. Our findings reveal that (i) undersampling can significantly improve TD model performance compared to oversampling and no resampling; and (ii) combining undersampling and oversampling techniques yields further performance improvement over applying either technique alone. Based on these results, we recommend that practitioners explore various undersampling techniques and their combinations with oversampling techniques for more effective TD prediction. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
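The abstract does not name the specific undersampling techniques studied, so the following is only a minimal sketch of the general idea, assuming the simplest variant (random undersampling): majority-class instances are discarded at random until every class is as large as the smallest one. The function name `random_undersample`, the toy feature vectors, and the 0/1 labels (non-TD/TD) are illustrative assumptions, not taken from the paper.

```python
import random
from collections import Counter


def random_undersample(X, y, seed=0):
    """Randomly drop instances so each class keeps as many rows as the smallest class.

    Illustrative sketch only; not the procedure used in the paper.
    """
    rng = random.Random(seed)

    # Group feature rows by their class label.
    by_class = {}
    for features, label in zip(X, y):
        by_class.setdefault(label, []).append(features)

    # Every class is reduced to the size of the minority class.
    target = min(len(rows) for rows in by_class.values())

    X_out, y_out = [], []
    for label, rows in by_class.items():
        kept = rng.sample(rows, target)  # sampling without replacement
        X_out.extend(kept)
        y_out.extend([label] * target)
    return X_out, y_out


# Hypothetical toy data: 8 non-TD classes (label 0) and 2 TD classes (label 1).
X = [[i, i * 2] for i in range(10)]
y = [0] * 8 + [1] * 2

X_bal, y_bal = random_undersample(X, y)
print(Counter(y_bal))
```

On the toy data the balanced set keeps two instances per label. The six discarded majority rows illustrate the information-loss concern that the abstract mentions.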
Appears in
Collections
COLLEGE OF COMPUTING > ERICA Division of Computer Science > 1. Journal Articles


