Less is More: An Empirical Study of Undersampling Techniques for Technical Debt Prediction
- Authors
- Lee, Gichan; Lee, Scott Uk-Jin
- Issue Date
- Sep-2025
- Publisher
- Springer Science and Business Media Deutschland GmbH
- Keywords
- Class Imbalance; Technical Debt; Undersampling
- Citation
- Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), v.14784 LNCS, pp 146 - 156
- Pages
- 11
- Indexed
- SCOPUS
- Journal Title
- Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
- Volume
- 14784 LNCS
- Start Page
- 146
- End Page
- 156
- URI
- https://scholarworks.bwise.kr/erica/handle/2021.sw.erica/120724
- DOI
- 10.1007/978-3-031-66456-4_8
- ISSN
- 0302-9743; 1611-3349
- Abstract
- Technical Debt (TD) prediction is crucial to preventing software quality degradation and rising maintenance costs. Recent Machine Learning (ML) approaches have shown promising results in TD prediction, but imbalanced TD datasets can degrade ML model performance. Although previous TD studies have investigated various oversampling techniques, which generate minority-class instances to mitigate the imbalance, the potential of undersampling techniques has not yet been thoroughly explored because of concerns about information loss. To address this gap, we investigate the impact of undersampling on TD model performance using 17,797 classes from 25 Java open-source projects. We compare the performance of models with different undersampling techniques and evaluate the impact of combining them with widely used oversampling techniques. Our findings reveal that (i) undersampling can significantly improve TD model performance compared to oversampling and no resampling; and (ii) combining undersampling and oversampling techniques yields a further performance improvement over applying either technique alone. Based on these results, we recommend that practitioners explore various undersampling techniques and their combinations with oversampling techniques for more effective TD prediction. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
- Appears in Collections
- COLLEGE OF COMPUTING > ERICA Department of Computer Science > 1. Journal Articles
