Less is More: An Empirical Study of Undersampling Techniques for Technical Debt Prediction
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Lee, Gichan | - |
dc.contributor.author | Lee, Scott Uk-Jin | - |
dc.date.accessioned | 2024-11-01T02:00:19Z | - |
dc.date.available | 2024-11-01T02:00:19Z | - |
dc.date.issued | 2025-09 | - |
dc.identifier.issn | 0302-9743 | - |
dc.identifier.issn | 1611-3349 | - |
dc.identifier.uri | https://scholarworks.bwise.kr/erica/handle/2021.sw.erica/120724 | - |
dc.description.abstract | Technical Debt (TD) prediction is crucial to preventing software quality degradation and increased maintenance costs. Recent Machine Learning (ML) approaches have shown promising results in TD prediction, but imbalanced TD datasets can have a negative impact on ML model performance. Although previous TD studies have investigated various oversampling techniques that generate minority class instances to mitigate the imbalance, the potential of undersampling techniques has not yet been thoroughly explored due to concerns about information loss. To address this gap, we investigate the impact of undersampling on TD model performance using 17,797 classes from 25 Java open-source projects. We compare the performance of models trained with different undersampling techniques and evaluate the impact of combining them with widely used oversampling techniques. Our findings reveal that (i) undersampling can significantly improve TD model performance compared to oversampling and to no resampling; (ii) combining undersampling and oversampling techniques has a synergistic effect, further improving performance compared to applying either technique alone. Based on these results, we recommend that practitioners explore various undersampling techniques and their combinations with oversampling techniques for more effective TD prediction. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025. | - |
dc.format.extent | 11 | - |
dc.language | English | - |
dc.language.iso | ENG | - |
dc.publisher | Springer Science and Business Media Deutschland GmbH | - |
dc.title | Less is More: An Empirical Study of Undersampling Techniques for Technical Debt Prediction | - |
dc.type | Article | - |
dc.publisher.location | United States | - |
dc.identifier.doi | 10.1007/978-3-031-66456-4_8 | - |
dc.identifier.scopusid | 2-s2.0-85206189314 | - |
dc.identifier.wosid | 001345175300008 | - |
dc.identifier.bibliographicCitation | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), v.14784 LNCS, pp 146 - 156 | - |
dc.citation.title | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | - |
dc.citation.volume | 14784 LNCS | - |
dc.citation.startPage | 146 | - |
dc.citation.endPage | 156 | - |
dc.type.docType | Proceedings Paper | - |
dc.description.isOpenAccess | N | - |
dc.description.journalRegisteredClass | scopus | - |
dc.relation.journalResearchArea | Computer Science | - |
dc.relation.journalWebOfScienceCategory | Computer Science, Artificial Intelligence | - |
dc.relation.journalWebOfScienceCategory | Computer Science, Interdisciplinary Applications | - |
dc.relation.journalWebOfScienceCategory | Computer Science, Software Engineering | - |
dc.relation.journalWebOfScienceCategory | Computer Science, Theory & Methods | - |
dc.subject.keywordAuthor | Class Imbalance | - |
dc.subject.keywordAuthor | Technical Debt | - |
dc.subject.keywordAuthor | Undersampling | - |
dc.identifier.url | https://link.springer.com/chapter/10.1007/978-3-031-66456-4_8 | - |
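The abstract contrasts undersampling (discarding majority-class instances) with oversampling (generating minority-class instances) and reports gains from combining the two. The record does not name the specific techniques the study evaluated, so the following is only a minimal Python sketch that uses plain random undersampling and random oversampling as stand-ins. The function names, the binary 0/1 labels, and the "halfway" target used by `combined_resample` are hypothetical choices for illustration, not the authors' method.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

# Hypothetical sample type: (feature vector, binary label), e.g. 1 = "class has TD".
Sample = Tuple[Sequence[float], int]


def random_undersample(data: List[Sample], target_majority: int, seed: int = 0) -> List[Sample]:
    """Randomly drop majority-class rows until that class has `target_majority` rows.

    All non-majority rows are kept unchanged (assumes binary labels).
    """
    rng = random.Random(seed)
    counts = Counter(label for _, label in data)
    majority = max(counts, key=counts.get)
    majority_rows = [s for s in data if s[1] == majority]
    other_rows = [s for s in data if s[1] != majority]
    kept = rng.sample(majority_rows, min(target_majority, len(majority_rows)))
    return other_rows + kept


def random_oversample(data: List[Sample], target_minority: int, seed: int = 0) -> List[Sample]:
    """Duplicate randomly chosen minority-class rows until that class has `target_minority` rows."""
    rng = random.Random(seed)
    counts = Counter(label for _, label in data)
    minority = min(counts, key=counts.get)
    minority_rows = [s for s in data if s[1] == minority]
    n_extra = max(0, target_minority - len(minority_rows))
    extra = [rng.choice(minority_rows) for _ in range(n_extra)]
    return data + extra


def combined_resample(data: List[Sample], seed: int = 0) -> List[Sample]:
    """Assumed combination: undersample the majority class halfway toward the
    minority size, then oversample the minority class up to the same size."""
    counts = Counter(label for _, label in data)
    middle = (max(counts.values()) + min(counts.values())) // 2
    reduced = random_undersample(data, middle, seed)
    return random_oversample(reduced, middle, seed)


if __name__ == "__main__":
    # Toy imbalanced dataset: 90 majority rows (label 0), 10 minority rows (label 1).
    toy = [([float(i)], 0) for i in range(90)] + [([float(i)], 1) for i in range(10)]
    print(sorted(Counter(l for _, l in random_undersample(toy, 10)).items()))  # [(0, 10), (1, 10)]
    print(sorted(Counter(l for _, l in combined_resample(toy)).items()))  # [(0, 50), (1, 50)]
```

Undersampling alone balances the toy set by discarding 80 majority rows, which is the information-loss concern the abstract mentions; the combined variant discards fewer majority rows and fills the remaining gap with duplicated minority rows.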
Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.
Certain data included herein are derived from the © Web of Science of Clarivate Analytics. All rights reserved.
You may not copy or re-distribute this material in whole or in part without the prior written consent of Clarivate Analytics.