Estimating classification error rate: Repeated cross-validation, repeated hold-out and bootstrap
- Authors
- Kim, Ji-Hyun
- Issue Date
- 1-Sep-2009
- Publisher
- ELSEVIER SCIENCE BV
- Citation
- COMPUTATIONAL STATISTICS & DATA ANALYSIS, v.53, no.11, pp.3735 - 3745
- Journal Title
- COMPUTATIONAL STATISTICS & DATA ANALYSIS
- Volume
- 53
- Number
- 11
- Start Page
- 3735
- End Page
- 3745
- URI
- http://scholarworks.bwise.kr/ssu/handle/2018.sw.ssu/15773
- DOI
- 10.1016/j.csda.2009.04.009
- ISSN
- 0167-9473
- Abstract
- We consider the accuracy estimation of a classifier constructed on a given training sample. The naive resubstitution estimate is known to suffer from a downward bias. The traditional approach to tackling this bias problem is cross-validation; the bootstrap is another way to bring down the high variability of cross-validation. But a direct comparison of the two estimators, cross-validation and the bootstrap, is not fair because the latter requires much heavier computation. We performed an empirical study to compare the .632+ bootstrap estimator with the repeated 10-fold cross-validation and the repeated one-third hold-out estimators. All the estimators were set to require about the same amount of computation. In the simulation study, the repeated 10-fold cross-validation estimator was found to perform better than the .632+ bootstrap estimator when the classifier is highly adaptive to the training sample. We also found that the .632+ bootstrap estimator suffers from a bias problem for large samples as well as for small samples. (C) 2009 Elsevier B.V. All rights reserved.
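The two estimators the abstract compares can be sketched as follows. This is a minimal illustration, not code from the paper: it uses a toy 1-NN classifier on 1-D data, and all function names, parameter defaults (`k=10`, `repeats=5`, `B=50`), and the synthetic setup are assumptions made for the example. Note that 1-NN has zero resubstitution error (each point is its own nearest neighbor), which is exactly the downward-bias situation the .632+ weighting is designed to correct.

```python
import random

def knn1(train, x):
    # 1-nearest-neighbor on 1-D points: predict the label of the closest training point
    return min(train, key=lambda t: abs(t[0] - x))[1]

def error(train, test):
    # misclassification rate of the 1-NN rule built on `train`, evaluated on `test`
    return sum(knn1(train, x) != y for x, y in test) / len(test)

def repeated_kfold_cv(data, k=10, repeats=5, seed=0):
    # repeated k-fold cross-validation: reshuffle, split into k folds,
    # average the test-fold error over all folds and all repetitions
    rng = random.Random(seed)
    errs = []
    for _ in range(repeats):
        d = data[:]
        rng.shuffle(d)
        folds = [d[i::k] for i in range(k)]
        for i in range(k):
            test = folds[i]
            train = [p for j, f in enumerate(folds) if j != i for p in f]
            errs.append(error(train, test))
    return sum(errs) / len(errs)

def boot632plus(data, B=50, seed=0):
    # .632+ bootstrap: blend resubstitution error with the leave-one-out
    # bootstrap error, weighting by an estimated relative overfitting rate
    rng = random.Random(seed)
    n = len(data)
    miss = [[] for _ in range(n)]  # per-point errors on bootstrap samples omitting it
    for _ in range(B):
        idx = [rng.randrange(n) for _ in range(n)]
        sample = [data[i] for i in idx]
        for i in set(range(n)) - set(idx):
            x, y = data[i]
            miss[i].append(knn1(sample, x) != y)
    covered = [m for m in miss if m]
    err1 = sum(sum(m) / len(m) for m in covered) / len(covered)  # leave-one-out bootstrap error
    err_resub = error(data, data)                                # resubstitution error
    # no-information error rate gamma (binary 0/1 labels assumed)
    p = sum(y for _, y in data) / n
    q = sum(knn1(data, x) for x, _ in data) / n
    gamma = p * (1 - q) + (1 - p) * q
    # relative overfitting rate R in [0, 1], then the .632+ weight w
    if gamma > err_resub and err1 > err_resub:
        R = (err1 - err_resub) / (gamma - err_resub)
    else:
        R = 0.0
    w = 0.632 / (1 - 0.368 * R)
    return (1 - w) * err_resub + w * err1
```

With the computational budgets matched (e.g. `repeats * k` model fits for CV versus `B` fits for the bootstrap), the two routines return comparable estimates of the true error rate, which is the comparison the paper carries out empirically.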