Small Sample Behaviors of the Delete-*d* Cross Validation Statistic

Author(s)
Jude H. Kastens

ABSTRACT

Built upon an iterative process of resampling without replacement and out-of-sample prediction, the delete-*d* cross validation statistic CV(*d*) provides a robust estimate of forecast error variance. To compute CV(*d*), a dataset consisting of *n* observations of predictor and response values is systematically and repeatedly partitioned (split) into subsets of size *n* – *d* (used for model training) and *d* (used for model testing). Two aspects of CV(*d*) are explored in this paper. First, estimates of the unknown expected value E[CV(*d*)] are simulated in an OLS linear regression setting. The results suggest general formulas for E[CV(*d*)] that depend on σ² (the “true” model error variance), *n* – *d* (training set size), and *p* (number of predictors in the model). The conjectured E[CV(*d*)] formulas are connected back to theory and generalized. The formulas break down at the two largest allowable *d* values (*d* = *n* – *p* – 1 and *d* = *n* – *p*, the 1 and 0 degrees of freedom cases), and numerical instabilities are observed at these points. An explanation for this distinct behavior remains an open question. Second, simulation is used to demonstrate how the previously established asymptotic conditions {*d*/*n* → 1 and *n* – *d* → ∞ as *n* → ∞}, which are required for optimal linear model selection when CV(*d*) is used to rank models, are manifested in the smallest sample setting, using either independent or correlated candidate predictors.
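The abstract describes CV(*d*) only in words. The sketch below is a minimal illustration that assumes the common definition: CV(*d*) is the average squared out-of-sample prediction error over all C(*n*, *d*) training/test splits, with each training set of size *n* – *d* fitted by OLS with an intercept. It also ranks candidate predictor subsets by smallest CV(*d*). The function names `ols_fit`, `cv_d` and `rank_models`, the optional random-subsampling shortcut `max_splits`, and the simulated data are hypothetical and are not taken from the paper. The code uses only the Python standard library.

```python
import itertools
import random


def ols_fit(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("singular X'X: training set too small or collinear")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def predict(b, row):
    return sum(bi * xi for bi, xi in zip(b, row))


def cv_d(X, y, d, max_splits=None, seed=0):
    """Delete-d CV: mean squared prediction error on size-d test sets.
    Enumerates all C(n, d) splits, or draws max_splits random ones
    (hypothetical shortcut for large n)."""
    n = len(y)
    idx = range(n)
    if max_splits is None:
        splits = itertools.combinations(idx, d)
    else:
        rng = random.Random(seed)
        splits = (tuple(rng.sample(idx, d)) for _ in range(max_splits))
    total, count = 0.0, 0
    for test in splits:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        b = ols_fit([X[i] for i in train], [y[i] for i in train])
        total += sum((y[i] - predict(b, X[i])) ** 2 for i in test)
        count += d
    return total / count


def rank_models(columns, y, d, candidates):
    """Rank candidate predictor subsets by CV(d), smallest first.
    Every model includes an intercept column of ones."""
    n = len(y)
    scores = {}
    for subset in candidates:
        X = [[1.0] + [columns[j][i] for j in subset] for i in range(n)]
        scores[subset] = cv_d(X, y, d)
    return sorted(scores.items(), key=lambda kv: kv[1])


if __name__ == "__main__":
    # Hypothetical simulation: y depends on x1 only; x2 is pure noise.
    rng = random.Random(1)
    n = 12
    x1 = [rng.gauss(0, 1) for _ in range(n)]
    x2 = [rng.gauss(0, 1) for _ in range(n)]
    y = [1.0 + 2.0 * a + rng.gauss(0, 0.5) for a in x1]
    for d in (1, 4, 8):
        print(d, rank_models([x1, x2], y, d, [(), (0,), (1,), (0, 1)]))
```

Full enumeration is exact but its cost grows as C(*n*, *d*). Drawing random splits is one common approximation for larger *n*. If a training set has fewer rows than the model has coefficients, X'X is singular, and the sketch raises an error rather than returning a value.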


Cite this paper

Kastens, J. (2015) Small Sample Behaviors of the Delete-*d* Cross Validation Statistic. *Open Journal of Statistics*, **5**, 382-392. doi: 10.4236/ojs.2015.55040.

