PRESS-RELATED STATISTICS - REGRESSION TOOLS FOR CROSS-VALIDATION AND CASE DIAGNOSTICS

Citation
D.B. Holiday et al., PRESS-RELATED STATISTICS - REGRESSION TOOLS FOR CROSS-VALIDATION AND CASE DIAGNOSTICS, Medicine and Science in Sports and Exercise, 27(4), 1995, pp. 612-620
Number of citations
29
Subject Categories
Sport Sciences
ISSN journal
01959131
Volume
27
Issue
4
Year of publication
1995
Pages
612 - 620
Database
ISI
SICI code
0195-9131(1995)27:4<612:PS-RTF>2.0.ZU;2-R
Abstract
In the health science literature, a common approach to validating a regression equation is data-splitting, in which a portion of the data is used to fit the model (fitting sample) and the remainder (validation sample) is used to estimate future performance. The R² and SEE obtained by predicting the validation sample with the fitting-sample equation are proper estimates of future performance, tending to correct for the natural upward bias of the R² and SEE obtained from the fitting sample alone. Data-splitting has several disadvantages, however. These include: 1) difficulty, arbitrariness, and inconvenience of matching samples; 2) the need to report two sets of statistics to determine homogeneity; and 3) the lack of equation stability due to diluted sample size. The PRESS statistic and associated residuals do not require the data to be split, yield alternative unbiased estimates of R² and SEE, and provide useful case diagnostics. This procedure is easy to use and widely available in modern statistical packages, but it is rarely utilized. The two methods are contrasted here using a simulation from original data for predicting body density from anthropometric measurements of a group of 117 women. The PRESS approach is particularly appropriate for smaller datasets; methods of reporting these statistics are recommended.
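As an illustration of the PRESS approach described in the abstract, the following is a minimal sketch in Python, assuming an ordinary least-squares model with an intercept and a design matrix of full column rank. The function name `press_statistics`, the returned keys, and the convention SEE_PRESS = sqrt(PRESS / n) are assumptions made for this example; the abstract does not state the formulas. The sketch relies on the standard identity that each leave-one-out (PRESS) residual equals the ordinary residual divided by (1 - h_ii), where h_ii is the case's leverage, so no refitting or data-splitting is needed.

```python
import numpy as np

def press_statistics(X, y):
    """PRESS-related statistics for an OLS fit with intercept (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)

    # Design matrix with an intercept column.
    A = np.column_stack([np.ones(n), X])

    # Ordinary least-squares fit and residuals.
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta

    # Leverages h_ii: diagonal of the hat matrix, computed from the reduced QR factor.
    Q, _ = np.linalg.qr(A)
    leverage = np.sum(Q**2, axis=1)

    # PRESS residuals: prediction error for each case when it is left out of the fit.
    press_resid = resid / (1.0 - leverage)
    press = np.sum(press_resid**2)

    sst = np.sum((y - y.mean())**2)   # total sum of squares
    sse = np.sum(resid**2)            # residual sum of squares of the full fit

    return {
        "press": press,
        "press_residuals": press_resid,    # large values flag influential or poorly predicted cases
        "leverage": leverage,
        "r2_fit": 1.0 - sse / sst,         # ordinary R^2 from the full sample
        "r2_press": 1.0 - press / sst,     # cross-validated analogue of R^2
        "see_press": np.sqrt(press / n),   # assumed convention: sqrt(PRESS / n)
    }
```

Because every PRESS residual is at least as large in magnitude as the corresponding ordinary residual, `r2_press` never exceeds `r2_fit`; the gap between the two gives a rough sense of how much the full-sample fit overstates expected performance on new cases.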