E.W. Steyerberg et al., Internal validation of predictive models: Efficiency of some procedures for logistic regression analysis, J CLIN EPID, 54(8), 2001, pp. 774-781
Number of citations: 33
Subject Categories
Environmental Medicine & Public Health; Medical Research, General Topics
The performance of a predictive model is overestimated when simply
determined on the sample of subjects that was used to construct the model.
Several internal validation methods are available that aim to provide a more
accurate estimate of model performance in new subjects. We evaluated several
variants of split-sample, cross-validation and bootstrapping methods with a
logistic regression model that included eight predictors for 30-day mortality
after an acute myocardial infarction. Random samples with a size between
n = 572 and n = 9165 were drawn from a large data set (GUSTO-I; n = 40,830;
2851 deaths) to reflect modeling in data sets with between 5 and 80 events
per variable. Independent performance was determined on the remaining
subjects. Performance measures included discriminative ability, calibration
and overall accuracy. We found that split-sample analyses gave overly
pessimistic estimates of performance with large variability.
Cross-validation on 10% of the sample had low bias and low variability, but
was not suitable for all performance measures. Internal validity could best
be estimated with bootstrapping, which provided stable estimates with low
bias. We conclude that split-sample validation is inefficient, and recommend
bootstrapping for estimation of internal validity of a predictive logistic
regression model.
(C) 2001 Elsevier Science Inc. All rights reserved.
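
The abstract recommends bootstrapping but does not spell out the procedure. As a minimal sketch only, the pure-Python example below shows one common variant, optimism-corrected bootstrapping of the c-statistic (AUC). It is not taken from the paper, and the abstract does not say which bootstrap variants were compared. The simulated data, the three predictors, the gradient-descent fitter and all function names are assumptions made for illustration.

```python
import math
import random


def fit_logistic(X, y, steps=150, lr=0.5):
    """Approximately fit a logistic regression by batch gradient descent.

    Illustrative stand-in for a real fitting routine (an assumption, not
    the paper's software). Returns [intercept, coef_1, ..., coef_p].
    """
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(steps):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            z = max(-30.0, min(30.0, z))  # guard against overflow in exp
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def linear_predictor(w, X):
    """Linear predictor for each row; ranks subjects like the probability."""
    return [w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)) for xi in X]


def auc(scores, y):
    """c-statistic: share of event/non-event pairs ranked correctly (ties 1/2)."""
    pos = [s for s, yi in zip(scores, y) if yi == 1]
    neg = [s for s, yi in zip(scores, y) if yi == 0]
    wins = sum((sp > sn) + 0.5 * (sp == sn) for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


def optimism_corrected_auc(X, y, n_boot=20, seed=0):
    """Return (apparent AUC, apparent AUC minus mean bootstrap optimism)."""
    rng = random.Random(seed)
    n = len(y)
    apparent = auc(linear_predictor(fit_logistic(X, y), X), y)
    optimism = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # draw n rows with replacement
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        if len(set(yb)) < 2:  # skip degenerate resamples with one outcome class
            continue
        wb = fit_logistic(Xb, yb)
        boot_auc = auc(linear_predictor(wb, Xb), yb)  # performance in the resample
        test_auc = auc(linear_predictor(wb, X), y)    # same model, original sample
        optimism.append(boot_auc - test_auc)
    return apparent, apparent - sum(optimism) / len(optimism)


# Simulated data (hypothetical): only the first of three predictors has signal.
rng = random.Random(1)
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(150)]
y = [int(rng.random() < 1 / (1 + math.exp(1.0 - xi[0]))) for xi in X]

apparent, corrected = optimism_corrected_auc(X, y)
print(f"apparent AUC {apparent:.3f}, optimism-corrected AUC {corrected:.3f}")
```

With real data, the fitter would be replaced by the actual modeling routine, and every modeling step would be repeated within each bootstrap sample so that the optimism estimate reflects the whole model-building process.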