Internal validation of predictive models: Efficiency of some procedures for logistic regression analysis

Citation
E.W. Steyerberg et al., Internal validation of predictive models: Efficiency of some procedures for logistic regression analysis, J CLIN EPID, 54(8), 2001, pp. 774-781
Number of citations
33
Subject Categories
Environmental Medicine & Public Health; Medical Research, General Topics
Journal title
JOURNAL OF CLINICAL EPIDEMIOLOGY
Journal ISSN
0895-4356
Volume
54
Issue
8
Year of publication
2001
Pages
774 - 781
Database
ISI
SICI code
0895-4356(200108)54:8<774:IVOPME>2.0.ZU;2-Q
Abstract
The performance of a predictive model is overestimated when simply determined on the sample of subjects that was used to construct the model. Several internal validation methods are available that aim to provide a more accurate estimate of model performance in new subjects. We evaluated several variants of split-sample, cross-validation and bootstrapping methods with a logistic regression model that included eight predictors for 30-day mortality after an acute myocardial infarction. Random samples with sizes between n = 572 and n = 9165 were drawn from a large data set (GUSTO-I; n = 40,830; 2851 deaths) to reflect modeling in data sets with between 5 and 80 events per variable. Independent performance was determined on the remaining subjects. Performance measures included discriminative ability, calibration and overall accuracy. We found that split-sample analyses gave overly pessimistic estimates of performance with large variability. Cross-validation on 10% of the sample had low bias and low variability, but was not suitable for all performance measures. Internal validity could best be estimated with bootstrapping, which provided stable estimates with low bias. We conclude that split-sample validation is inefficient, and recommend bootstrapping for estimation of internal validity of a predictive logistic regression model. © 2001 Elsevier Science Inc. All rights reserved.
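As an illustration of the kind of bootstrap validation the abstract recommends, the following is a minimal sketch of one common variant, optimism correction of the c-statistic (discriminative ability). The abstract does not say which bootstrap variants were evaluated, so this is not presented as the authors' procedure. The simulated data, the two-predictor model, the gradient-ascent fitting routine and all function names are assumptions made only for this example; it does not use GUSTO-I data.

```python
import math
import random


def sigmoid(z):
    # Clamp z to avoid overflow in math.exp for extreme values.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, X):
    # w[0] is the intercept, w[1:] are the slopes.
    return [sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) for xi in X]


def fit_logistic(X, y, iters=100, lr=0.5):
    # Plain gradient ascent on the log-likelihood; a stand-in for a real
    # maximum-likelihood fitter, adequate only for this small illustration.
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(iters):
        grad = [0.0] * (p + 1)
        for xi, yi, pi in zip(X, y, predict(w, X)):
            err = yi - pi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * g / n for wj, g in zip(w, grad)]
    return w


def c_statistic(y, probs):
    # Proportion of event/non-event pairs ranked correctly (ties count 0.5).
    pos = [p for p, yi in zip(probs, y) if yi == 1]
    neg = [p for p, yi in zip(probs, y) if yi == 0]
    wins = sum((pp > pn) + 0.5 * (pp == pn) for pp in pos for pn in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_corrected_c(X, y, n_boot=50, seed=0):
    rng = random.Random(seed)
    n = len(y)
    # Apparent performance: model fitted and evaluated on the same sample.
    apparent = c_statistic(y, predict(fit_logistic(X, y), X))
    optimism = []
    while len(optimism) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        Xb = [X[i] for i in idx]
        yb = [y[i] for i in idx]
        if sum(yb) in (0, n):
            continue  # need both outcomes to fit and to compute c
        wb = fit_logistic(Xb, yb)
        c_boot = c_statistic(yb, predict(wb, Xb))  # on the bootstrap sample
        c_orig = c_statistic(y, predict(wb, X))    # on the original sample
        optimism.append(c_boot - c_orig)
    return apparent, apparent - sum(optimism) / n_boot


def simulate(n=150, seed=1):
    # Hypothetical data: two standard-normal predictors, binary outcome.
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x = [rng.gauss(0, 1), rng.gauss(0, 1)]
        p = sigmoid(-1.0 + 0.8 * x[0] + 0.5 * x[1])
        X.append(x)
        y.append(1 if rng.random() < p else 0)
    return X, y


if __name__ == "__main__":
    X, y = simulate()
    apparent, corrected = bootstrap_corrected_c(X, y)
    print(f"apparent c = {apparent:.3f}, optimism-corrected c = {corrected:.3f}")
```

In this sketch every bootstrap model is developed on a full-size resample and tested on the original sample, so no subjects are set aside as they are in a split-sample analysis. In practice a statistical package would replace the hand-written fitting routine.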