Prognostic modelling with logistic regression analysis: a comparison of selection and estimation methods in small data sets

Citation
E.W. Steyerberg et al., Prognostic modelling with logistic regression analysis: a comparison of selection and estimation methods in small data sets, STAT MED, 19(8), 2000, pp. 1059-1079
Number of citations
50
Subject categories
General & Internal Medicine; Medical Research General Topics
Journal title
STATISTICS IN MEDICINE
ISSN journal
0277-6715
Volume
19
Issue
8
Year of publication
2000
Pages
1059 - 1079
Database
ISI
SICI code
0277-6715(20000430)19:8<1059:PMWLRA>2.0.ZU;2-J
Abstract
Logistic regression analysis may well be used to develop a prognostic model for a dichotomous outcome. Especially when limited data are available, it is difficult to determine an appropriate selection of covariables for inclusion in such models. Also, predictions may be improved by applying some sort of shrinkage in the estimation of regression coefficients. In this study we compare the performance of several selection and shrinkage methods in small data sets of patients with acute myocardial infarction, where we aim to predict 30-day mortality. Selection methods included backward stepwise selection with significance levels alpha of 0.01, 0.05, 0.157 (the AIC criterion) or 0.50, and the use of qualitative external information on the sign of regression coefficients in the model. Estimation methods included standard maximum likelihood, the use of a linear shrinkage factor, penalized maximum likelihood, the Lasso, or quantitative external information on univariable regression coefficients. We found that stepwise selection with a low alpha (for example, 0.05) led to a relatively poor model performance, when evaluated on independent data. Substantially better performance was obtained with full models with a limited number of important predictors, where regression coefficients were reduced with any of the shrinkage methods. Incorporation of external information for selection and estimation improved the stability and quality of the prognostic models. We therefore recommend shrinkage methods in full models including prespecified predictors and incorporation of external information, when prognostic models are constructed in small data sets. Copyright (C) 2000 John Wiley & Sons, Ltd.
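The abstract states two technical points without showing the arithmetic behind them: alpha = 0.157 corresponds to the AIC criterion, and coefficients can be reduced with a linear shrinkage factor. The Python sketch below illustrates both.

For the AIC part, dropping one parameter lowers AIC only if the likelihood-ratio chi-square (1 df) is below 2. The implied significance level is therefore P(chi2_1 > 2) = erfc(1), which is about 0.157.

For the shrinkage part, the sketch uses the heuristic formula of van Houwelingen and le Cessie, s = (model chi-square - df) / model chi-square. The abstract does not say how the paper's linear shrinkage factor was estimated, so this formula is an assumption, not necessarily the authors' procedure. The function names and the example numbers (model chi-square 40, 8 df, coefficients 1.0 and -0.5) are hypothetical.

```python
import math
from typing import List


def chi2_1df_upper_tail(x: float) -> float:
    """P(X > x) for a chi-square variable with 1 degree of freedom.

    With 1 df, X = Z**2 for a standard normal Z, so
    P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2.0))


def aic_equivalent_alpha() -> float:
    """Significance level implied by AIC when dropping a 1-df covariable.

    AIC = -2 log L + 2k, so removing one parameter lowers AIC only if the
    likelihood-ratio chi-square for that parameter is smaller than 2.
    """
    return chi2_1df_upper_tail(2.0)


def heuristic_shrinkage(model_chi2: float, df: int) -> float:
    """Heuristic linear shrinkage factor (assumed here, after van Houwelingen
    and le Cessie): s = (model chi-square - df) / model chi-square."""
    if model_chi2 <= 0:
        raise ValueError("model chi-square must be positive")
    return (model_chi2 - df) / model_chi2


def shrink(coefs: List[float], s: float) -> List[float]:
    """Multiply every slope coefficient by the same factor s.

    The intercept is deliberately excluded; it is usually re-estimated
    afterwards so that mean predicted risk matches observed risk.
    """
    return [s * b for b in coefs]


if __name__ == "__main__":
    print(f"AIC-equivalent alpha: {aic_equivalent_alpha():.3f}")  # 0.157
    # Hypothetical model: chi-square 40 on 8 df, two example slopes.
    s = heuristic_shrinkage(model_chi2=40.0, df=8)
    print(s, shrink([1.0, -0.5], s))
```

In this hypothetical case the factor is 0.8, so every slope is pulled 20% toward zero. The sketch omits the intercept re-estimation step, which requires refitting the model with the shrunken slopes held fixed.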