E.W. Steyerberg et al., Prognostic modelling with logistic regression analysis: a comparison of selection and estimation methods in small data sets, Statistics in Medicine, 19(8), 2000, pp. 1059-1079
Number of citations
50
Subject Categories
General & Internal Medicine; Medical Research General Topics
Logistic regression analysis may well be used to develop a prognostic model
for a dichotomous outcome. Especially when limited data are available, it
is difficult to determine an appropriate selection of covariables for
inclusion in such models. Also, predictions may be improved by applying some
sort of shrinkage in the estimation of regression coefficients. In this study
we compare the performance of several selection and shrinkage methods in
small data sets of patients with acute myocardial infarction, where we aim to
predict 30-day mortality. Selection methods included backward stepwise
selection with significance levels alpha of 0.01, 0.05, 0.157 (the AIC
criterion) or 0.50, and the use of qualitative external information on the
sign of regression coefficients in the model. Estimation methods included
standard maximum likelihood, the use of a linear shrinkage factor, penalized
maximum likelihood, the Lasso, or quantitative external information on
univariable regression coefficients. We found that stepwise selection with a
low alpha (for example, 0.05) led to a relatively poor model performance when
evaluated on independent data. Substantially better performance was obtained
with full models with a limited number of important predictors, where
regression coefficients were reduced with any of the shrinkage methods.
Incorporation of external information for selection and estimation improved
the stability and quality of the prognostic models. We therefore recommend
shrinkage methods in full models including prespecified predictors and
incorporation of external information, when prognostic models are constructed
in small data sets. Copyright (C) 2000 John Wiley & Sons, Ltd.
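
The abstract pairs the significance level 0.157 with the AIC criterion without saying why. A minimal sketch of the usual reasoning, assuming a single candidate predictor with one degree of freedom: AIC penalizes each extra coefficient by 2, so a predictor is retained only if its likelihood-ratio chi-square statistic exceeds 2, and the upper-tail probability of a chi-square(1) variable at 2 is about 0.157. The function name `chi2_1df_tail` is illustrative and not taken from the paper.

```python
import math


def chi2_1df_tail(x: float) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with 1 df."""
    # A chi-square(1) variable is a squared standard normal Z, so
    # P(Z^2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2.0))


# AIC = -2 log L + 2k: one extra coefficient must improve -2 log L by more
# than 2 to lower the AIC, which corresponds to this significance level.
alpha_aic = chi2_1df_tail(2.0)
print(round(alpha_aic, 3))  # 0.157
```

The same function gives roughly 0.05 at the familiar critical value 3.841, which shows how much more lenient the AIC threshold is than conventional 5% testing.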