E.W. Steyerberg et al., Prognostic modeling with logistic regression analysis: In search of a sensible strategy in small data sets, Medical Decision Making, 21(1), 2001, pp. 45-56
Clinical decision making often requires estimates of the likelihood of a
dichotomous outcome in individual patients. When empirical data are
available, these estimates may well be obtained from a logistic regression
model. Several strategies may be followed in the development of such a
model. In this study, the authors compare alternative strategies in 23 small
subsamples from a large data set of patients with an acute myocardial
infarction, where they developed predictive models for 30-day mortality.
Evaluations were performed in an independent part of the data set.
Specifically, the authors studied the effect of coding of covariables and
stepwise selection on the discriminative ability of the resulting model, and
the effect of statistical "shrinkage" techniques on calibration. As expected,
dichotomization of continuous covariables implied a loss of information.
Remarkably, stepwise selection resulted in less discriminating models
compared to full models including all available covariables, even when more
than half of these were randomly associated with the outcome. Using
qualitative information on the sign of the effect of predictors slightly
improved the predictive ability. Calibration improved when shrinkage was
applied to the standard maximum likelihood estimates of the regression
coefficients. In conclusion, a sensible strategy in small data sets is to
apply shrinkage methods in full models that include well-coded predictors
that are selected based on external information.
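
The abstract does not say which shrinkage estimator was applied, so the
following is only a minimal sketch of one common approach, not necessarily
the authors' procedure. It assumes uniform (linear) shrinkage: every slope
from the maximum likelihood fit is multiplied by a single factor s, here
estimated with the heuristic s = (model chi-square - p) / model chi-square,
where p is the number of predictors, and the intercept is then re-estimated
with the shrunken slopes held fixed. All function names and example values
are hypothetical.

```python
import math


def heuristic_shrinkage_factor(model_chi2, n_predictors):
    """Heuristic uniform shrinkage factor s = (chi2 - p) / chi2.

    model_chi2 is the likelihood-ratio chi-square of the fitted model and
    n_predictors is the number of estimated slopes. The result is floored
    at 0 (assumption: a model no better than chance is shrunk to the
    intercept-only model).
    """
    if model_chi2 <= 0:
        return 0.0
    return max(0.0, (model_chi2 - n_predictors) / model_chi2)


def shrink_slopes(slopes, factor):
    """Multiply every maximum likelihood slope by the same factor."""
    return [factor * b for b in slopes]


def refit_intercept(X, y, slopes, n_iter=50, tol=1e-10):
    """Re-estimate the intercept with the slopes held fixed.

    Uses Newton's method on the logistic log-likelihood in which the
    linear predictor from the fixed slopes enters as an offset. At the
    solution the mean predicted risk equals the observed event rate.
    """
    offsets = [sum(b * x for b, x in zip(slopes, row)) for row in X]
    intercept = 0.0
    for _ in range(n_iter):
        probs = [1.0 / (1.0 + math.exp(-(intercept + o))) for o in offsets]
        gradient = sum(yi - pi for yi, pi in zip(y, probs))
        hessian = sum(pi * (1.0 - pi) for pi in probs)
        step = gradient / hessian
        intercept += step
        if abs(step) < tol:
            break
    return intercept


def predict_risk(row, intercept, slopes):
    """Predicted probability of the outcome for one patient."""
    lp = intercept + sum(b * x for b, x in zip(slopes, row))
    return 1.0 / (1.0 + math.exp(-lp))
```

With a model chi-square of 40 and 8 predictors, for example, the factor is
(40 - 8) / 40, so each slope is pulled 20% of the way toward zero. The
smaller the chi-square relative to the number of predictors, as is typical
in small data sets, the stronger the shrinkage.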