Prognostic modelling with logistic regression analysis: a comparison of selection and estimation methods in small data sets

Authors

Steyerberg, EW; Eijkemans, MJC; Harrell, FE; Habbema, JDF

Citation

E.W. Steyerberg et al., Prognostic modelling with logistic regression analysis: a comparison of selection and estimation methods in small data sets, STAT MED, 19(8), 2000, pp. 1059-1079

Number of citations

Subject categories

General & Internal Medicine; Medical Research General Topics

Journal title

STATISTICS IN MEDICINE

ISSN journal

0277-6715

Volume

19

Issue

8

Year of publication

2000

Pages

1059 - 1079

Database

ISI

SICI code

0277-6715(20000430)19:8<1059:PMWLRA>2.0.ZU;2-J

Abstract

Logistic regression analysis may well be used to develop a prognostic model for a dichotomous outcome. Especially when limited data are available, it is difficult to determine an appropriate selection of covariables for inclusion in such models. Also, predictions may be improved by applying some sort of shrinkage in the estimation of regression coefficients. In this study we compare the performance of several selection and shrinkage methods in small data sets of patients with acute myocardial infarction, where we aim to predict 30-day mortality. Selection methods included backward stepwise selection with significance levels alpha of 0.01, 0.05, 0.157 (the AIC criterion) or 0.50, and the use of qualitative external information on the sign of regression coefficients in the model. Estimation methods included standard maximum likelihood, the use of a linear shrinkage factor, penalized maximum likelihood, the Lasso, or quantitative external information on univariable regression coefficients. We found that stepwise selection with a low alpha (for example, 0.05) led to a relatively poor model performance, when evaluated on independent data. Substantially better performance was obtained with full models with a limited number of important predictors, where regression coefficients were reduced with any of the shrinkage methods. Incorporation of external information for selection and estimation improved the stability and quality of the prognostic models. We therefore recommend shrinkage methods in full models including prespecified predictors and incorporation of external information, when prognostic models are constructed in small data sets. Copyright (C) 2000 John Wiley & Sons, Ltd.
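
The abstract does not say how the linear shrinkage factor was estimated, so the following is only an illustrative sketch of the general idea, not the authors' procedure. It assumes a commonly used heuristic, s = (model chi-square - number of predictors) / model chi-square, multiplies the maximum likelihood coefficients by s, and then re-estimates the intercept so that the mean predicted risk matches the observed event rate. All function names and the toy numbers are hypothetical.

```python
import math


def heuristic_shrinkage(model_chi2, n_predictors):
    """Heuristic linear shrinkage factor s = (chi2 - p) / chi2, floored at 0.

    Assumption: this formula is one common choice, not taken from the record.
    """
    if model_chi2 <= 0:
        raise ValueError("model chi-square must be positive")
    return max(0.0, (model_chi2 - n_predictors) / model_chi2)


def shrink(coefs, s):
    """Multiply every regression coefficient (not the intercept) by s."""
    return [s * b for b in coefs]


def predict(intercept, coefs, row):
    """Predicted probability from a logistic model for one patient."""
    lp = intercept + sum(b * x for b, x in zip(coefs, row))
    return 1.0 / (1.0 + math.exp(-lp))


def refit_intercept(coefs, X, y, lo=-30.0, hi=30.0):
    """Bisection search for the intercept at which the sum of predicted
    probabilities equals the observed number of events.

    Requires at least one event and one non-event in y.
    """
    target = sum(y)
    for _ in range(100):
        mid = (lo + hi) / 2.0
        total = sum(predict(mid, coefs, row) for row in X)
        if total < target:
            lo = mid  # predictions too low: raise the intercept
        else:
            hi = mid  # predictions too high (or equal): lower it
    return (lo + hi) / 2.0


# Toy illustration with made-up numbers (not data from the study).
s = heuristic_shrinkage(20.0, 5)      # (20 - 5) / 20 = 0.75
shrunk = shrink([1.0, -2.0], s)       # [0.75, -1.5]
```

With a small model chi-square relative to the number of predictors, s falls well below 1, which is the situation the abstract associates with small data sets; with a large chi-square, s approaches 1 and the coefficients are left almost unchanged.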