Constructing and updating prognostic models that learn from training cases
is a time-consuming task. The more compact, and yet informative, the
training sets are, the faster one can build and properly evaluate such
models. We have compared different regression diagnostic methods for
selection and removal of training cases in prognostic models. Univariate
determinations were performed using classical regression diagnostic
statistics. Multivariate
determinations were performed using (1) a sequential "backward" selection
of cases, and (2) a non-sequential genetic algorithm. The genetic algorithm
produced final models that kept few cases and retained predictive
capability. A genetic algorithm approach to case selection may be better
suited for
guiding removal of cases in training sets than a univariate or a
sequential multivariate approach, possibly because of its ability to detect
sets of
cases that are influential en bloc but may not be sufficiently influential
when considered in isolation.
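
As a rough illustration of the univariate, case-by-case determinations
mentioned above, the sketch below computes Cook's distance for each training
case. The text does not say which diagnostic statistics or which model family
were used, so Cook's distance, the ordinary least-squares fit, and the
function name cooks_distance are all assumptions made for this example only.

```python
import numpy as np


def cooks_distance(X, y):
    """Cook's distance of every case under an ordinary least-squares fit.

    Assumption: a linear model with intercept stands in for the prognostic
    model; the source does not specify the model or the statistic.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    design = np.column_stack([np.ones(n), X])  # prepend intercept column
    p = design.shape[1]                        # number of fitted parameters
    # Hat matrix H = D (D^T D)^{-1} D^T, computed via the pseudo-inverse.
    hat = design @ np.linalg.pinv(design)
    leverage = np.diag(hat)
    residuals = y - hat @ y
    mse = residuals @ residuals / (n - p)      # residual variance estimate
    # D_i = e_i^2 * h_ii / (p * s^2 * (1 - h_ii)^2)
    return residuals**2 * leverage / (p * mse * (1.0 - leverage) ** 2)


# Hypothetical data: a straight line with one perturbed case at the end.
x = np.arange(10.0)
y = 1.0 + 2.0 * x
y[9] += 10.0
distances = cooks_distance(x, y)
# The 4/n cut-off is a common rule of thumb, not a value taken from the text.
candidates = np.flatnonzero(distances > 4.0 / len(x))
```

Because each case is scored one at a time, a statistic of this kind can miss
groups of cases that are influential only jointly, which is the limitation
the multivariate approaches are meant to address.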