Physicians are often asked to make prognostic assessments but often worry that their assessments will prove inaccurate. Prognostic systems were developed to enhance the accuracy of such assessments. This paper describes an approach for evaluating prognostic systems based on the accuracy (calibration and discrimination) and generalizability (reproducibility and transportability) of the system's predictions. Reproducibility is the ability to produce accurate predictions among patients not included in the development of the system but from the same population. Transportability is the ability to produce accurate predictions among patients drawn from a different but plausibly related population. On the basis of the observation that the generalizability of a prognostic system is commonly limited to a single historical period, geographic location, methodologic approach, disease spectrum, or follow-up interval, we describe a working hierarchy of the cumulative generalizability of prognostic systems.
This approach is illustrated in a structured review of the Dukes and Jass staging systems for colon and rectal cancer and applied to a young man with colon cancer. Because it treats the development of the system as a "black box" and evaluates only the performance of the predictions, the approach can be applied to any system that generates predicted probabilities. Although the Dukes and Jass staging systems are discrete, the approach can also be applied to systems that generate continuous predictions and, with some modification, to systems that predict over multiple time periods. Like any scientific hypothesis, the generalizability of a prognostic system is established by being tested and being found accurate across increasingly diverse settings. The more numerous and diverse the settings in which the system is tested and found accurate, the more likely it will generalize to an untested setting.
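
As a rough illustration of the "black box" evaluation described above, the sketch below scores a set of predicted probabilities against observed outcomes without reference to how the predictions were produced. It is a minimal example under stated assumptions, not the procedure used in the paper: discrimination is summarized with a concordance (c) statistic, calibration is summarized by comparing each distinct predicted probability with the observed event rate in that group (which suits discrete systems such as staging), and the function names `c_statistic` and `calibration_table` and all data values are hypothetical.

```python
from collections import defaultdict


def c_statistic(predicted, observed):
    """Discrimination: the fraction of (event, non-event) patient pairs in
    which the patient with the event received the higher predicted
    probability. Tied predictions count as half a concordant pair."""
    events = [p for p, y in zip(predicted, observed) if y == 1]
    non_events = [p for p, y in zip(predicted, observed) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for p_event in events:
        for p_non_event in non_events:
            if p_event > p_non_event:
                score += 1.0
            elif p_event == p_non_event:
                score += 0.5
    return score / (len(events) * len(non_events))


def calibration_table(predicted, observed):
    """Calibration: for each distinct predicted probability, return the
    number of patients and the observed event rate in that group."""
    groups = defaultdict(list)
    for p, y in zip(predicted, observed):
        groups[p].append(y)
    return {p: (len(ys), sum(ys) / len(ys)) for p, ys in sorted(groups.items())}


# Hypothetical data: predicted probability of death and observed outcome
# (1 = died, 0 = survived) for six patients in three risk groups.
predicted = [0.9, 0.9, 0.6, 0.6, 0.3, 0.3]
observed = [1, 1, 1, 0, 0, 0]

print(round(c_statistic(predicted, observed), 3))
for p, (n, rate) in calibration_table(predicted, observed).items():
    print(f"predicted {p:.1f}: n={n}, observed rate {rate:.2f}")
```

Running the same two functions on patients from the development population and then on patients from a different but related population would give one simple way to compare reproducibility with transportability.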