Bm. Livingston et al., Assessment of the performance of five intensive care scoring models within a large Scottish database, CRIT CARE M, 28(6), 2000, pp. 1820-1827
Objective: To assess and compare the performance of five severity of illness scoring systems used commonly for intensive care unit (ICU) patients in the United Kingdom. The five models analyzed were versions II and III of the Acute Physiology and Chronic Health Evaluation (APACHE) system, a version of APACHE II using United Kingdom (UK)-derived coefficients (UK APACHE II), version II of the Simplified Acute Physiology Score (SAPS), and version II of the Mortality Probability Model, computed at admission (MPM0) and after 24 hrs in the ICU (MPM24).
Design: A 2-yr prospective cohort study of consecutive admissions to intensive care units.
Setting: A total of 22 general ICUs in Scotland.
Patients: A total of 13,291 admissions entered the study; after prospectively agreed exclusions, 10,393 patients remained for the analysis.
Outcome measures: Death or survival at hospital discharge.
Measurements and Main Results: All the models showed reasonable discrimination, as measured by the area under the receiver operating characteristic curve (APACHE III, 0.845; APACHE II, 0.805; UK APACHE II, 0.809; SAPS II, 0.843; MPM0, 0.785; MPM24, 0.799). Observed mortality differed significantly from that predicted by every model on the Hosmer-Lemeshow goodness-of-fit C test (p < .001), and calibration curves confirmed this result.
When patients discharged in the first 24 hrs were excluded to allow comparisons using the same patient group, APACHE III, MPM24, and SAPS II (APACHE III, 0.795; MPM24, 0.791; SAPS II, 0.784) showed significantly better discrimination than APACHE II, UK APACHE II, and MPM0 (APACHE II, 0.763; UK APACHE II, 0.756; MPM0, 0.741). However, calibration changed little for all models, with observed mortality still significantly different from that predicted by the scoring systems (p < .001). For equivalent data sets, APACHE II demonstrated superior calibration to all the other models, judged by the chi-squared value from the Hosmer-Lemeshow test for both populations (APACHE III, 366; APACHE II, 67; UK APACHE II, 237; SAPS II, 142; MPM0, 452; MPM24, 101).
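The two measures reported in these results, discrimination (area under the ROC curve) and calibration (Hosmer-Lemeshow C statistic), can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `auc` and `hosmer_lemeshow_c`, the equal-sized risk-ordered grouping, and the toy data are assumptions made purely for illustration.

```python
from math import fsum


def auc(probs, died):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    probs: predicted probabilities of death; died: 1 for death, 0 for survival.
    """
    pos = [p for p, d in zip(probs, died) if d]
    neg = [p for p, d in zip(probs, died) if not d]
    # Count pairs in which the non-survivor received the higher predicted
    # risk; tied predictions count as half a win.
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def hosmer_lemeshow_c(probs, died, groups=10):
    """Hosmer-Lemeshow C statistic over equal-sized, risk-ordered groups.

    Assumes every group has an expected death count strictly between 0 and
    the group size; otherwise a division by zero occurs.
    """
    ranked = sorted(zip(probs, died))
    n = len(ranked)
    chi2 = 0.0
    for g in range(groups):
        chunk = ranked[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        expected = fsum(p for p, _ in chunk)  # expected deaths in the group
        observed = sum(d for _, d in chunk)   # observed deaths in the group
        # Deaths and survivors each contribute (observed - expected)^2 / expected.
        chi2 += (observed - expected) ** 2 / expected
        chi2 += ((size - observed) - (size - expected)) ** 2 / (size - expected)
    return chi2


# Toy data (hypothetical, not taken from the study).
probs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
died = [0, 0, 0, 1, 0, 1, 1, 1]
print(auc(probs, died))
print(hosmer_lemeshow_c(probs, died, groups=2))
```

A larger C statistic indicates a larger gap between observed and predicted deaths, which is why the lowest chi-squared value in the list above is read as the best calibration.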
Conclusions: SAPS II demonstrated the best overall performance, but the superior calibration of APACHE II makes it the most appropriate model for comparisons of mortality rates in different ICUs. The significance of the Hosmer-Lemeshow C test in all the models suggests that new logistic regression coefficients should be generated and the systems retested before they could be used with confidence in Scottish ICUs.