M.O. Meade et al., Interobserver variation in interpreting chest radiographs for the diagnosis of acute respiratory distress syndrome, Am J Respir Crit Care Med, 161(1), 2000, pp. 85-90
To measure the reliability of chest radiographic diagnosis of acute respiratory distress syndrome (ARDS), we conducted an observer agreement study in which two of eight intensivists and a radiologist, blinded to one another's interpretations, reviewed 778 radiographs from 99 critically ill patients. One intensivist and a radiologist participated in pilot training. Raters made a global rating of the presence of ARDS on the basis of diffuse bilateral infiltrates. We assessed interobserver agreement in a pairwise fashion. For rater pairings in which one rater had not participated in the consensus process, we found moderate levels of raw (0.68 to 0.80), chance-corrected (kappa 0.38 to 0.55), and chance-independent (Phi 0.53 to 0.75) agreement. The pair of raters who participated in consensus training achieved excellent to almost perfect raw (0.88 to 0.94), chance-corrected (kappa 0.72 to 0.88), and chance-independent (Phi 0.74 to 0.89) agreement. We conclude that intensivists without formal consensus training can achieve moderate levels of agreement. Consensus training is necessary to achieve the substantial or almost perfect levels of agreement optimal for the conduct of clinical trials.
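As an illustration of the three pairwise agreement statistics reported above, the sketch below computes raw agreement, Cohen's kappa, and a phi coefficient from two raters' binary ARDS readings. It is not taken from the paper: the ratings, the function name, and the coding (1 = ARDS present, 0 = absent) are assumptions, and the phi shown is the standard 2x2 correlation coefficient, which may differ from the specific chance-independent Phi statistic the authors used.

```python
import numpy as np

def pairwise_agreement(rater_a, rater_b):
    """Raw agreement, Cohen's kappa, and 2x2 phi for two binary raters.

    Hypothetical helper; ratings are coded 1 = ARDS, 0 = no ARDS.
    """
    a = np.asarray(rater_a)
    b = np.asarray(rater_b)
    n = len(a)

    # Raw (observed) agreement: proportion of radiographs rated identically.
    p_o = np.mean(a == b)

    # Cells of the 2x2 contingency table.
    yy = np.sum((a == 1) & (b == 1))  # both call ARDS
    yn = np.sum((a == 1) & (b == 0))
    ny = np.sum((a == 0) & (b == 1))
    nn = np.sum((a == 0) & (b == 0))  # both call no ARDS

    # Chance-corrected agreement: Cohen's kappa, using marginal totals
    # to estimate the agreement expected by chance.
    p_e = ((yy + yn) * (yy + ny) + (nn + ny) * (nn + yn)) / n**2
    kappa = (p_o - p_e) / (1 - p_e)

    # Phi coefficient of the 2x2 table (one common definition; the paper's
    # chance-independent Phi may be defined differently).
    denom = np.sqrt((yy + yn) * (ny + nn) * (yy + ny) * (yn + nn))
    phi = (yy * nn - yn * ny) / denom if denom else float("nan")

    return p_o, kappa, phi

# Example with made-up ratings for 10 radiographs.
a = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
b = [1, 0, 0, 0, 1, 0, 1, 1, 1, 0]
print(pairwise_agreement(a, b))
```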