Background. Computer-based diagnostic systems are available commercially, but there has been limited evaluation of their performance. We assessed the diagnostic capabilities of four internal medicine diagnostic systems: Dxplain, Iliad, Meditel, and QMR.

Methods. Ten expert clinicians created a set of 105 diagnostically challenging clinical case summaries involving actual patients. Clinical data were entered into each program with the vocabulary provided by the program's developer. Each of the systems produced a ranked list of possible diagnoses for each patient, as did the group of experts. We calculated scores on several performance measures for each computer program.

Results. No single computer program scored better than the others on all performance measures. Among all cases and all programs, the proportion of correct diagnoses ranged from 0.52 to 0.71, and the mean proportion of relevant diagnoses ranged from 0.19 to 0.37. On average, fewer than half the diagnoses on the experts' original list of reasonable diagnoses were suggested by any of the programs. However, each program suggested an average of approximately two additional diagnoses per case that the experts found relevant but had not originally considered.

Conclusions. The results provide a profile of the strengths and limitations of these computer programs. The programs should be used by physicians who can identify and use the relevant information and ignore the irrelevant information that the programs can produce.
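The per-case measures behind the Results figures can be made concrete with a small scoring sketch. This is a minimal illustration, not the study's published scoring protocol. It assumes that a case counts as "correct" when the final diagnosis appears anywhere on a program's list, and that "relevance" is the fraction of a program's suggestions that the experts judged relevant. It also assumes that "comprehensiveness" is the fraction of the experts' original list that the program suggested, and that "additional relevant" counts relevant suggestions missing from that original list. The names `CaseResult`, `score_case`, and `summarize` and both example cases are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CaseResult:
    """Hypothetical record of one program's output on one case."""
    program_list: list[str]   # ranked diagnoses suggested by the program
    final_diagnosis: str      # the case's confirmed diagnosis
    expert_original: set[str] # experts' original list of reasonable diagnoses
    expert_relevant: set[str] # all diagnoses the experts judged relevant after review


def score_case(case: CaseResult) -> dict[str, float]:
    """Score one case under the assumed metric definitions."""
    suggested = set(case.program_list)
    # Correct: the confirmed diagnosis appears anywhere on the program's list.
    correct = 1.0 if case.final_diagnosis in suggested else 0.0
    # Relevance: share of the program's suggestions that experts judged relevant.
    relevant_suggested = suggested & case.expert_relevant
    relevance = len(relevant_suggested) / len(suggested) if suggested else 0.0
    # Comprehensiveness: share of the experts' original list the program suggested.
    comprehensiveness = (
        len(suggested & case.expert_original) / len(case.expert_original)
        if case.expert_original else 0.0
    )
    # Additional relevant: relevant suggestions not on the experts' original list.
    additional = len(relevant_suggested - case.expert_original)
    return {
        "correct": correct,
        "relevance": relevance,
        "comprehensiveness": comprehensiveness,
        "additional_relevant": float(additional),
    }


def summarize(cases: list[CaseResult]) -> dict[str, float]:
    """Average each per-case score across cases (mean of 'correct' = proportion correct)."""
    scores = [score_case(c) for c in cases]
    return {k: sum(s[k] for s in scores) / len(scores) for k in scores[0]}


# Two invented cases for one program, for illustration only.
case1 = CaseResult(
    program_list=["sarcoidosis", "tuberculosis", "lymphoma", "gout"],
    final_diagnosis="sarcoidosis",
    expert_original={"sarcoidosis", "histoplasmosis"},
    expert_relevant={"sarcoidosis", "histoplasmosis", "tuberculosis"},
)
case2 = CaseResult(
    program_list=["pericarditis", "myocarditis"],
    final_diagnosis="aortic dissection",
    expert_original={"aortic dissection", "pericarditis"},
    expert_relevant={"aortic dissection", "pericarditis", "myocarditis"},
)

print(summarize([case1, case2]))
# {'correct': 0.5, 'relevance': 0.75, 'comprehensiveness': 0.5, 'additional_relevant': 1.0}
```

In this toy example, a program can score perfectly on relevance for a case (case2) while still missing the correct diagnosis. That illustrates why no single measure ranks the programs on its own.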