G. Hripcsak et al., Unlocking clinical data from narrative reports: a study of natural language processing, Annals of Internal Medicine, 122(9), 1995, pp. 681-688
Objective: To evaluate the automated detection of clinical conditions described in narrative reports.
Design: Automated methods and human experts detected the presence or absence of six clinical conditions in 200 admission chest radiograph reports.
Study Subjects: A computerized, general-purpose natural language processor; 6 internists; 6 radiologists; 6 lay persons; and 3 other computer methods.
Main Outcome Measures: Intersubject disagreement was quantified by "distance" (the average number of clinical conditions per report on which two subjects disagreed) and by sensitivity and specificity with respect to the physicians.
Results: Using a majority vote, physicians detected 101 conditions in the 200 reports (0.51 per report); the most common condition was acute bacterial pneumonia (prevalence, 0.14), and the least common was chronic obstructive pulmonary disease (prevalence, 0.03). Pairs of physicians disagreed on the presence of at least 1 condition for an average of 20% of reports. The average intersubject distance among physicians was 0.24 (95% CI, 0.19 to 0.29) out of a maximum possible distance of 6. No physician had a significantly greater distance than the average. The average distance of the natural language processor from the physicians was 0.26 (CI, 0.21 to 0.32; not significantly greater than the average among physicians). Lay persons and alternative computer methods had significantly greater distance from the physicians (all >0.5). The natural language processor had a sensitivity of 81% (CI, 73% to 87%) and a specificity of 98% (CI, 97% to 99%); physicians had an average sensitivity of 85% and an average specificity of 98%.
Conclusions: Physicians disagreed on the interpretation of narrative reports, but this was not caused by outlier physicians or a consistent difference in the way internists and radiologists read reports. The natural language processor was not distinguishable from the physicians and was superior to all other comparison subjects. Although the domain of this study was restricted (six clinical conditions in chest radiographs), natural language processing seems to have the potential to extract clinical information from narrative reports in a manner that will support automated decision-support and clinical research.
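
The outcome measures described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes that each subject's reading is stored as one boolean per report and condition, and that sensitivity and specificity are computed against a physician majority vote (the abstract says only "with respect to the physicians"). The function names and the toy ratings are hypothetical and are not data from the study.

```python
from typing import List, Tuple

# ratings[report][condition] is True when the subject judged the condition present.
Ratings = List[List[bool]]


def distance(a: Ratings, b: Ratings) -> float:
    """Average number of conditions per report on which two subjects disagree."""
    if len(a) != len(b) or not a:
        raise ValueError("both subjects must rate the same non-empty set of reports")
    disagreements = sum(
        sum(x != y for x, y in zip(row_a, row_b))
        for row_a, row_b in zip(a, b)
    )
    return disagreements / len(a)


def majority_vote(raters: List[Ratings]) -> Ratings:
    """Reference standard: a condition is present if more than half of raters say so."""
    n_reports = len(raters[0])
    n_conditions = len(raters[0][0])
    return [
        [2 * sum(r[i][j] for r in raters) > len(raters) for j in range(n_conditions)]
        for i in range(n_reports)
    ]


def sensitivity_specificity(subject: Ratings, reference: Ratings) -> Tuple[float, float]:
    """Pool all (report, condition) cells and score the subject against the reference.

    Assumes the reference contains at least one positive and one negative cell.
    """
    tp = fn = tn = fp = 0
    for row_s, row_r in zip(subject, reference):
        for s, r in zip(row_s, row_r):
            if r and s:
                tp += 1
            elif r and not s:
                fn += 1
            elif not r and s:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Toy data (hypothetical): 3 physicians and 1 automated reader, 2 reports, 3 conditions.
p1 = [[True, False, False], [False, False, True]]
p2 = [[True, False, False], [False, True, True]]
p3 = [[True, True, False], [False, False, True]]
nlp = [[True, False, False], [False, False, False]]

reference = majority_vote([p1, p2, p3])
nlp_distance_to_p1 = distance(nlp, p1)
nlp_sens, nlp_spec = sensitivity_specificity(nlp, reference)
```

With six conditions per report, as in the study, the largest value `distance` could return is 6, which matches the maximum possible distance quoted in the Results.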