G.F. Cooper and R.A. Miller, AN EXPERIMENT COMPARING LEXICAL AND STATISTICAL METHODS FOR EXTRACTING MESH TERMS FROM CLINICAL FREE TEXT, Journal of the American Medical Informatics Association, 5(1), 1998, pp. 62-75
Number of citations
22
Subject Categories
Information Science & Library Science; Computer Science, Interdisciplinary Applications; Medical Informatics
Objective: A primary goal of the University of Pittsburgh's 1990-94 UMLS-sponsored effort was to develop and evaluate PostDoc (a lexical indexing system) and Finder (a statistical indexing system) comparatively, and then in combination as a hybrid system. Each system takes as input a portion of the free text from a narrative part of a patient's electronic medical record and returns a list of suggested MeSH terms to use in formulating a Medline search that includes concepts in the text. This paper describes the systems and reports an evaluation. The intent is for this evaluation to serve as a step toward the eventual realization of systems that assist healthcare personnel in using the electronic medical record to construct patient-specific searches of Medline.
Design: The authors tested the performances of PostDoc, Finder, and a hybrid system, using text taken from randomly selected clinical records, which were stratified to include six radiology reports, six pathology reports, and six discharge summaries. They identified concepts in the clinical records that might conceivably be used in performing a patient-specific Medline search. Each system was given the free text of each record as an input. The extent to which a system-derived list of MeSH terms captured the relevant concepts in these documents was determined based on blinded assessments by the authors.
Results: PostDoc output a mean of approximately 19 MeSH terms per report, which included about 40% of the relevant report concepts. Finder output a mean of approximately 57 terms per report and captured about 45% of the relevant report concepts. A hybrid system captured approximately 66% of the relevant concepts and output about 71 terms per report.
Conclusion: The outputs of PostDoc and Finder are complementary in capturing MeSH terms from clinical free text. The results suggest possible approaches to reduce the number of terms output while maintaining the percentage of terms captured, including the use of UMLS semantic types to constrain the output list to contain only clinically relevant MeSH terms.