An Experiment Comparing Lexical and Statistical Methods for Extracting MeSH Terms from Clinical Free Text

Authors

Cooper GF, Miller RA

Citation

G.F. Cooper and R.A. Miller, An Experiment Comparing Lexical and Statistical Methods for Extracting MeSH Terms from Clinical Free Text, Journal of the American Medical Informatics Association, 5(1), 1998, pp. 62-75

Subject categories

Information Science & Library Science; Computer Science, Interdisciplinary Applications; Medical Informatics

Journal title

Journal of the American Medical Informatics Association

ISSN journal

1067-5027

Volume

5

Issue

1

Year of publication

1998

Pages

62 - 75

Database

ISI

SICI code

1067-5027(1998)5:1<62:AECLAS>2.0.ZU;2-I

Abstract

Objective: A primary goal of the University of Pittsburgh's 1990-94 UMLS-sponsored effort was to develop and evaluate PostDoc (a lexical indexing system) and Finder (a statistical indexing system) comparatively, and then in combination as a hybrid system. Each system takes as input a portion of the free text from a narrative part of a patient's electronic medical record and returns a list of suggested MeSH terms to use in formulating a Medline search that includes concepts in the text. This paper describes the systems and reports an evaluation. The intent is for this evaluation to serve as a step toward the eventual realization of systems that assist healthcare personnel in using the electronic medical record to construct patient-specific searches of Medline.

Design: The authors tested the performances of PostDoc, Finder, and a hybrid system, using text taken from randomly selected clinical records, which were stratified to include six radiology reports, six pathology reports, and six discharge summaries. They identified concepts in the clinical records that might conceivably be used in performing a patient-specific Medline search. Each system was given the free text of each record as an input. The extent to which a system-derived list of MeSH terms captured the relevant concepts in these documents was determined based on blinded assessments by the authors.

Results: PostDoc output a mean of approximately 19 MeSH terms per report, which included about 40% of the relevant report concepts. Finder output a mean of approximately 57 terms per report and captured about 45% of the relevant report concepts. A hybrid system captured approximately 66% of the relevant concepts and output about 71 terms per report.

Conclusion: The outputs of PostDoc and Finder are complementary in capturing MeSH terms from clinical free text. The results suggest possible approaches to reduce the number of terms output while maintaining the percentage of terms captured, including the use of UMLS semantic types to constrain the output list to contain only clinically relevant MeSH terms.
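
The evaluation measure described in the abstract (the share of relevant report concepts captured by a system's list of suggested MeSH terms) can be illustrated with a small Python sketch. This is not the authors' implementation. It assumes, for illustration only, that the hybrid output is a de-duplicated union of the two systems' term lists and that a concept counts as captured when any of its acceptable MeSH terms appears in the list; the abstract specifies neither. The function names and the toy data are hypothetical and are not taken from the study.

```python
def hybrid_terms(lexical, statistical):
    """Merge two suggested-term lists, dropping case-insensitive duplicates.

    Assumption: the hybrid is modeled as a plain union; the abstract does
    not say how the two outputs were actually combined.
    """
    seen = set()
    merged = []
    for term in list(lexical) + list(statistical):
        key = term.lower()
        if key not in seen:
            seen.add(key)
            merged.append(term)
    return merged


def capture_rate(concept_to_terms, suggested):
    """Fraction of relevant concepts matched by at least one suggested term.

    concept_to_terms maps each relevant concept to the set of MeSH terms
    a reviewer would accept as capturing it (hypothetical structure).
    """
    if not concept_to_terms:
        return 0.0
    suggested_keys = {term.lower() for term in suggested}
    captured = sum(
        1
        for acceptable in concept_to_terms.values()
        if any(term.lower() in suggested_keys for term in acceptable)
    )
    return captured / len(concept_to_terms)


# Toy data, invented for illustration; not from the paper.
lexical_output = ["Pneumonia", "Lung"]
statistical_output = ["Lung", "Anti-Bacterial Agents", "Radiography, Thoracic"]
relevant_concepts = {
    "pneumonia": {"Pneumonia"},
    "antibiotic therapy": {"Anti-Bacterial Agents"},
    "pleural effusion": {"Pleural Effusion"},
    "chest x-ray": {"Radiography, Thoracic"},
}

hybrid_output = hybrid_terms(lexical_output, statistical_output)
for name, terms in [
    ("lexical", lexical_output),
    ("statistical", statistical_output),
    ("hybrid", hybrid_output),
]:
    rate = capture_rate(relevant_concepts, terms)
    print(f"{name}: {len(terms)} terms, capture rate {rate:.2f}")
```

In this toy case the two lists overlap on one term, so the union is shorter than the sum of the two lists yet captures more concepts than either list alone, which is the kind of complementarity the abstract reports.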