UMLS concept indexing for production databases: A feasibility study

Authors

Nadkarni, P; Chen, R; Brandt, C

Citation

P. Nadkarni et al., UMLS concept indexing for production databases: A feasibility study, J AM MED IN, 8(1), 2001, pp. 80-91

Subject categories

Library & Information Science; General & Internal Medicine

Journal title

JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION

ISSN journal

1067-5027

Volume

8

Issue

1

Year of publication

2001

Pages

80 - 91

Database

ISI

SICI code

1067-5027(200101/02)8:1<80:UCIFPD>2.0.ZU;2-9

Abstract

Objectives: To explore the feasibility of using the National Library of Medicine's Unified Medical Language System (UMLS) Metathesaurus as the basis for a computational strategy to identify concepts in medical narrative text preparatory to indexing. To quantitatively evaluate this strategy in terms of true positives, false positives (spuriously identified concepts), and false negatives (concepts missed by the identification process).

Methods: Using the 1999 UMLS Metathesaurus, the authors processed a training set of 100 documents (50 discharge summaries, 50 surgical notes) with a concept-identification program, whose output was manually analyzed. They flagged concepts that were erroneously identified and added new concepts that were not identified by the program, recording the reason for failure in such cases. After several refinements to both their algorithm and the UMLS subset on which it operated, they deployed the program on a test set of 24 documents (12 of each kind).

Results: Of 8,745 matches in the training set, 7,227 (82.6 percent) were true positives, whereas of 1,701 matches in the test set, 1,298 (76.3 percent) were true positives. Matches other than true positives indicated potential problems in production-mode concept indexing. Examples of causes of problems were redundant concepts in the UMLS, homonyms, acronyms, abbreviations and elisions, concepts that were missing from the UMLS, proper names, and spelling errors.

Conclusions: The error rate was too high for concept indexing to be the only production-mode means of preprocessing medical narrative. Considerable curation needs to be performed to define a UMLS subset that is suitable for concept matching.
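
The percentages quoted in the Results can be reproduced from the raw match counts. The following is a minimal sketch of that arithmetic only, not the authors' evaluation code; the function name `true_positive_rate` is a hypothetical helper introduced here for illustration. Because the abstract does not report the number of false negatives, the figure computed is the share of program matches that were correct, and recall cannot be derived from these counts.

```python
def true_positive_rate(true_positives: int, matches: int) -> float:
    """Return the percentage of matches that were true positives, to one decimal.

    Hypothetical helper: the counts passed in below are the ones quoted in the
    abstract; nothing else about the authors' evaluation is assumed.
    """
    if matches <= 0:
        raise ValueError("matches must be a positive integer")
    if not 0 <= true_positives <= matches:
        raise ValueError("true_positives must lie between 0 and matches")
    return round(100.0 * true_positives / matches, 1)


# Counts as reported in the abstract.
training_rate = true_positive_rate(7227, 8745)  # training set, 100 documents
test_rate = true_positive_rate(1298, 1701)      # test set, 24 documents

# Matches that were not true positives (the cases flagged as potential problems).
training_other = 8745 - 7227
test_other = 1701 - 1298

print(f"training: {training_rate} percent true positives, {training_other} other matches")
print(f"test:     {test_rate} percent true positives, {test_other} other matches")
```

Both results agree with the reported 82.6 percent and 76.3 percent, leaving 1,518 and 403 matches respectively that were not true positives.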