UMLS concept indexing for production databases: A feasibility study

Citation
P. Nadkarni et al., UMLS concept indexing for production databases: A feasibility study, J AM MED IN, 8(1), 2001, pp. 80-91
Number of citations
36
Subject categories
Library & Information Science; General & Internal Medicine
Journal title
JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION
ISSN journal
1067-5027
Volume
8
Issue
1
Year of publication
2001
Pages
80 - 91
Database
ISI
SICI code
1067-5027(200101/02)8:1<80:UCIFPD>2.0.ZU;2-9
Abstract
Objectives: To explore the feasibility of using the National Library of Medicine's Unified Medical Language System (UMLS) Metathesaurus as the basis for a computational strategy to identify concepts in medical narrative text preparatory to indexing. To quantitatively evaluate this strategy in terms of true positives, false positives (spuriously identified concepts), and false negatives (concepts missed by the identification process).
Methods: Using the 1999 UMLS Metathesaurus, the authors processed a training set of 100 documents (50 discharge summaries, 50 surgical notes) with a concept-identification program, whose output was manually analyzed. They flagged concepts that were erroneously identified and added new concepts that were not identified by the program, recording the reason for failure in such cases. After several refinements to both their algorithm and the UMLS subset on which it operated, they deployed the program on a test set of 24 documents (12 of each kind).
Results: Of 8,745 matches in the training set, 7,227 (82.6 percent) were true positives, whereas of 1,701 matches in the test set, 1,298 (76.3 percent) were true positives. Matches other than true positives indicated potential problems in production-mode concept indexing. Examples of causes of problems were redundant concepts in the UMLS, homonyms, acronyms, abbreviations and elisions, concepts that were missing from the UMLS, proper names, and spelling errors.
Conclusions: The error rate was too high for concept indexing to be the only production-mode means of preprocessing medical narrative. Considerable curation needs to be performed to define a UMLS subset that is suitable for concept matching.
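Note: The true-positive percentages in the Results can be checked directly from the reported match counts. The short Python sketch below does only that arithmetic. The function name `true_positive_rate` is illustrative and does not come from the paper, and the code does not reproduce the authors' concept-identification program.

```python
def true_positive_rate(true_positives: int, matches: int) -> float:
    """Return true positives as a percentage of all matches, to one decimal."""
    if matches <= 0:
        raise ValueError("matches must be positive")
    return round(100.0 * true_positives / matches, 1)


# Counts as reported in the abstract (training set, then test set).
training = true_positive_rate(7227, 8745)
test = true_positive_rate(1298, 1701)

print(f"training: {training} percent, test: {test} percent")
```

Both values agree with the percentages stated in the abstract (82.6 and 76.3).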