Fast exact string pattern-matching algorithms adapted to the characteristics of the medical language

Authors
C. Lovis; Rh. Baud
Citation
C. Lovis and Rh. Baud, Fast exact string pattern-matching algorithms adapted to the characteristics of the medical language, J AM MED IN, 7(4), 2000, pp. 378-391
Citations number
25
Subject categories
Library & Information Science; General & Internal Medicine
Journal title
JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION
ISSN journal
1067-5027
Volume
7
Issue
4
Year of publication
2000
Pages
378 - 391
Database
ISI
SICI code
1067-5027(200007/08)7:4<378:FESPAA>2.0.ZU;2-Y
Abstract
Objective: The authors consider the problem of exact string pattern matching using algorithms that do not require any preprocessing. To choose the most appropriate algorithm, distinctive features of the medical language must be taken into account. The characteristics of medical language are emphasized in this regard, the best algorithm of those reviewed is proposed, and detailed evaluations of time complexity for processing medical texts are provided.
Design: The authors first illustrate and discuss the techniques of various string pattern-matching algorithms. Next, the source code and the behavior of representative exact string pattern-matching algorithms are presented in a comprehensive manner to promote their implementation. Detailed explanations of the use of various techniques to improve performance are given.
Measurements: Real-time measures of time complexity with English medical texts are presented. They lead to results distinct from those found in the computer science literature, which are typically computed with normally distributed texts.
Results: The Boyer-Moore-Horspool algorithm achieves the best overall results when used with medical texts. This algorithm usually performs at least twice as fast as the other algorithms tested.
Conclusion: The time performance of exact string pattern matching can be greatly improved if an efficient algorithm is used. Considering the growing amount of text handled in the electronic patient record, it is worth implementing this efficient algorithm.
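For readers unfamiliar with the Boyer-Moore-Horspool algorithm named in the Results, the following is a minimal Python sketch of the textbook version. It is not the authors' source code, and the function name `horspool_search` and the sample strings are illustrative assumptions. The only setup is a small shift table built from the pattern itself; the text is scanned as-is, with no index built over it.

```python
def horspool_search(text: str, pattern: str) -> int:
    """Return the index of the first occurrence of pattern in text, or -1.

    Textbook Boyer-Moore-Horspool; an illustrative sketch, not the
    implementation published in the article.
    """
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    if m > n:
        return -1

    # Shift table: for every character of the pattern except the last,
    # the distance from its rightmost occurrence to the end of the pattern.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}

    pos = 0
    while pos <= n - m:
        # Compare the pattern with the current window, right to left.
        j = m - 1
        while j >= 0 and text[pos + j] == pattern[j]:
            j -= 1
        if j < 0:
            return pos
        # Slide the window according to the text character aligned with
        # the last pattern position; unknown characters skip m positions.
        pos += shift.get(text[pos + m - 1], m)
    return -1


if __name__ == "__main__":
    sample = "acute myocardial infarction"  # hypothetical sample text
    print(horspool_search(sample, "cardial"))
```

The shift depends only on the text character under the last pattern position, so a mismatch on a character that does not occur in the pattern moves the window by the full pattern length.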