PATTERN-RECOGNITION OF STRINGS WITH SUBSTITUTIONS, INSERTIONS, DELETIONS AND GENERALIZED TRANSPOSITION

Citation
Bj. Oommen et Rks. Loke, PATTERN-RECOGNITION OF STRINGS WITH SUBSTITUTIONS, INSERTIONS, DELETIONS AND GENERALIZED TRANSPOSITION, Pattern recognition, 30(5), 1997, pp. 789-800
Citations number
23
Categorie Soggetti
Computer Sciences, Special Topics","Engineering, Eletrical & Electronic","Computer Science Artificial Intelligence
Journal title
ISSN journal
00313203
Volume
30
Issue
5
Year of publication
1997
Pages
789 - 800
Database
ISI
SICI code
0031-3203(1997)30:5<789:POSWSI>2.0.ZU;2-0
Abstract
We study the problem of recognizing a string Y which is the noisy vers ion of some unknown string X chosen from a finite dictionary, II. The traditional case which has been extensively studied in the literature is the one in which Y contains substitution, insertion and deletion ( SID) errors. Although some work has been done to extend the traditiona l set of edit operations to include the straightforward transposition of adjacent characters(2) [see J. Assoc. Comput. Mach. 22, 177-183 (19 75)] the problem is unsolved when the transposed characters are themse lves subsequently substituted, as is typical in cursive and typewritte n script, in molecular biology and in noisy chain-coded boundaries. In this paper we present the first reported solution to the analytic pro blem of editing one string X to another, Y, using these four edit oper ations. A scheme for obtaining the optimal edit operations has also be en given. Both these solutions are optimal for the infinite alphabet c ase. Using these algorithms we present a syntactic pattern recognition scheme which corrects noisy text containing all these types of errors . The paper includes experimental results involving sub-dictionaries o f the most common English words which demonstrate the superiority of o ur system over existing methods. (C) 1997 Pattern Recognition Society.