FAST RETRIEVAL OF ELECTRONIC MESSAGES THAT CONTAIN MISTYPED WORDS OR SPELLING-ERRORS

Authors
Citation
Jtl. Wang et Cy. Chang, FAST RETRIEVAL OF ELECTRONIC MESSAGES THAT CONTAIN MISTYPED WORDS OR SPELLING-ERRORS, IEEE transactions on systems, man and cybernetics. Part B. Cybernetics, 27(3), 1997, pp. 441-451
Citations number
39
Categorie Soggetti
Controlo Theory & Cybernetics","Computer Science Cybernetics","Robotics & Automatic Control
ISSN journal
10834419
Volume
27
Issue
3
Year of publication
1997
Pages
441 - 451
Database
ISI
SICI code
1083-4419(1997)27:3<441:FROEMT>2.0.ZU;2-R
Abstract
This paper presents an index structure for retrieving electronic messa ges that contain mistyped words or spelling errors, Given a query stri ng (e.g., a search key), we want to find those messages that approxima tely contain the query, i.e., certain inserts, deletes and mismatches are allowed when matching the query with a word (or phrase) in the mes sages. Our approach is to store the messages sequentially in a databas e and hash their ''fingerprints'' into a number of ''fingerprint files .'' When the query is given, its fingerprints are also hashed into the files and a histogram of votes is constructed on the messages, We der ive a lower bound, based on which one can prune a large number of nonq ualifying messages (i.e., those whose votes are below the lower bound) during searching. The paper presents some experimental results, which demonstrate the effectiveness of the index structure and the lower bo und.