RECURSIVE HASHING FUNCTIONS FOR N-GRAMS

Authors
Citation
Jd. Cohen, RECURSIVE HASHING FUNCTIONS FOR N-GRAMS, ACM transactions on information systems, 15(3), 1997, pp. 291-320
Citations number
54
Categorie Soggetti
Information Science & Library Science","Computer Science Information Systems
ISSN journal
10468188
Volume
15
Issue
3
Year of publication
1997
Pages
291 - 320
Database
ISI
SICI code
1046-8188(1997)15:3<291:RHFFN>2.0.ZU;2-E
Abstract
Many indexing, retrieval, and comparison methods are based on counting or cataloguing n-grams in streams of symbols. The fastest method of i mplementing such operations is through the use of hash tables. Rapid h ashing of consecutive n-grams is best done using a recursive hash func tion, in which the hash value of the current n-gram is derived from th e hash value of its predecessor. This article generalizes recursive ha sh functions found in the literature and proposes new methods offering superior performance. Experimental results demonstrate substantial sp eed improvement over conventional approaches, while retaining near-ide al hash value distribution.