An efficient hash-based algorithm for sequence data searching

Citation
Kw. Chu et al., An efficient hash-based algorithm for sequence data searching, COMPUTER J, 41(6), 1998, pp. 402-415
Number of citations
31
Subject categories
Computer Science & Engineering
Journal title
COMPUTER JOURNAL
ISSN journal
00104620
Volume
41
Issue
6
Year of publication
1998
Pages
402 - 415
Database
ISI
SICI code
0010-4620(1998)41:6<402:AEHAFS>2.0.ZU;2-F
Abstract
In real life, data collected day by day often appear as sequences; this type of data is called sequence data. The technique of searching for similar patterns among sequence data is very important in many applications. We first point out some deficiencies in the existing definitions of sequence similarity. We then introduce a definition of sequence similarity based on the shape of sequences. The definition is also extended to handle sequence matching with linear scaling in both the amplitude and time dimensions. A fast sequence searching algorithm based on extendable hashing is also proposed. The algorithm can match linearly scaled sequences and guarantees that no qualified data subsequence is falsely rejected. Several experiments are performed on real data (stock price movements) and synthetic data to evaluate various aspects of the algorithm's performance.
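The abstract does not spell out the paper's shape-based similarity definition or its hashing scheme. The sketch below is therefore only a generic illustration of the two ideas it names, not the authors' method. Linear amplitude scaling (a * x + b with a > 0) is factored out by z-normalization, and linear time scaling is approximated by linearly resampling a candidate to the query's length. A brute-force scan stands in for the index: because it checks every window, it cannot falsely reject a qualifying subsequence, which is the guarantee an index such as the proposed hash-based one must preserve while pruning. All function names (`resample`, `znorm`, `shape_distance`, `scan`) and the threshold `eps` are hypothetical.

```python
import numpy as np


def resample(seq, length):
    # Hypothetical helper: linearly interpolate `seq` onto `length` evenly
    # spaced points, a simple stand-in for matching under linear time scaling.
    seq = np.asarray(seq, dtype=float)
    src = np.linspace(0.0, 1.0, num=len(seq))
    dst = np.linspace(0.0, 1.0, num=length)
    return np.interp(dst, src, seq)


def znorm(seq):
    # Remove offset and amplitude scale so that a * x + b (a > 0) has the
    # same normalized shape as x. A constant sequence maps to all zeros.
    seq = np.asarray(seq, dtype=float)
    std = seq.std()
    if std == 0.0:
        return np.zeros_like(seq)
    return (seq - seq.mean()) / std


def shape_distance(query, candidate):
    # Euclidean distance between normalized shapes, after stretching or
    # shrinking the candidate to the query's length.
    q = znorm(query)
    c = znorm(resample(candidate, len(q)))
    return float(np.linalg.norm(q - c))


def scan(data, query, eps, window_lengths):
    # Exhaustive scan over every window of each given length. Nothing is
    # pruned, so no window within `eps` of the query is missed.
    data = np.asarray(data, dtype=float)
    hits = []
    for w in window_lengths:
        for start in range(len(data) - w + 1):
            d = shape_distance(query, data[start:start + w])
            if d <= eps:
                hits.append((start, w, d))
    return hits
```

For example, with `query = [0, 1, 2, 1, 0]`, the window `[7, 10, 13, 10, 7]` (that is, 3 * query + 7) has shape distance zero, and so does a version of the query stretched to nine points. An inverted copy does not match. Z-normalization is used here only because it is a common, easily verified way to discount amplitude scaling; the paper's own shape definition may differ.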