An efficient hash-based algorithm for sequence data searching

Authors

Chu, KW; Lam, SK; Wong, MH

Citation

K.W. Chu et al., An efficient hash-based algorithm for sequence data searching, COMPUTER J, 41(6), 1998, pp. 402-415

Citations number

Subject categories

Computer Science & Engineering

Journal title

COMPUTER JOURNAL

ISSN journal

0010-4620

Volume

41

Issue

6

Year of publication

1998

Pages

402 - 415

Database

ISI

SICI code

0010-4620(1998)41:6<402:AEHAFS>2.0.ZU;2-F

Abstract

In real life, data collected day by day often appear in sequences, and this type of data is called sequence data. The technique of searching for similar patterns among sequence data is very important in many applications. We first point out that there are some deficiencies in the existing definitions of sequence similarity. We then introduce a definition of sequence similarity based on the shape of sequences. The definition is also extended to handle sequence matching with linear scaling in both amplitude and time dimensions. A fast sequence searching algorithm based on extendable hashing is also proposed. The algorithm can match linearly scaled sequences and guarantees that no qualified data subsequence is falsely rejected. Several experiments are performed on real data (stock price movement) and synthetic data to measure the performance of the algorithm in different aspects.
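
The abstract does not state the paper's similarity definition or its hashing scheme, so the following is only an illustrative sketch of what "matching with linear scaling in both amplitude and time" can mean in general. It is not the authors' method. The function names (`resample`, `normalize`, `similar_shape`), the fixed resampling length `n`, and the tolerance `tol` are all assumptions made for this example: time scaling is handled by linear interpolation onto a common length, and amplitude scaling by subtracting the mean and dividing by the standard deviation.

```python
from statistics import mean, pstdev


def resample(seq, n):
    """Linearly interpolate seq onto n evenly spaced points (n >= 2).

    Assumed way to remove differences in time scale; not from the paper.
    """
    if len(seq) == 1:
        return [float(seq[0])] * n
    out = []
    for i in range(n):
        pos = i * (len(seq) - 1) / (n - 1)   # fractional index into seq
        lo = int(pos)
        hi = min(lo + 1, len(seq) - 1)
        frac = pos - lo
        out.append(seq[lo] * (1 - frac) + seq[hi] * frac)
    return out


def normalize(seq):
    """Remove amplitude offset and positive scale (z-normalization)."""
    m = mean(seq)
    s = pstdev(seq)
    if s == 0:
        return [0.0] * len(seq)              # constant sequence: flat shape
    return [(x - m) / s for x in seq]


def similar_shape(a, b, n=16, tol=0.1):
    """True if a and b agree pointwise within tol after both normalizations.

    n and tol are hypothetical parameters chosen for illustration.
    """
    na = normalize(resample(a, n))
    nb = normalize(resample(b, n))
    return max(abs(x - y) for x, y in zip(na, nb)) <= tol
```

Under these assumptions, a rise-and-fall sequence such as `[1, 2, 3, 2, 1]` is judged similar to an amplitude-scaled copy `[10, 20, 30, 20, 10]` and to a time-stretched copy `[1, 1.5, 2, 2.5, 3, 2.5, 2, 1.5, 1]`, but not to the inverted shape `[3, 2, 1, 2, 3]`. This brute-force comparison does no indexing at all; the paper's contribution, per the abstract, is a search algorithm based on extendable hashing that avoids false rejections, which this sketch does not attempt to reproduce.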