MULTIPLE FILTRATION AND APPROXIMATE PATTERN-MATCHING

Citation
Pa. Pevzner et Ms. Waterman, MULTIPLE FILTRATION AND APPROXIMATE PATTERN-MATCHING, Algorithmica, 13(1-2), 1995, pp. 135-154
Citations number
39
Categorie Soggetti
Computer Sciences",Mathematics,Mathematics,"Computer Science Software Graphycs Programming
Journal title
ISSN journal
01784617
Volume
13
Issue
1-2
Year of publication
1995
Pages
135 - 154
Database
ISI
SICI code
0178-4617(1995)13:1-2<135:MFAAP>2.0.ZU;2-4
Abstract
Given a text of length n and a query of length q, we present an algori thm for finding all locations of m-tuples in the text and in the query that differ by at most k mismatches. This problem is motivated by the dot-matrix constructions for sequence comparison and optimal oligonuc leotide probe selection routinely used in molecular biology. In the ca se q = m the problem coincides with the classical approximate string m atching with k mismatches problem. We present a new approach to this p roblem based on multiple hashing, which may have advantages over some sophisticated and theoretically efficient methods that have been propo sed. This paper describes a two-stage process. The first stage (multip le filtration) uses a new technique to preselect roughly similar m-tup les. The second stage compares these In-tuples using an accurate metho d. We demonstrate the advantages of multiple filtration in comparison with other techniques for approximate pattern matching.