ITA
ENG

SUBLINEAR APPROXIMATE STRING-MATCHING AND BIOLOGICAL APPLICATIONS

Authors

CHANG WI LAWLER EL

Citation

Wi. Chang et El. Lawler, SUBLINEAR APPROXIMATE STRING-MATCHING AND BIOLOGICAL APPLICATIONS, Algorithmica, 12(4-5), 1994, pp. 327-344

Citations number

Categorie Soggetti

Computer Sciences",Mathematics,Mathematics,"Computer Science Software Graphycs Programming

Journal title

Algorithmica → ACNP

ISSN journal

01784617

Volume

Issue

4-5

Year of publication

1994

Pages

327 - 344

Database

ISI

SICI code

0178-4617(1994)12:4-5<327:SASABA>2.0.ZU;2-T

Abstract

Given a text string of length n and a pattern string of length m over a b-letter alphabet, the k differences approximate string matching pro blem asks for all locations in the text where the pattern occurs with at most k differences (substitutions, insertions, deletions). We treat k not as a constant but as a fraction of m (not necessarily constant- fraction). Previous algorithms require at least 0(kn) time (or exponen tial space). We give an algorithm that is sublinear time O((n/m)k log( b) m) when the text is random and k is bounded by the threshold m/(log (b) m + 0(1)). In particular, when k = o(m/log(b) m) the expected runn ing time is o(n). In the worst case our algorithm is 0(kn), but is sti ll an improvement in that it is practical and uses 0(m) space compared with 0(n) or 0(m2). We define three problems motivated by molecular b iology and describe efficient algorithms based on our techniques: (1) approximate substring matching, (2) approximate-overlap detection, and (3) approximate codon matching. Respectively, applications to biology are local similarity search, sequence assembly, and DNA-protein match ing.