ITA
ENG

Compression and approximate matching

Authors

Allison, L Powell, D Dix, TI

Citation

L. Allison et al., Compression and approximate matching, COMPUTER J, 42(1), 1999, pp. 1-10

Citations number

Categorie Soggetti

Computer Science & Engineering

Journal title

COMPUTER JOURNAL

ISSN journal

00104620 → ACNP

Volume

Issue

Year of publication

1999

Pages

1 - 10

Database

ISI

SICI code

0010-4620(1999)42:1<1:CAAM>2.0.ZU;2-R

Abstract

A population of sequences is called non-random if there is a statistical mo del and an associated compression algorithm that allows members of the popu lation to be compressed, on average. Any available statistical model of a p opulation should be incorporated into algorithms for alignment of the seque nces and doing so changes the rank order of possible alignments in general. The model should also be used in deciding if a resulting approximate match between two sequences is significant or not. It is shown how to do this fo r two plausible interpretations involving pairs of sequences that might or might not be related. Efficient alignment algorithms are described for quit e general statistical models of sequences. The new alignment algorithms are more sensitive to what might be termed 'features' of the sequences. A natu ral significance test is shown to be rarely fooled by apparent similarities between two sequences that are merely typical of all or most members of th e population, even unrelated members.