M.S. Waterman and M. Vingron, RAPID AND ACCURATE ESTIMATES OF STATISTICAL SIGNIFICANCE FOR SEQUENCE DATA-BASE SEARCHES, Proceedings of the National Academy of Sciences of the United States of America, 91(11), 1994, pp. 4625-4628
A central question in sequence comparison is the statistical significance of an observed similarity. For local alignment containing gaps to optimize sequence similarity, this problem has so far not been solved mathematically. Using as a basis the Chen-Stein theory of Poisson approximation, we present a practical method to approximate the probability that a local alignment score is a result of chance alone. For a set of similarity scores and gap penalties, only one simulation of random alignments needs to be calculated to derive the key information allowing us to estimate the significance of any alignment calculated under this setting. We present applications to data base searching and the analysis of pairwise and self-comparisons of proteins.
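
The abstract gives no formulas, so the following is only an illustrative sketch of how a Poisson-approximation significance estimate of this kind might be computed. It assumes, as is common for local alignment scores, that the number of independent high-scoring alignments with score at least t between random sequences of lengths m and n is roughly Poisson with mean gamma * m * n * p**t. The parameter names gamma and p, both function names, and the log-linear least-squares fit are assumptions made for illustration; they are not taken from the abstract.

```python
import math


def poisson_pvalue(t, m, n, gamma, p):
    """Approximate P(best local score >= t) under the assumed Poisson model.

    The expected number of chance alignments scoring >= t is taken to be
    gamma * m * n * p**t (an assumed form), so the probability of seeing
    at least one is 1 - exp(-mean).
    """
    mean = gamma * m * n * p ** t
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x.
    return -math.expm1(-mean)


def fit_gamma_p(thresholds, counts, m, n):
    """Estimate (gamma, p) from counts taken from one simulated comparison.

    counts[i] is the (hypothetical) number of independent alignments with
    score >= thresholds[i]; every count must be positive. Under the assumed
    model, log(count) = log(gamma * m * n) + t * log(p), so an ordinary
    least-squares line through (t, log(count)) yields both parameters.
    """
    xs = list(thresholds)
    ys = [math.log(c) for c in counts]
    k = len(xs)
    x_bar = sum(xs) / k
    y_bar = sum(ys) / k
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return math.exp(intercept) / (m * n), math.exp(slope)
```

Once gamma and p have been fitted for a given scoring scheme and gap penalty, poisson_pvalue can be reused for any score and any pair of sequence lengths under that same setting, which mirrors the "only one simulation" idea described in the abstract.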