Jn. Bacro and Jp. Comet, Sequence alignment: an approximation law for the Z-value with applications to databank scanning, COMPUT CHEM, 25(4), 2001, pp. 401-410
The Z-value is an attempt to estimate the statistical significance of a Smith and Waterman dynamic programming alignment score (H-score) through the use of a Monte-Carlo procedure. In this paper, we give an approximation for the Z-value law deduced from the Poisson clumping heuristic developed by Waterman and Vingron (Stat. Sci. 9 (1994) 367) for the comparison of independent and identically distributed sequences. As for non-gapped alignment scores, our approximation is of Gumbel type, but with parameters that are sequence independent. This result explains the related experimental results mentioned by Comet et al. (Comput. Chem. 23 (1999) 317). Using 'quasi-real' sequences (i.e. randomly shuffled sequences of the same length and amino acid composition as the real ones), we investigate the relevance of our approximation result. Since the Monte-Carlo approach we use generates a bias in the estimation of the Gumbel decay parameter, a correction procedure is proposed. Applications to real sequences are considered, and we show how our results can be used to detect potential biological relationships between real sequences. (C) 2001 Elsevier Science Ltd. All rights reserved.
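
The Monte-Carlo procedure behind the Z-value can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The function names (`sw_score`, `z_value`, `gumbel_tail`), the match/mismatch/linear-gap scoring scheme, the number of shuffles, and the choice to shuffle only the second sequence are all assumptions made for the example. The Gumbel location and scale are left as caller-supplied arguments because the abstract does not state the fitted values.

```python
import math
import random
import statistics


def sw_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score (Smith-Waterman recurrence).

    Assumption: a toy match/mismatch scheme with a linear gap penalty,
    standing in for a real substitution matrix and gap model.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


def z_value(a, b, n_shuffles=100, rng=None):
    """Monte-Carlo Z-value: (observed score - mean) / standard deviation,
    where mean and deviation come from scores against shuffled copies of b.

    Shuffling keeps the length and composition of b, as in the
    'quasi-real' sequences described in the abstract.
    """
    rng = rng or random.Random(0)  # fixed seed: reproducible sketch
    observed = sw_score(a, b)
    letters = list(b)
    scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)
        scores.append(sw_score(a, letters))
    mean = statistics.fmean(scores)
    sd = statistics.stdev(scores)
    if sd == 0:
        raise ValueError("shuffled scores have zero variance")
    return (observed - mean) / sd


def gumbel_tail(z, loc, scale):
    """P(Z >= z) under a Gumbel law with the given location and scale.

    Assumption: loc and scale must be supplied by the caller.
    """
    return 1.0 - math.exp(-math.exp(-(z - loc) / scale))
```

For example, `z_value(seq1, seq2, n_shuffles=200)` followed by `gumbel_tail(z, loc, scale)` would turn an alignment score into an approximate tail probability, given Gumbel parameters obtained elsewhere. With a small number of shuffles the estimated mean and standard deviation are noisy, which is one reason a bias correction of the kind the abstract mentions may be needed.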