Sequence alignment: an approximation law for the Z-value with applications to databank scanning

Citation
Jn. Bacro and Jp. Comet, Sequence alignment: an approximation law for the Z-value with applications to databank scanning, COMPUT CHEM, 25(4), 2001, pp. 401-410
Number of citations
29
Subject categories
Chemistry
Journal title
COMPUTERS & CHEMISTRY
ISSN journal
0097-8485
Volume
25
Issue
4
Year of publication
2001
Pages
401 - 410
Database
ISI
SICI code
0097-8485(200107)25:4<401:SAAALF>2.0.ZU;2-R
Abstract
The Z-value is an attempt to estimate the statistical significance of a Smith and Waterman dynamic programming alignment score (H-score) through the use of a Monte-Carlo procedure. In this paper, we give an approximation for the Z-value law deduced from the Poisson clumping heuristic developed by Waterman and Vingron (Stat. Sci. 9 (1994) 367) in the case of the comparison of independent and identically distributed sequences. As for non-gapped alignment scores, our approximation is of Gumbel type but with parameters that are sequence independent. This result makes clear the related experimental results mentioned by Comet et al. (Comput. Chem. 23 (1999) 317). Using 'quasi-real' sequences (i.e. randomly shuffled sequences of the same length and amino acid composition as the real ones) we investigate the relevance of our approximation result. Since the Monte-Carlo approach we use generates a bias for the Gumbel decay parameter estimation, a correction procedure is proposed. Applications to real sequences are considered and we show how our results can be used to detect the potential biological relationships between real sequences. (C) 2001 Elsevier Science Ltd. All rights reserved.
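
The Monte-Carlo procedure named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the usual form of the Z-value: the observed Smith-Waterman score minus the mean score against shuffled sequences, divided by the standard deviation of those scores. The scoring scheme (match, mismatch, linear gap penalty), the number of shuffles, the choice to shuffle only the second sequence, and the names sw_score and z_value are all assumptions made for illustration. The record gives none of these details.

```python
import random
import statistics


def sw_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score (Smith-Waterman, linear gap penalty).

    The match/mismatch/gap values are illustrative assumptions, not the
    substitution matrix or gap costs used in the paper.
    """
    prev = [0] * (len(b) + 1)  # previous row of the DP matrix
    best = 0
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            # Local alignment: a cell never drops below zero.
            h = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            cur.append(h)
            best = max(best, h)
        prev = cur
    return best


def z_value(a, b, n_shuffles=100, seed=0):
    """Monte-Carlo Z-value of the alignment score of a against b.

    Assumption: only b is shuffled, which preserves its length and
    composition ('quasi-real' sequences in the abstract's terms).
    """
    rng = random.Random(seed)
    observed = sw_score(a, b)
    letters = list(b)
    shuffled_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)
        shuffled_scores.append(sw_score(a, "".join(letters)))
    mean = statistics.mean(shuffled_scores)
    std = statistics.stdev(shuffled_scores)
    if std == 0:
        raise ValueError("shuffled scores have zero variance")
    return (observed - mean) / std


if __name__ == "__main__":
    # Hypothetical example sequence, used only to exercise the functions.
    seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
    print(z_value(seq, seq))
```

Because the mean and standard deviation are estimated from a finite number of shuffles, the Z-value is itself a random quantity. This is consistent with the abstract's remark that the Monte-Carlo approach biases the estimate of the Gumbel decay parameter and needs a correction.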