Accurate formula for p-values of gapped local sequence and profile alignments

Authors
R. Mott
Citation
R. Mott, Accurate formula for p-values of gapped local sequence and profile alignments, J MOL BIOL, 300(3), 2000, pp. 649-659
Number of citations
37
Subject categories
Molecular Biology & Genetics
Journal title
JOURNAL OF MOLECULAR BIOLOGY
ISSN journal
0022-2836
Volume
300
Issue
3
Year of publication
2000
Pages
649 - 659
Database
ISI
SICI code
0022-2836(20000714)300:3<649:AFFPOG>2.0.ZU;2-1
Abstract
A simple general approximation for the distribution of gapped local alignment scores is presented, suitable for assessing the significance of comparisons between two protein sequences or between a sequence and a profile. The approximation takes account of the scoring scheme (i.e. gap penalty and substitution matrix or profile), sequence composition and length. Use of this formula means it is unnecessary to fit an extreme-value distribution to simulations or to the results of databank searches. The method is based on the theoretical ideas introduced by R. Mott and R. Tribe in 1999. Extensive simulation studies show that score thresholds produced by the method are accurate to within ±5%, 95% of the time. We also investigate factors which affect the accuracy of alignment statistics, and show that any method based on asymptotic theory is limited because asymptotic behaviour is not strictly achieved for many real protein sequences, due to extreme composition effects. Consequently, it may not be practicable to find a general formula that is significantly more accurate until the sub-asymptotic behaviour of alignments is better understood. © 2000 Academic Press.
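
For context, the extreme-value distribution mentioned in the abstract is commonly written in the Karlin-Altschul form P(S >= s) = 1 - exp(-K m n exp(-lambda s)), where m and n are the sequence lengths. The sketch below illustrates that standard form only. It is not the formula derived in the paper, and the function names and the example values of lambda and K are hypothetical placeholders, not results from the record.

```python
import math


def evd_pvalue(score, m, n, lam, K):
    """P(S >= score) under the standard extreme-value (Gumbel) form.

    m, n : lengths of the two sequences being compared
    lam, K : scale and pre-factor of the distribution (assumed known here;
             the abstract's point is that they need not be fitted by simulation)
    """
    expected_hits = K * m * n * math.exp(-lam * score)
    # 1 - exp(-x), computed stably for small x
    return -math.expm1(-expected_hits)


def score_threshold(p, m, n, lam, K):
    """Score at which the p-value equals p (inverse of evd_pvalue)."""
    return math.log(K * m * n / -math.log1p(-p)) / lam


if __name__ == "__main__":
    # Placeholder parameters for illustration only (not taken from the paper).
    lam, K = 0.27, 0.04
    s = score_threshold(0.01, 300, 300, lam, K)
    print(f"threshold score for p = 0.01: {s:.2f}")
    print(f"p-value at that score: {evd_pvalue(s, 300, 300, lam, K):.4f}")
```

A threshold accuracy claim such as the abstract's "within ±5%" refers to how close a predicted value like `score_threshold(...)` is to the threshold observed in simulation.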