USING SUBSTITUTION PROBABILITIES TO IMPROVE POSITION-SPECIFIC SCORING MATRICES

Citation
J.G. Henikoff and S. Henikoff, USING SUBSTITUTION PROBABILITIES TO IMPROVE POSITION-SPECIFIC SCORING MATRICES, Computer applications in the biosciences, 12(2), 1996, pp. 135-143
Number of citations
33
Subject categories
Mathematical Methods, Biology & Medicine; Computer Sciences, Special Topics; Computer Science Interdisciplinary Applications; Biology Miscellaneous
ISSN journal
0266-7061
Volume
12
Issue
2
Year of publication
1996
Pages
135 - 143
Database
ISI
SICI code
0266-7061(1996)12:2<135:USPTIP>2.0.ZU;2-6
Abstract
Each column of amino acids in a multiple alignment of protein sequences can be represented as a vector of 20 amino acid counts. For alignment and searching applications, the count vector is an imperfect representation of a position, because the observed sequences are an incomplete sample of the full set of related sequences. One general solution to this problem is to model unobserved sequences by adding artificial 'pseudo-counts' to the observed counts. We introduce a simple method for computing pseudo-counts that combines the diversity observed in each alignment position with amino acid substitution probabilities. In extensive empirical tests, this position-based method outperformed other pseudo-count methods and was a substantial improvement over the traditional average score method used for constructing profiles.
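
The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no formula, so the scaling of the total pseudo-count by the number of distinct residues in the column, the `weight_per_residue` parameter, and the toy substitution table built by `toy_substitution_probs` are all assumptions made for illustration. A real application would presumably use substitution probabilities derived from an empirical matrix.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def column_counts(column):
    """Turn one alignment column (a string) into a vector of 20 counts.

    Gap characters and any non-standard symbols are ignored.
    """
    counts = Counter(ch for ch in column.upper() if ch in AMINO_ACIDS)
    return [counts[a] for a in AMINO_ACIDS]


def toy_substitution_probs(self_prob=0.81):
    """Hypothetical placeholder table, NOT real data.

    Row i gives the assumed probability that residue i is replaced by each
    residue a; every row sums to 1.
    """
    off = (1.0 - self_prob) / (len(AMINO_ACIDS) - 1)
    n = len(AMINO_ACIDS)
    return [[self_prob if i == j else off for j in range(n)] for i in range(n)]


def column_probabilities(counts, subst_prob, weight_per_residue=1.0):
    """Estimate residue probabilities for one column using pseudo-counts.

    Assumed scheme: the total pseudo-count grows with the diversity of the
    column (number of distinct residues observed), and it is distributed
    over residues according to the substitution probabilities of the
    residues actually seen.
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("column has no residues")
    diversity = sum(1 for n in counts if n > 0)
    pseudo_total = weight_per_residue * diversity
    probs = []
    for a in range(len(counts)):
        # Expected share of residue a, given the observed residues.
        expected = sum(
            (counts[i] / total) * subst_prob[i][a] for i in range(len(counts))
        )
        pseudo = pseudo_total * expected
        probs.append((counts[a] + pseudo) / (total + pseudo_total))
    return probs
```

Calling `column_probabilities(column_counts("AAAG-"), toy_substitution_probs())` returns 20 probabilities that sum to 1, in which residues absent from the column still receive a small non-zero probability.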