J.G. Henikoff and S. Henikoff, USING SUBSTITUTION PROBABILITIES TO IMPROVE POSITION-SPECIFIC SCORING MATRICES, Computer Applications in the Biosciences, 12(2), 1996, pp. 135-143
Each column of amino acids in a multiple alignment of protein
sequences can be represented as a vector of 20 amino acid counts. For
alignment and searching applications, the count vector is an imperfect
representation of a position, because the observed sequences are an
incomplete sample of the full set of related sequences. One general
solution to this problem is to model unobserved sequences by adding
artificial 'pseudo-counts' to the observed counts. We introduce a
simple method for computing pseudo-counts that combines the diversity
observed in each alignment position with amino acid substitution
probabilities. In extensive empirical tests, this position-based
method outperformed other pseudo-count methods and was a substantial
improvement over the traditional average score method used for
constructing profiles.
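
The abstract does not give the formulas, so the following Python sketch is only one plausible reading of a position-based pseudo-count scheme. It assumes that the total pseudo-count for a column grows with the number of distinct residues observed there (the column's diversity), and that this total is shared out among residues according to substitution probabilities conditioned on the residues actually seen. The function name, the scaling factor `m`, the three-letter alphabet and every value in `TOY_PAIR_PROBS` are hypothetical and chosen for illustration; they are not taken from the paper.

```python
from collections import Counter

# Hypothetical joint substitution probabilities for a toy 3-letter alphabet.
# Real applications would use a 20x20 table derived from aligned proteins.
_half = {
    ("A", "A"): 0.30, ("G", "G"): 0.30, ("W", "W"): 0.10,
    ("A", "G"): 0.10, ("A", "W"): 0.02, ("G", "W"): 0.03,
}
TOY_PAIR_PROBS = {}
for (a, b), q in _half.items():
    TOY_PAIR_PROBS[(a, b)] = q  # store both orientations so the
    TOY_PAIR_PROBS[(b, a)] = q  # table is symmetric


def position_based_probs(column, pair_probs, m=5.0):
    """Smoothed residue probabilities for one alignment column.

    column     -- string of residues observed at this position
    pair_probs -- dict (a, b) -> joint substitution probability
    m          -- assumed scaling factor for the total pseudo-count
    """
    alphabet = sorted({a for a, _ in pair_probs})
    counts = Counter(column)
    if not column or any(r not in alphabet for r in counts):
        raise ValueError("column must be non-empty and use known residues")

    n_total = sum(counts.values())
    diversity = len(counts)         # distinct residues in this column
    total_pseudo = m * diversity    # more diverse column, more pseudo-counts

    # Marginal probability of each residue: row sums of the joint table.
    marginal = {i: sum(pair_probs[(i, j)] for j in alphabet) for i in alphabet}

    probs = {}
    for a in alphabet:
        # Expected frequency of residue a given the residues observed here.
        expected = sum(
            (counts[i] / n_total) * pair_probs[(i, a)] / marginal[i]
            for i in alphabet
        )
        pseudo = total_pseudo * expected
        probs[a] = (counts[a] + pseudo) / (n_total + total_pseudo)
    return probs


if __name__ == "__main__":
    for col in ("AAAA", "AAGW"):
        print(col, position_based_probs(col, TOY_PAIR_PROBS))
```

Under these assumptions a fully conserved column such as "AAAA" receives few pseudo-counts and stays close to its observed counts, while a diverse column such as "AAGW" is smoothed more strongly. Every residue ends up with a non-zero probability in both cases.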