R.L. Tatusov et al., DETECTION OF CONSERVED SEGMENTS IN PROTEINS - ITERATIVE SCANNING OF SEQUENCE DATABASES WITH ALIGNMENT BLOCKS, Proceedings of the National Academy of Sciences of the United States of America, 91(25), 1994, pp. 12091-12095
We describe an approach to analyzing protein sequence databases that,
starting from a single uncharacterized sequence or group of related
sequences, generates blocks of conserved segments. The procedure
involves iterative database scans with an evolving position-dependent
weight matrix constructed from a coevolving set of aligned conserved
segments. For each iteration, the expected distribution of matrix
scores under a random model is used to set a cutoff score for the
inclusion of a segment in the next iteration. This cutoff may be
calculated to allow the chance inclusion of either a fixed number or a
fixed proportion of false positive segments. With sufficiently high
cutoff scores, the procedure converged for all alignment blocks
studied, with varying numbers of iterations required. Different
methods for calculating weight matrices from alignment blocks were
compared. The most effective of those tested was a logarithm-of-odds,
Bayesian-based approach that used prior residue probabilities
calculated from a mixture of Dirichlet distributions. The procedure
described was used to detect novel conserved motifs of potential
biological importance.
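
The iterative scan described above can be illustrated with a minimal
Python sketch. It is not the authors' implementation: it assumes a
uniform background over the 20 standard amino acids and a single
background-weighted pseudocount in place of the Dirichlet mixture
priors, and it takes the cutoff as a caller-supplied number rather than
deriving it from the expected score distribution under a random model.
All function names and parameters are hypothetical.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard amino acids


def build_log_odds_matrix(block, background=None, pseudocount=1.0):
    """Build a position-dependent log-odds matrix from equal-length segments.

    Assumption: a uniform background and one background-weighted
    pseudocount stand in for the Dirichlet mixture priors.
    """
    if background is None:
        background = {a: 1.0 / len(ALPHABET) for a in ALPHABET}
    width = len(block[0])
    total = len(block) + pseudocount
    matrix = []
    for pos in range(width):
        counts = Counter(seg[pos] for seg in block)
        column = {}
        for a in ALPHABET:
            freq = (counts[a] + pseudocount * background[a]) / total
            column[a] = math.log(freq / background[a])
        matrix.append(column)
    return matrix


def score_segment(matrix, segment):
    """Sum per-position log-odds; unknown residues contribute 0."""
    return sum(col.get(res, 0.0) for col, res in zip(matrix, segment))


def scan(matrix, sequence, cutoff):
    """Return every window of the sequence scoring at or above the cutoff."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        seg = sequence[start:start + width]
        if score_segment(matrix, seg) >= cutoff:
            hits.append(seg)
    return hits


def iterate_block(seed_block, database, cutoff, max_iterations=20):
    """Rebuild the matrix and rescan until the block stops changing.

    Simplification: identical segments are collapsed into one entry.
    """
    block = set(seed_block)
    for _ in range(max_iterations):
        matrix = build_log_odds_matrix(sorted(block))
        new_block = set(seed_block)
        for sequence in database:
            new_block.update(scan(matrix, sequence, cutoff))
        if new_block == block:
            break  # converged: no segments gained or lost
        block = new_block
    return sorted(block)
```

For example, `iterate_block(["ACDE", "ACDF"], ["GGGACDYGGG"], cutoff=5.0)`
picks up the related segment `ACDY` in the first scan and then stops,
because the second scan returns the same block. A lower cutoff admits
more chance matches, and convergence is then less likely.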