D.D. Pollock et al., Coevolving protein residues: Maximum likelihood identification and relationship to structure, J MOL BIOL, 287(1), 1999, pp. 187-198
The identification of protein sites undergoing correlated evolution (coevolution) is of great interest due to the possibility that these pairs will tend to be adjacent in the three-dimensional structure. Identification of such pairs should provide useful information for understanding the evolutionary process, predicting the effects of site-directed substitution, and potentially for predicting protein structure. Here, we develop and apply a maximum likelihood method with the aim of improving detection of coevolution. Unlike previous methods, which have had limited success, this method allows for correlations induced by phylogenetic relationships and for variation in rate of evolution along branches, and does not rely on accurate reconstruction of ancestral nodes. In order to reduce the complexity of coevolutionary relationships and identify the primary component of pairwise coevolution between two sites, we reduce the data to a two-state system at each site, regardless of the actual number of residues observed at that site. Simulations show that this strategy is good at identifying simple correlations and at recognizing cases in which the data are insufficient to distinguish between coevolution and spurious correlations. The new method was tested by using size and charge characteristics to group the residues at each site, and then evaluating coevolution in myoglobin sequences. Grouping based on physicochemical characteristics allows categorization of coevolving sites into positive and negative coevolution, depending on the correlation between equilibrium state frequencies. We detected a striking excess of negative coevolution (corresponding to charge) at sites brought into proximity by the periodicity of the alpha-helix, and there was also a tendency for sites with significant likelihood ratios to be close in the three-dimensional structure. Sites on the surface of the protein appear to coevolve both when they are close in the structure and when they are distant, implying a role for folding and/or avoidance of quaternary structure in the coevolution process. (C) 1999 Academic Press.
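
The two-state reduction mentioned in the abstract can be illustrated with a minimal sketch. It assumes a hypothetical charge grouping (D, E, K, R counted as charged), since the abstract does not list the groups actually used, and the names `CHARGED`, `to_two_state`, and `joint_counts` are illustrative only. The sketch merely tabulates joint states across sequences; it ignores phylogeny and is not the maximum likelihood method the abstract describes.

```python
# Hypothetical charge grouping; the abstract does not specify the exact groups.
CHARGED = frozenset("DEKR")  # Asp, Glu, Lys, Arg (assumption for illustration)


def to_two_state(column, group=CHARGED):
    """Map each residue in one alignment column to 1 (in group) or 0 (not in group)."""
    return [1 if residue in group else 0 for residue in column]


def joint_counts(states_a, states_b):
    """Count how often each of the four joint states occurs across sequences."""
    counts = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for a, b in zip(states_a, states_b):
        counts[(a, b)] += 1
    return counts


if __name__ == "__main__":
    # Two made-up alignment columns, one character per sequence.
    site_a = "KKEDAG"
    site_b = "EEKKGA"
    print(joint_counts(to_two_state(site_a), to_two_state(site_b)))
```

Raw joint counts like these can be inflated by shared ancestry among the sequences, which is the kind of spurious correlation the abstract says the likelihood method is designed to account for.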