Correlations among amino acid sites in bHLH protein domains: An information theoretic analysis

Citation
Wr. Atchley et al., Correlations among amino acid sites in bHLH protein domains: An information theoretic analysis, MOL BIOL EV, 17(1), 2000, pp. 164-178
Citations number
42
Subject categories
Biology, Experimental Biology
Journal title
MOLECULAR BIOLOGY AND EVOLUTION
ISSN journal
07374038
Volume
17
Issue
1
Year of publication
2000
Pages
164 - 178
Database
ISI
SICI code
0737-4038(200001)17:1<164:CAAASI>2.0.ZU;2-A
Abstract
An information theoretic approach is used to examine the magnitude and origin of associations among amino acid sites in the basic helix-loop-helix (bHLH) family of transcription factors. Entropy and mutual information values are used to summarize the variability and covariability of amino acids comprising the bHLH domain for 242 sequences. When these quantitative measures are integrated with crystal structure data and summarized using helical wheels, they provide important insights into the evolution of three-dimensional structure in these proteins. We show that amino acid sites in the bHLH domain known to pack against each other have very low entropy values, indicating little residue diversity at these contact sites. Noncontact sites, on the other hand, exhibit significantly larger entropy values, as well as statistically significant levels of mutual information or association among sites. High levels of mutual information indicate significant amounts of intercorrelation among amino acid residues at these various sites. Using computer simulations based on a parametric bootstrap procedure, we are able to partition the observed covariation among various amino acid sites into that arising from phylogenetic (common ancestry) and stochastic causes and that resulting from structural and functional constraints. These results show that a significant amount of the observed covariation among amino acid sites is due to structural/functional constraints, over and above the covariation arising from phylogenetic constraints. These quantitative analyses provide a highly integrated evolutionary picture of the multidimensional dynamics of sequence diversity and protein structure.
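
The two summary measures named in the abstract can be illustrated with a minimal sketch. It assumes the standard Shannon definitions: entropy H(X) = -sum p(x) log2 p(x) for one alignment column, and mutual information I(X;Y) = H(X) + H(Y) - H(X,Y) for a pair of columns. The abstract does not state the logarithm base, the estimator, or any bias correction the authors used, so those choices, the function names, and the four-sequence toy alignment below are all hypothetical.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the symbols observed in one alignment column."""
    n = len(column)
    counts = Counter(column)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two columns: H(A) + H(B) - H(A,B)."""
    joint = list(zip(col_a, col_b))  # paired residues, one pair per sequence
    return column_entropy(col_a) + column_entropy(col_b) - column_entropy(joint)


# Hypothetical toy alignment (not data from the paper): 4 sequences, 4 sites.
toy_alignment = ["AKLE", "AKLE", "ARIE", "ARID"]
columns = list(zip(*toy_alignment))  # transpose rows (sequences) into columns (sites)

entropies = [column_entropy(col) for col in columns]
mi_sites_2_3 = mutual_information(columns[1], columns[2])
```

In this toy case the first site is invariant, so its entropy is zero, while the second and third sites vary in lockstep (K with L, R with I) and therefore share one full bit of mutual information. Separating such covariation into phylogenetic and structural components requires the simulation step described in the abstract, which this sketch does not attempt.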