Codon-substitution models for heterogeneous selection pressure at amino acid sites

Citation
Zh. Yang et al., Codon-substitution models for heterogeneous selection pressure at amino acid sites, GENETICS, 155(1), 2000, pp. 431-449
Number of citations
43
Subject categories
Biology, Molecular Biology & Genetics
Journal title
GENETICS
ISSN journal
0016-6731
Volume
155
Issue
1
Year of publication
2000
Pages
431 - 449
Database
ISI
SICI code
0016-6731(200005)155:1<431:CMFHSP>2.0.ZU;2-R
Abstract
Comparison of relative fixation rates of synonymous (silent) and nonsynonymous (amino acid-altering) mutations provides a means for understanding the mechanisms of molecular sequence evolution. The nonsynonymous/synonymous rate ratio (omega = d(N)/d(S)) is an important indicator of selective pressure at the protein level, with omega = 1 meaning neutral mutations, omega < 1 purifying selection, and omega > 1 diversifying positive selection. Amino acid sites in a protein are expected to be under different selective pressures and have different underlying omega ratios. We develop models that account for heterogeneous omega ratios among amino acid sites and apply them to phylogenetic analyses of protein-coding DNA sequences. These models are useful for testing for adaptive molecular evolution and identifying amino acid sites under diversifying selection. Ten data sets of genes from nuclear, mitochondrial, and viral genomes are analyzed to estimate the distributions of omega among sites. In all data sets analyzed, the selective pressure indicated by the omega ratio is found to be highly heterogeneous among sites. Previously unsuspected Darwinian selection is detected in several genes in which the average omega ratio across sites is <1, but in which some sites are clearly under diversifying selection with omega > 1. Genes undergoing positive selection include the beta-globin gene from vertebrates, mitochondrial protein-coding genes from hominoids, the hemagglutinin (HA) gene from human influenza virus A, and HIV-1 env, vif, and pol genes. Tests for the presence of positively selected sites and their subsequent identification appear quite robust to the specific distributional form assumed for omega and can be achieved using any of several models we implement. However, we encountered difficulties in estimating the precise distribution of omega among sites from real data sets.
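
The interpretation of omega given in the abstract, and the observation that a gene-wide average below 1 can hide sites with omega > 1, can be illustrated with a minimal Python sketch. It assumes a discrete set of site classes, each with a proportion and an omega value. The function names, the tolerance, and the three example classes are hypothetical and are not taken from the paper; this is not an implementation of the models the authors fit.

```python
def classify_omega(omega, tol=1e-9):
    """Map an omega (dN/dS) value to the selection regime named in the abstract."""
    if omega < 0:
        raise ValueError("omega must be non-negative")
    if abs(omega - 1.0) <= tol:
        return "neutral"
    return "purifying" if omega < 1.0 else "diversifying"


def mean_omega(site_classes):
    """Average omega over site classes given as (proportion, omega) pairs."""
    total = sum(p for p, _ in site_classes)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("class proportions must sum to 1")
    return sum(p * w for p, w in site_classes)


# Hypothetical values chosen only for illustration (not estimates from the paper):
# most sites are conserved, some are neutral, a few are under diversifying selection.
site_classes = [(0.85, 0.05), (0.10, 1.0), (0.05, 4.0)]

average = mean_omega(site_classes)
regimes = [classify_omega(w) for _, w in site_classes]

print(f"average omega = {average:.4f} ({classify_omega(average)})")
for (p, w), regime in zip(site_classes, regimes):
    print(f"  proportion {p:.2f}, omega {w:.2f}: {regime}")
```

With these illustrative numbers the average falls in the purifying range even though one class of sites has omega > 1, which is the situation the abstract describes for several of the genes analyzed.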