J.P. Anderson et al., Substitution model of sequence evolution for the human immunodeficiency virus type 1 subtype B gp120 gene over the C2-V5 region, J Mol Evol, 53(1), 2001, pp. 55-62
Phylogenetic analyses frequently rely on models of sequence evolution that
detail nucleotide substitution rates, nucleotide frequencies, and
site-to-site rate heterogeneity. These models can influence hypothesis
testing and can affect the accuracy of phylogenetic inferences. Maximum
likelihood (ML) methods of simultaneously constructing phylogenetic tree
topologies and estimating model parameters are computationally intensive
and are not feasible for
sample sizes of 25 or greater using personal computers. Techniques that
initially construct a tree topology and then use this non-maximized topology
to estimate ML substitution rates, however, can quickly arrive at a model
of sequence evolution. The accuracy of this two-step estimation technique
was tested using simulated data sets with known model parameters. The results
showed that for a star-like topology, as is often seen in human
immunodeficiency virus type 1 (HIV-1) subtype B sequences, a random starting
topology could produce nucleotide substitution rates that were not
statistically different from the true rates. Samples were isolated from 100
HIV-1 subtype B-infected individuals from the United States, and a 620 nt
region of the env
gene was sequenced for each sample. The sequence data were used to obtain
a substitution model of sequence evolution specific for HIV-1 subtype B env
by estimating nucleotide substitution rates and the site-to-site rate
heterogeneity in the sequences from these 100 individuals. The method of
estimating the model should provide users of large data sets with a way to
quickly compute a model of sequence evolution, while the nucleotide
substitution model we identified should prove useful in the phylogenetic
analysis of HIV-1 subtype B env sequences.
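
As a rough illustration of what a model of this kind encodes, the sketch below builds a reversible nucleotide rate matrix from relative substitution rates and nucleotide frequencies, converts it to substitution probabilities over a branch length, and averages over site-rate categories to mimic site-to-site rate heterogeneity. The abstract does not state which parameterization was fitted, so the general time-reversible (GTR-style) form, the function names, the equally weighted rate categories, and every numeric value are assumptions chosen for illustration, not estimates from the paper.

```python
# Illustrative sketch only: all names and numbers below are hypothetical
# placeholders, not the model or parameter estimates reported in the paper.

NUCS = "ACGT"


def gtr_rate_matrix(exch, freqs):
    """Build a normalized reversible rate matrix Q.

    exch maps unordered nucleotide pairs such as "AC" to a relative rate;
    freqs lists equilibrium frequencies in A, C, G, T order.
    """
    n = len(NUCS)
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                pair = "".join(sorted(NUCS[i] + NUCS[j]))
                q[i][j] = exch[pair] * freqs[j]
        q[i][i] = -sum(q[i])  # each row of a rate matrix sums to zero
    # Rescale so one time unit equals one expected substitution per site.
    mu = -sum(freqs[i] * q[i][i] for i in range(n))
    return [[x / mu for x in row] for row in q]


def mat_mul(a, b):
    """Multiply two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transition_probs(q, t, squarings=10, terms=12):
    """Approximate P(t) = exp(Q t) by a Taylor series with scaling and squaring."""
    n = len(q)
    scale = t / (2 ** squarings)
    a = [[x * scale for x in row] for row in q]
    p = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in p]
    for k in range(1, terms + 1):
        # term now holds a**k / k!
        term = [[x / k for x in row] for row in mat_mul(term, a)]
        p = [[p[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        p = mat_mul(p, p)
    return p


def heterogeneous_probs(q, t, site_rates):
    """Average P(r * t) over equally weighted site-rate categories."""
    n = len(q)
    mats = [transition_probs(q, r * t) for r in site_rates]
    return [[sum(m[i][j] for m in mats) / len(mats) for j in range(n)]
            for i in range(n)]


# Hypothetical inputs: transitions (AG, CT) faster than transversions,
# an A-rich composition, and four rate categories with mean 1.0.
exch = {"AC": 1.0, "AG": 4.0, "AT": 1.0, "CG": 1.0, "CT": 4.0, "GT": 1.0}
freqs = [0.4, 0.2, 0.2, 0.2]
site_rates = [0.1, 0.4, 1.0, 2.5]

q = gtr_rate_matrix(exch, freqs)
p = heterogeneous_probs(q, 0.1, site_rates)
```

In a two-step procedure like the one described above, a likelihood routine would evaluate matrices of this form on a fixed tree while an optimizer adjusts the relative rates and the rate-heterogeneity parameters. That optimization is omitted here.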