A HIDDEN MARKOV MODEL APPROACH TO VARIATION AMONG SITES IN RATE OF EVOLUTION

Citation
J. Felsenstein and G.A. Churchill, A HIDDEN MARKOV MODEL APPROACH TO VARIATION AMONG SITES IN RATE OF EVOLUTION, Molecular Biology and Evolution, 13(1), 1996, pp. 93-104
Number of citations
29
Subject categories
Biology
Journal ISSN
0737-4038
Volume
13
Issue
1
Year of publication
1996
Pages
93 - 104
Database
ISI
SICI code
0737-4038(1996)13:1<93:AHMMAT>2.0.ZU;2-8
Abstract
The method of Hidden Markov Models is used to allow for unequal and unknown evolutionary rates at different sites in molecular sequences. Rates of evolution at different sites are assumed to be drawn from a set of possible rates, with a finite number of possibilities. The overall likelihood of phylogeny is calculated as a sum of terms, each term being the probability of the data given a particular assignment of rates to sites, times the prior probability of that particular combination of rates. The probabilities of different rate combinations are specified by a stationary Markov chain that assigns rate categories to sites. While there will be a very large number of possible ways of assigning rates to sites, a simple recursive algorithm allows the contributions to the likelihood from all possible combinations of rates to be summed, in a time proportional to the number of different rates at a single site. Thus with three rates, the effort involved is no greater than three times that for a single rate. This "Hidden Markov Model" method allows for rates to differ between sites and for correlations between the rates of neighboring sites. By summing over all possibilities it does not require us to know the rates at individual sites. However, it does not allow for correlation of rates at nonadjacent sites, nor does it allow for a continuous distribution of rates over sites. It is shown how to use the Newton-Raphson method to estimate branch lengths of a phylogeny and to infer from a phylogeny what assignment of rates to sites has the largest posterior probability. An example is given using beta-hemoglobin DNA sequences in eight mammal species; the regions of high and low evolutionary rates are inferred and also the average length of patches of similar rates.
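
Illustrative sketch
The recursion described in the abstract can be illustrated with a small Python sketch. It is not the authors' code, and it rests on several assumptions. The per-site likelihoods for each rate category are taken as given in a hypothetical array site_lik, whereas in practice they would come from a phylogenetic likelihood calculation. The function names are hypothetical. The transition matrix is built in one simple way that yields a chosen stationary distribution, which may differ from the parameterization used in the paper. The recursion is written in the standard forward form, and the most probable rate assignment is found with the standard Viterbi recursion. A brute-force sum over every rate combination is included only to check the recursion on a tiny example.

```python
import itertools
import numpy as np


def hmm_likelihood(site_lik, trans, init):
    """Sum the likelihood over all rate assignments by a forward recursion.

    site_lik[i, k]: P(data at site i | rate category k)  (assumed given)
    trans[j, k]:    P(category k at site i+1 | category j at site i)
    init[k]:        P(category k at the first site)
    """
    f = init * site_lik[0]
    for i in range(1, site_lik.shape[0]):
        # f @ trans sums over the category of the previous site.
        f = (f @ trans) * site_lik[i]
    return f.sum()


def brute_force_likelihood(site_lik, trans, init):
    """Enumerate every assignment of categories to sites (tiny inputs only)."""
    n_sites, n_cats = site_lik.shape
    total = 0.0
    for cats in itertools.product(range(n_cats), repeat=n_sites):
        p = init[cats[0]] * site_lik[0, cats[0]]
        for i in range(1, n_sites):
            p *= trans[cats[i - 1], cats[i]] * site_lik[i, cats[i]]
        total += p
    return total


def most_probable_rates(site_lik, trans, init):
    """Viterbi recursion: the single assignment with the largest posterior."""
    n_sites, n_cats = site_lik.shape
    log_trans = np.log(trans)
    score = np.log(init) + np.log(site_lik[0])
    back = np.zeros((n_sites, n_cats), dtype=int)
    for i in range(1, n_sites):
        cand = score[:, None] + log_trans  # cand[j, k]: come from j, go to k
        back[i] = cand.argmax(axis=0)
        score = cand.max(axis=0) + np.log(site_lik[i])
    path = [int(score.argmax())]
    for i in range(n_sites - 1, 0, -1):
        path.append(int(back[i, path[-1]]))
    return path[::-1]


if __name__ == "__main__":
    # Made-up numbers for illustration; not data from the paper.
    rng = np.random.default_rng(0)
    pi = np.array([0.3, 0.4, 0.3])   # stationary category probabilities
    lam = 0.8                        # assumed chance of keeping the same category
    trans = lam * np.eye(3) + (1 - lam) * np.tile(pi, (3, 1))
    site_lik = rng.uniform(0.01, 1.0, size=(6, 3))
    print(hmm_likelihood(site_lik, trans, pi))
    print(brute_force_likelihood(site_lik, trans, pi))
    print(most_probable_rates(site_lik, trans, pi))
```

In this sketch the brute-force sum visits every one of the n_cats ** n_sites combinations, while the recursion makes a single pass over the sites. For long sequences the running products would underflow, so a practical version would rescale f at each site or work with logarithms.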