Likelihood analysis of phylogenetic networks using directed graphical models

Citation
K. Strimmer and V. Moulton, Likelihood analysis of phylogenetic networks using directed graphical models, MOL BIOL EV, 17(6), 2000, pp. 875-881
Number of citations
39
Subject categories
Biology, Experimental Biology
Journal title
MOLECULAR BIOLOGY AND EVOLUTION
ISSN journal
0737-4038
Volume
17
Issue
6
Year of publication
2000
Pages
875 - 881
Database
ISI
SICI code
0737-4038(200006)17:6<875:LAOPNU>2.0.ZU;2-S
Abstract
A method for computing the likelihood of a set of sequences assuming a phylogenetic network as an evolutionary hypothesis is presented. The approach applies directed graphical models to sequence evolution on networks and is a natural generalization of earlier work by Felsenstein on evolutionary trees, including it as a special case. The likelihood computation involves several steps. First, the phylogenetic network is rooted to form a directed acyclic graph (DAG). Then, applying standard models for nucleotide/amino acid substitution, the DAG is converted into a Bayesian network from which the joint probability distribution involving all nodes of the network can be directly read. The joint probability is explicitly dependent on branch lengths and on recombination parameters (prior probability of a parent sequence). The likelihood of the data assuming no knowledge of hidden nodes is obtained by marginalization, i.e., by summing over all combinations of unknown states. As the number of terms increases exponentially with the number of hidden nodes, a Markov chain Monte Carlo procedure (Gibbs sampling) is used to accurately approximate the likelihood by summing over the most important states only. Investigating a human T-cell lymphotropic virus (HTLV) data set and optimizing both branch lengths and recombination parameters, we find that the likelihood of a corresponding phylogenetic network outperforms a set of competing evolutionary trees. In general, except for the case of a tree, the likelihood of a network will be dependent on the choice of the root, even if a reversible model of substitution is applied. Thus, the method also provides a way in which to root a phylogenetic network by choosing a node that produces a most likely network.
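
The steps described in the abstract (rooted DAG, Bayesian network, joint probability, marginalization over hidden nodes) can be illustrated with a small Python sketch. It is not the authors' implementation. The network topology, branch lengths, node names, and the choice of the Jukes-Cantor substitution model are all hypothetical. In particular, treating a node with two parents as a mixture of the two parental substitution probabilities, weighted by the prior probability of each parent, is only one plausible reading of the "recombination parameters" mentioned in the abstract. The sketch sums exhaustively over hidden states, which is feasible only for a handful of hidden nodes; the abstract uses Gibbs sampling instead because the number of terms grows exponentially.

```python
import itertools
import math

STATES = "ACGT"

def jc69(child, parent, t):
    """Jukes-Cantor probability of state `child` given `parent` after branch length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if child == parent else 0.25 - 0.25 * e

# Hypothetical rooted network (a DAG): node -> list of (parent, branch length).
# "h" is a reticulation node with two parents; A, B, C are observed leaves.
PARENTS = {
    "root": [],
    "u": [("root", 0.1)],
    "v": [("root", 0.1)],
    "h": [("u", 0.05), ("v", 0.05)],
    "A": [("u", 0.1)],
    "B": [("h", 0.1)],
    "C": [("v", 0.1)],
}

# ASSUMPTION: prior probability of each parent of a reticulation node,
# listed in the same order as in PARENTS; the weights sum to 1.
PARENT_PRIOR = {"h": [0.5, 0.5]}

def local_prob(node, assignment):
    """Conditional probability of a node's state given the states of its parents."""
    parents = PARENTS[node]
    x = assignment[node]
    if not parents:
        return 0.25  # uniform distribution at the root
    if len(parents) == 1:
        parent, t = parents[0]
        return jc69(x, assignment[parent], t)
    # ASSUMPTION: mixture over the possible parents of a reticulation node.
    weights = PARENT_PRIOR[node]
    return sum(w * jc69(x, assignment[p], t) for w, (p, t) in zip(weights, parents))

def joint(assignment):
    """Joint probability of a full assignment: product of local conditionals."""
    prob = 1.0
    for node in PARENTS:
        prob *= local_prob(node, assignment)
    return prob

def site_likelihood(observed):
    """Marginalize the joint probability over all states of the hidden nodes."""
    hidden = [n for n in PARENTS if n not in observed]
    total = 0.0
    for combo in itertools.product(STATES, repeat=len(hidden)):
        assignment = dict(observed)
        assignment.update(zip(hidden, combo))
        total += joint(assignment)
    return total

def log_likelihood(alignment):
    """Log-likelihood of an alignment (dict leaf -> sequence), sites independent."""
    length = len(next(iter(alignment.values())))
    return sum(
        math.log(site_likelihood({leaf: seq[i] for leaf, seq in alignment.items()}))
        for i in range(length)
    )
```

For example, `log_likelihood({"A": "ACGT", "B": "ACGA", "C": "ACGT"})` returns the log-likelihood of a three-sequence toy alignment on this network. Setting `PARENT_PRIOR["h"]` to `[1.0, 0.0]` removes the influence of the second parent, so the computation reduces to the likelihood on a tree, in line with the abstract's statement that trees are a special case.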