A method for computing the likelihood of a set of sequences assuming a
phylogenetic network as an evolutionary hypothesis is presented. The approach
applies directed graphical models to sequence evolution on networks and is a
natural generalization of earlier work by Felsenstein on evolutionary trees,
including it as a special case. The likelihood computation involves several
steps. First, the phylogenetic network is rooted to form a directed acyclic
graph (DAG). Then, applying standard models for nucleotide/amino acid
substitution, the DAG is converted into a Bayesian network from which the
joint probability distribution involving all nodes of the network can be
directly read. The joint probability is explicitly dependent on branch lengths
and on recombination parameters (prior probability of a parent sequence).
The likelihood of the data assuming no knowledge of hidden nodes is obtained
by marginalization, i.e., by summing over all combinations of unknown states.
As the number of terms increases exponentially with the number of hidden
nodes, a Markov chain Monte Carlo procedure (Gibbs sampling) is used to
accurately approximate the likelihood by summing over the most important
states only. Investigating a human T-cell lymphotropic virus (HTLV) data set
and optimizing both branch lengths and recombination parameters, we find that
the likelihood of a corresponding phylogenetic network outperforms a set of
competing evolutionary trees. In general, except for the case of a tree, the
likelihood of a network will be dependent on the choice of the root, even if a
reversible model of substitution is applied. Thus, the method also provides a
way in which to root a phylogenetic network by choosing a node that produces a
most likely network.
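
The marginalization step can be illustrated with a minimal single-site sketch in Python. It is not the implementation behind the reported results, and everything specific in it is an assumption made for illustration: the toy topology (hidden root R with hidden children A and B, leaves 1 and 3, and a recombinant leaf 2 with parents A and B), the Jukes-Cantor substitution model, the uniform root prior, the branch lengths, and the mixture form in which the recombination parameter q (prior probability that A is the parent) enters the conditional for leaf 2. The hidden states are summed by brute force, which is feasible only because there are 4^3 = 64 combinations here; for larger networks this sum is what Gibbs sampling approximates.

```python
import itertools
import math

STATES = "ACGT"

def jc69(child, parent, t):
    """Jukes-Cantor probability of state `child` given `parent` over branch length t."""
    decay = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * decay if child == parent else 0.25 - 0.25 * decay

# Hypothetical branch lengths for the toy network (illustrative values only).
BRANCHES = {"RA": 0.1, "RB": 0.2, "A1": 0.05, "A2": 0.1, "B2": 0.15, "B3": 0.05}

def joint(r, a, b, x1, x2, x3, t, q):
    """Joint probability of all nodes, read off the Bayesian network as a product."""
    p = 0.25                                          # uniform prior on the root state
    p *= jc69(a, r, t["RA"]) * jc69(b, r, t["RB"])    # hidden internal nodes
    p *= jc69(x1, a, t["A1"]) * jc69(x3, b, t["B3"])  # ordinary leaves
    # Recombinant leaf: assumed mixture over its two possible parents.
    p *= q * jc69(x2, a, t["A2"]) + (1.0 - q) * jc69(x2, b, t["B2"])
    return p

def likelihood(x1, x2, x3, t, q):
    """Marginal likelihood of the observed leaves, summing over hidden R, A, B."""
    return sum(
        joint(r, a, b, x1, x2, x3, t, q)
        for r, a, b in itertools.product(STATES, repeat=3)
    )

if __name__ == "__main__":
    print(likelihood("A", "A", "G", BRANCHES, q=0.5))
```

Setting q = 1 removes the edge from B to leaf 2, so the toy network collapses to a tree and the sum reduces to the usual tree likelihood, mirroring the special case noted above.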