From gene trees to species trees

Citation
B. Ma et al., From gene trees to species trees, SIAM J COMP, 30(3), 2000, pp. 729-752
Citations number
34
Categorie Soggetti
Computer Science & Engineering
Journal title
SIAM JOURNAL ON COMPUTING
ISSN journal
00975397 → ACNP
Volume
30
Issue
3
Year of publication
2000
Pages
729 - 752
Database
ISI
SICI code
0097-5397(20000824)30:3<729:FGTTST>2.0.ZU;2-M
Abstract
This paper studies various algorithmic issues in reconstructing a species t ree from gene trees under the duplication and the mutation cost model. This is a fundamental problem in computational molecular biology. Our main resu lts are as follows. 1. A linear time algorithm is presented for computing all the losses in dup lications associated with the least common ancestor mapping from a gene tre e to a species tree. This answers a problem raised recently by Eulenstein, Mirkin, and Vingron [J. Comput. Bio., 5 ( 1998), pp. 135-148]. 2. The complexity of finding an optimal species tree from gene trees is stu died. The problem is proved to be NP-hard for the duplication cost and for the mutation cost. Further, the concept of reconciled trees was introduced by Goodman et al. and formalized by Page for visualizing the relationship b etween gene and species trees. We show that constructing an optimal reconci led tree for gene trees is also NP-hard. Finally, we consider a general rec onstruction problem and show it to be NP-hard even for the well-known neare st neighbor interchange distance. 3. A new and efficiently computable metric is defined based on the duplicat ion cost. We show that the problem of finding an optimal species tree from gene trees is NP-hard under this new metric but it can be approximated with in factor 2 in polynomial time. Using this approximation result, we propose a heuristic method for finding a species tree from gene trees with uniquel y labeled leaves under the duplication cost. Our experimental tests demonst rate that when the number of species is larger than 15 and gene trees are c lose to each other, our heuristic method is significantly better than the e xisting program in Page's GeneTree 1.0 that starts the search from a random tree.