ITA
ENG

From gene trees to species trees

Authors

Ma, B Li, M Zhang, LX

Citation

B. Ma et al., From gene trees to species trees, SIAM J COMP, 30(3), 2000, pp. 729-752

Citations number

Categorie Soggetti

Computer Science & Engineering

Journal title

SIAM JOURNAL ON COMPUTING

ISSN journal

00975397 → ACNP

Volume

Issue

Year of publication

2000

Pages

729 - 752

Database

ISI

SICI code

0097-5397(20000824)30:3<729:FGTTST>2.0.ZU;2-M

Abstract

This paper studies various algorithmic issues in reconstructing a species t ree from gene trees under the duplication and the mutation cost model. This is a fundamental problem in computational molecular biology. Our main resu lts are as follows. 1. A linear time algorithm is presented for computing all the losses in dup lications associated with the least common ancestor mapping from a gene tre e to a species tree. This answers a problem raised recently by Eulenstein, Mirkin, and Vingron [J. Comput. Bio., 5 ( 1998), pp. 135-148]. 2. The complexity of finding an optimal species tree from gene trees is stu died. The problem is proved to be NP-hard for the duplication cost and for the mutation cost. Further, the concept of reconciled trees was introduced by Goodman et al. and formalized by Page for visualizing the relationship b etween gene and species trees. We show that constructing an optimal reconci led tree for gene trees is also NP-hard. Finally, we consider a general rec onstruction problem and show it to be NP-hard even for the well-known neare st neighbor interchange distance. 3. A new and efficiently computable metric is defined based on the duplicat ion cost. We show that the problem of finding an optimal species tree from gene trees is NP-hard under this new metric but it can be approximated with in factor 2 in polynomial time. Using this approximation result, we propose a heuristic method for finding a species tree from gene trees with uniquel y labeled leaves under the duplication cost. Our experimental tests demonst rate that when the number of species is larger than 15 and gene trees are c lose to each other, our heuristic method is significantly better than the e xisting program in Page's GeneTree 1.0 that starts the search from a random tree.