This paper studies various algorithmic issues in reconstructing a species t
ree from gene trees under the duplication and the mutation cost model. This
is a fundamental problem in computational molecular biology. Our main resu
lts are as follows.
1. A linear time algorithm is presented for computing all the losses in dup
lications associated with the least common ancestor mapping from a gene tre
e to a species tree. This answers a problem raised recently by Eulenstein,
Mirkin, and Vingron [J. Comput. Bio., 5 ( 1998), pp. 135-148].
2. The complexity of finding an optimal species tree from gene trees is stu
died. The problem is proved to be NP-hard for the duplication cost and for
the mutation cost. Further, the concept of reconciled trees was introduced
by Goodman et al. and formalized by Page for visualizing the relationship b
etween gene and species trees. We show that constructing an optimal reconci
led tree for gene trees is also NP-hard. Finally, we consider a general rec
onstruction problem and show it to be NP-hard even for the well-known neare
st neighbor interchange distance.
3. A new and efficiently computable metric is defined based on the duplicat
ion cost. We show that the problem of finding an optimal species tree from
gene trees is NP-hard under this new metric but it can be approximated with
in factor 2 in polynomial time. Using this approximation result, we propose
a heuristic method for finding a species tree from gene trees with uniquel
y labeled leaves under the duplication cost. Our experimental tests demonst
rate that when the number of species is larger than 15 and gene trees are c
lose to each other, our heuristic method is significantly better than the e
xisting program in Page's GeneTree 1.0 that starts the search from a random
tree.