A simple algorithm to infer gene duplication and speciation events on a gene tree

Authors

Zmasek, CM; Eddy, SR

Citation

C.M. Zmasek and S.R. Eddy, A simple algorithm to infer gene duplication and speciation events on a gene tree, Bioinformatics, 17(9), 2001, pp. 821-828

Citations number

Subject categories

Multidisciplinary

Journal title

BIOINFORMATICS

ISSN journal

1367-4803

Volume

17

Issue

9

Year of publication

2001

Pages

821 - 828

Database

ISI

SICI code

1367-4803(200109)17:9<821:ASATIG>2.0.ZU;2-F

Abstract

Motivation: When analyzing protein sequences using sequence similarity searches, orthologous sequences (that diverged by speciation) are more reliable predictors of a new protein's function than paralogous sequences (that diverged by gene duplication), because duplication enables functional diversification. The utility of phylogenetic information in high-throughput genome annotation ('phylogenomics') is widely recognized, but existing approaches are either manual or indirect (e.g. not based on phylogenetic trees). Our goal is to automate phylogenomics using explicit phylogenetic inference. A necessary component is an algorithm to infer speciation and duplication events in a given gene tree.

Results: We give an algorithm to infer speciation and duplication events on a gene tree by comparison to a trusted species tree. This algorithm has a worst-case running time of O(n^2), which is inferior to two previous algorithms that are approximately O(n) for a gene tree of n sequences. However, our algorithm is extremely simple, and its asymptotic worst-case behavior is only realized on pathological data sets. We show empirically, using 1750 gene trees constructed from the Pfam protein family database, that it appears to be a practical (and often superior) algorithm for analyzing real gene trees.
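
Illustrative note: the abstract does not spell out the procedure, so the following is only a minimal sketch of the general idea of labelling gene-tree nodes by comparison to a species tree. It uses the textbook lowest-common-ancestor (LCA) mapping and may differ from the authors' algorithm. The class `Node`, the functions `set_depths`, `lca` and `infer_events`, and the toy trees are all hypothetical names and data introduced here for illustration.

```python
class Node:
    """Minimal rooted-tree node (hypothetical helper, not from the paper)."""
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)
        self.parent = None
        self.depth = 0
        for child in self.children:
            child.parent = self

def set_depths(node, depth=0):
    """Record each node's distance from the root of the species tree."""
    node.depth = depth
    for child in node.children:
        set_depths(child, depth + 1)

def lca(a, b):
    """Naive lowest common ancestor: walk the deeper node upward until they meet."""
    while a is not b:
        if a.depth >= b.depth:
            a = a.parent
        else:
            b = b.parent
    return a

def infer_events(gene_node, gene_to_species, events):
    """Post-order pass over a binary gene tree.

    Returns the species-tree node that gene_node maps to, and records
    'duplication' when an internal node maps to the same species node as
    one of its children, 'speciation' otherwise.
    """
    if not gene_node.children:
        return gene_to_species[gene_node.name]
    left, right = (infer_events(c, gene_to_species, events)
                   for c in gene_node.children)
    mapped = lca(left, right)
    is_dup = mapped is left or mapped is right
    events[gene_node.name] = "duplication" if is_dup else "speciation"
    return mapped

# Toy species tree (assumed for illustration): ((human, mouse), fly)
human, mouse, fly = Node("human"), Node("mouse"), Node("fly")
species_root = Node("root", [Node("mammals", [human, mouse]), fly])
set_depths(species_root)

# Toy gene tree: ((a_human, a_mouse), (b_human, b_fly))
gene_to_species = {"a_human": human, "a_mouse": mouse,
                   "b_human": human, "b_fly": fly}
gene_root = Node("g_root", [
    Node("g1", [Node("a_human"), Node("a_mouse")]),
    Node("g2", [Node("b_human"), Node("b_fly")]),
])

events = {}
infer_events(gene_root, gene_to_species, events)
print(events)
```

In this toy case the two inner gene-tree nodes come out as speciations and the gene-tree root as a duplication, because the root maps to the same species-tree node as one of its children. The upward-walking `lca` is chosen for brevity, not for speed.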