A simple algorithm to infer gene duplication and speciation events on a gene tree

Citation
C.M. Zmasek and S.R. Eddy, A simple algorithm to infer gene duplication and speciation events on a gene tree, Bioinformatics, 17(9), 2001, pp. 821-828
Number of citations
37
Subject categories
Multidisciplinary
Journal title
BIOINFORMATICS
ISSN journal
1367-4803
Volume
17
Issue
9
Year of publication
2001
Pages
821 - 828
Database
ISI
SICI code
1367-4803(200109)17:9<821:ASATIG>2.0.ZU;2-F
Abstract
Motivation: When analyzing protein sequences using sequence similarity searches, orthologous sequences (that diverged by speciation) are more reliable predictors of a new protein's function than paralogous sequences (that diverged by gene duplication), because duplication enables functional diversification. The utility of phylogenetic information in high-throughput genome annotation ('phylogenomics') is widely recognized, but existing approaches are either manual or indirect (e.g. not based on phylogenetic trees). Our goal is to automate phylogenomics using explicit phylogenetic inference. A necessary component is an algorithm to infer speciation and duplication events in a given gene tree.
Results: We give an algorithm to infer speciation and duplication events on a gene tree by comparison to a trusted species tree. This algorithm has a worst-case running time of O(n^2), which is inferior to two previous algorithms that are approximately O(n) for a gene tree of n sequences. However, our algorithm is extremely simple, and its asymptotic worst-case behavior is only realized on pathological data sets. We show empirically, using 1750 gene trees constructed from the Pfam protein family database, that it appears to be a practical (and often superior) algorithm for analyzing real gene trees.
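Illustrative sketch (not taken from the record): the abstract does not spell out the procedure, so the Python below shows one common way to label gene-tree nodes by comparison to a species tree, and should not be read as the authors' exact method. It assumes rooted binary trees and that every gene leaf names a species present in the species tree. The `Node` class, the `infer_events` function, and all node names are hypothetical. Each internal gene node is mapped to the lowest species-tree node covering both of its children; the node is called a duplication when that mapping coincides with a child's mapping, and a speciation otherwise. Climbing parent pointers one step at a time is what can make the worst case quadratic on very unbalanced trees.

```python
class Node:
    """Rooted binary tree node; `species` is set only on gene-tree leaves."""

    def __init__(self, name, children=(), species=None):
        self.name = name
        self.children = list(children)
        self.species = species
        self.parent = None
        for child in self.children:
            child.parent = self


def preorder(root):
    """Yield nodes so that every node comes after all of its ancestors."""
    stack = [root]
    while stack:
        node = stack.pop()
        yield node
        stack.extend(reversed(node.children))


def infer_events(gene_root, species_root):
    """Return {internal gene node name: 'duplication' or 'speciation'}."""
    # Number species nodes in preorder: ancestors always get smaller numbers.
    leaf_by_name = {}
    for number, node in enumerate(preorder(species_root)):
        node.number = number
        if not node.children:
            leaf_by_name[node.name] = node

    events = {}
    # Reversed preorder visits all descendants before their parent.
    for g in reversed(list(preorder(gene_root))):
        if not g.children:
            g.mapped = leaf_by_name[g.species]
            continue
        a, b = (child.mapped for child in g.children)  # assumes two children
        # Move the higher-numbered node up until both meet at the common ancestor.
        while a is not b:
            if a.number > b.number:
                a = a.parent
            else:
                b = b.parent
        g.mapped = a
        is_dup = any(child.mapped is a for child in g.children)
        events[g.name] = "duplication" if is_dup else "speciation"
    return events


# Hypothetical example: species tree ((A,B),C) and a gene family that
# duplicated in the ancestor of A and B.
species = Node("ABC", [Node("AB", [Node("A"), Node("B")]), Node("C")])
gene = Node("g_root", [
    Node("g_dup", [
        Node("g1", [Node("a1", species="A"), Node("b1", species="B")]),
        Node("g2", [Node("a2", species="A"), Node("b2", species="B")]),
    ]),
    Node("c1", species="C"),
])
events = infer_events(gene, species)
```

In this toy case `g1`, `g2` and `g_root` are labelled speciations and `g_dup` a duplication, so `a1` and `b1` would be read as orthologs while `a1` and `b2` would be read as paralogs.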