PHYLOGENETIC ANALYSIS USING PARSIMONY AND LIKELIHOOD METHODS

Authors
Zh. Yang
Citation
Zh. Yang, PHYLOGENETIC ANALYSIS USING PARSIMONY AND LIKELIHOOD METHODS, Journal of Molecular Evolution, 42(2), 1996, pp. 294-307
Citations number
69
Subject Categories
Genetics & Heredity, Biology
ISSN journal
0022-2844
Volume
42
Issue
2
Year of publication
1996
Pages
294 - 307
Database
ISI
SICI code
0022-2844(1996)42:2<294:PAUPAL>2.0.ZU;2-N
Abstract
The assumptions underlying the maximum-parsimony (MP) method of phylogenetic tree reconstruction were intuitively examined by studying the way the method works. Computer simulations were performed to corroborate the intuitive examination. Parsimony appears to involve very stringent assumptions concerning the process of sequence evolution, such as constancy of substitution rates between nucleotides, constancy of rates across nucleotide sites, and equal branch lengths in the tree. For practical data analysis, the requirement of equal branch lengths means similar substitution rates among lineages (the existence of an approximate molecular clock), relatively long interior branches, and also few species in the data. However, a small amount of evolution is neither a necessary nor a sufficient requirement of the method. The difficulties involved in the application of current statistical estimation theory to tree reconstruction were discussed, and it was suggested that the approach proposed by Felsenstein (1981, J. Mol. Evol. 17: 368-376) for topology estimation, as well as its many variations and extensions, differs fundamentally from the maximum likelihood estimation of a conventional statistical parameter. Evidence was presented showing that the Felsenstein approach does not share the asymptotic efficiency of the maximum likelihood estimator of a statistical parameter. Computer simulations were performed to study the probability that MP recovers the true tree under a hierarchy of models of nucleotide substitution; its performance relative to the likelihood method was especially noted. The results appeared to support the intuitive examination of the assumptions underlying MP. When a simple model of nucleotide substitution was assumed to generate data, the probability that MP recovers the true topology could be as high as, or even higher than, that for the likelihood method.
When the assumed model became more complex and realistic, e.g., when substitution rates were allowed to differ between nucleotides or across sites, the probability that MP recovers the true topology, and especially its performance relative to that of the likelihood method, generally deteriorates. As the complexity of the process of nucleotide substitution in real sequences is well recognized, the likelihood method appears preferable to parsimony. However, the development of a statistical methodology for the efficient estimation of the tree topology remains a difficult open problem.
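
Illustrative note (not part of the published abstract): the abstract refers to "the way the method works" without spelling it out. As a minimal sketch, the standard Fitch counting procedure is assumed here as one common way to compute a parsimony score; the record does not state which implementation the paper used. The alignment, the taxon names A-D, and the function names are hypothetical and chosen only for illustration.

```python
def fitch_site_score(tree, states):
    """Fitch pass for one site on a rooted binary tree.

    tree   : a leaf name (str) or a pair (left_subtree, right_subtree)
    states : dict mapping leaf name -> nucleotide at this site
    Returns (possible ancestral states, minimum number of changes).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch_site_score(tree[0], states)
    right_set, right_cost = fitch_site_score(tree[1], states)
    common = left_set & right_set
    if common:
        # Children agree on at least one state: no extra change needed.
        return common, left_cost + right_cost
    # Children disagree: one substitution must be counted at this node.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Sum the per-site Fitch scores over all columns of the alignment."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for i in range(n_sites):
        column = {name: seq[i] for name, seq in alignment.items()}
        total += fitch_site_score(tree, column)[1]
    return total


# Hypothetical four-taxon alignment (made up for illustration only).
alignment = {
    "A": "AACGT",
    "B": "AACGA",
    "C": "GGCTT",
    "D": "GGCTA",
}

# The three possible unrooted topologies for four taxa, rooted arbitrarily
# (the Fitch score does not depend on where the root is placed).
topologies = {
    "((A,B),(C,D))": (("A", "B"), ("C", "D")),
    "((A,C),(B,D))": (("A", "C"), ("B", "D")),
    "((A,D),(B,C))": (("A", "D"), ("B", "C")),
}

scores = {name: parsimony_score(t, alignment) for name, t in topologies.items()}
best = min(scores, key=scores.get)
print(scores, best)
```

MP selects the topology with the smallest total count. Every change is weighted equally at every site and on every branch, which is consistent with the abstract's point that the method implicitly assumes equal substitution rates between nucleotides and across sites.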