RELATIVE EFFICIENCIES OF THE MAXIMUM-LIKELIHOOD, NEIGHBOR-JOINING, AND MAXIMUM-PARSIMONY METHODS WHEN SUBSTITUTION RATE VARIES WITH SITE

Citation
Y. Tateno et al., RELATIVE EFFICIENCIES OF THE MAXIMUM-LIKELIHOOD, NEIGHBOR-JOINING, AND MAXIMUM-PARSIMONY METHODS WHEN SUBSTITUTION RATE VARIES WITH SITE, Molecular Biology and Evolution, 11(2), 1994, pp. 261-277
Number of citations
39
Subject Categories
Biology
ISSN journal
0737-4038
Volume
11
Issue
2
Year of publication
1994
Pages
261 - 277
Database
ISI
SICI code
0737-4038(1994)11:2<261:REOTMN>2.0.ZU;2-E
Abstract
The relative efficiencies of the maximum-likelihood (ML), neighbor-joining (NJ), and maximum-parsimony (MP) methods in obtaining the correct topology and in estimating the branch lengths for the case of four DNA sequences were studied by computer simulation, under the assumption either that there is variation in substitution rate among different nucleotide sites or that there is no variation. For the NJ method, several different distance measures (Jukes-Cantor, Kimura two-parameter, and gamma distances) were used, whereas for the ML method three different transition/transversion ratios (R) were used. For the MP method, both the standard unweighted parsimony and the dynamically weighted parsimony methods were used. The results obtained are as follows: (1) When the R value is high, dynamically weighted parsimony is more efficient than unweighted parsimony in obtaining the correct topology. (2) However, both weighted and unweighted parsimony methods are generally less efficient than the NJ and ML methods even in the case where the MP method gives a consistent tree. (3) When all the assumptions of the ML method are satisfied, this method is slightly more efficient than the NJ method. However, when the assumptions are not satisfied, the NJ method with gamma distances is slightly better in obtaining the correct topology than is the ML method. In general, the two methods show more or less the same performance. The NJ method may give a correct topology even when the distance measures used are not unbiased estimators of nucleotide substitutions. (4) Branch length estimates of a tree with the correct topology are affected more easily than topology by violation of the assumptions of the mathematical model used, for both the ML and the NJ methods. Under certain conditions, branch lengths are seriously overestimated or underestimated. The MP method often gives serious underestimates for certain branches. (5) Distance measures that generate the correct topology, with high probability, do not necessarily give good estimates of branch lengths. (6) The likelihood-ratio test and the confidence-limit test, in Felsenstein's DNAML, for examining the statistical significance of branch length estimates are quite sensitive to violation of the assumptions and are generally too liberal to be used for actual data. Rzhetsky and Nei's branch length test is less sensitive to violation of the assumptions than is Felsenstein's test. (7) When the extent of sequence divergence is less than or equal to 5% and when 1,000 or more nucleotides are used, all three methods show essentially the same efficiency in obtaining the correct topology and in estimating branch lengths. Clearly, the simplest method, i.e., the NJ method, is preferable in this case.
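The three families of distance measures named in the abstract (Jukes-Cantor, Kimura two-parameter, and gamma distances) can be illustrated with a minimal Python sketch. It uses the standard textbook formulas and is not taken from the paper: the function names are hypothetical, the gamma distance shown is the Jukes-Cantor-based form, and the gamma shape parameter `a` is an assumed input, since the abstract does not state which gamma variants or parameter values the authors used.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def site_differences(seq1, seq2):
    """Return (P, Q): proportions of sites showing a transitional or a
    transversional difference between two aligned, gap-free DNA sequences."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be non-empty and of equal length")
    transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == b:
            continue
        # A transition stays within purines or within pyrimidines.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    n = len(seq1)
    return transitions / n, transversions / n


def jukes_cantor(p):
    """Jukes-Cantor distance from the proportion p of differing sites."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def kimura_2p(P, Q):
    """Kimura two-parameter distance from transition (P) and
    transversion (Q) proportions."""
    return -0.5 * math.log(1.0 - 2.0 * P - Q) - 0.25 * math.log(1.0 - 2.0 * Q)


def gamma_jc(p, a):
    """Jukes-Cantor-based gamma distance; `a` is the assumed gamma shape
    parameter describing rate variation among sites."""
    return 0.75 * a * ((1.0 - 4.0 * p / 3.0) ** (-1.0 / a) - 1.0)


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    P, Q = site_differences("AAAAAAAAAA", "GAAAAAAAAC")
    p = P + Q
    print("JC:   ", jukes_cantor(p))
    print("K2P:  ", kimura_2p(P, Q))
    print("Gamma:", gamma_jc(p, a=1.0))
```

For the same observed difference, the gamma distance exceeds the Jukes-Cantor distance and approaches it as `a` grows large (that is, as rate variation among sites vanishes), which is one way to see why the choice of distance measure can affect branch length estimates more than it affects topology.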