Y. Tateno et al., RELATIVE EFFICIENCIES OF THE MAXIMUM-LIKELIHOOD, NEIGHBOR-JOINING, AND MAXIMUM-PARSIMONY METHODS WHEN SUBSTITUTION RATE VARIES WITH SITE, Molecular biology and evolution, 11(2), 1994, pp. 261-277
The relative efficiencies of the maximum-likelihood (ML),
neighbor-joining (NJ), and maximum-parsimony (MP) methods in obtaining
the correct topology and in estimating the branch lengths for the case
of four DNA sequences were studied by computer simulation, under the
assumption either that there is variation in substitution rate among
different nucleotide sites or that there is no variation. For the NJ
method, several different distance measures (Jukes-Cantor, Kimura
two-parameter, and gamma distances) were used, whereas for the ML
method three different transition/transversion ratios (R) were used.
For the MP method, both the standard unweighted parsimony and the
dynamically weighted parsimony methods were used. The results obtained
are as follows: (1) When the R value is high, dynamically weighted
parsimony is more efficient than unweighted parsimony in obtaining the
correct topology. (2) However, both weighted and unweighted parsimony
methods are generally less efficient than the NJ and ML methods even in
the case where the MP method gives a consistent tree. (3) When all the
assumptions of the ML method are satisfied, this method is slightly
more efficient than the NJ method. However, when the assumptions are
not satisfied, the NJ method with gamma distances is slightly better in
obtaining the correct topology than is the ML method. In general, the
two methods show more or less the same performance. The NJ method may
give a correct topology even when the distance measures used are not
unbiased estimators of nucleotide substitutions. (4) Branch length
estimates of a tree with the correct topology are affected more easily
than topology by violation of the assumptions of the mathematical model
used, for both the ML and the NJ methods. Under certain conditions,
branch lengths are seriously overestimated or underestimated. The MP
method often gives serious underestimates for certain branches. (5)
Distance measures that generate the correct topology, with high
probability, do not necessarily give good estimates of branch lengths.
(6) The likelihood-ratio test and the confidence-limit test, in
Felsenstein's DNAML, for examining the statistical significance of
branch length estimates are quite sensitive to violation of the
assumptions and are generally too liberal to be used for actual data.
Rzhetsky and Nei's branch length test is less sensitive to violation of
the assumptions than is Felsenstein's test. (7) When the extent of
sequence divergence is less than or equal to 5% and when greater than
or equal to 1,000 nucleotides are used, all three methods show
essentially the same efficiency in obtaining the correct topology and
in estimating branch lengths. Clearly, the simplest method, i.e., the
NJ method, is preferable in this case.
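
The distance measures named above, and the NJ topology choice for four sequences, can be illustrated with a short Python sketch. It is not taken from the paper. The function names are hypothetical. The gamma distance shown is the Jukes-Cantor-based form with shape parameter a, which is an assumption, since the abstract does not say which gamma distance was used. The four-sequence rule relies on the fact that, for four taxa, the NJ criterion selects the pairing with the smallest sum of within-pair distances.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor distance from the proportion p of differing sites (p < 0.75)."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def kimura_2p(P, Q):
    """Kimura two-parameter distance.

    P: proportion of sites showing a transition difference.
    Q: proportion of sites showing a transversion difference.
    """
    return -0.5 * math.log(1.0 - 2.0 * P - Q) - 0.25 * math.log(1.0 - 2.0 * Q)


def gamma_jc(p, a):
    """Jukes-Cantor-based gamma distance with shape parameter a (assumed form).

    A smaller a means stronger rate variation among sites; as a grows,
    the value approaches the plain Jukes-Cantor distance.
    """
    return 0.75 * a * ((1.0 - 4.0 * p / 3.0) ** (-1.0 / a) - 1.0)


def four_taxon_nj_topology(d):
    """Return the pairing NJ selects for four taxa, given a 4x4 distance matrix d.

    For four taxa, minimizing the NJ criterion is equivalent to choosing the
    pairing (i, j), (k, l) with the smallest d[i][j] + d[k][l].
    """
    splits = [((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2))]
    return min(splits, key=lambda s: d[s[0][0]][s[0][1]] + d[s[1][0]][s[1][1]])


if __name__ == "__main__":
    p = 0.3  # illustrative proportion of differing sites, not data from the paper
    print("JC distance:      ", jukes_cantor(p))
    print("K2P distance:     ", kimura_2p(0.1, 0.2))
    print("Gamma JC (a = 1): ", gamma_jc(p, 1.0))

    # Additive distances for the tree ((0,1),(2,3)) with external branches 0.1
    # and an internal branch 0.05 (made-up values for illustration).
    d = [
        [0.00, 0.20, 0.25, 0.25],
        [0.20, 0.00, 0.25, 0.25],
        [0.25, 0.25, 0.00, 0.20],
        [0.25, 0.25, 0.20, 0.00],
    ]
    print("NJ pairing:", four_taxon_nj_topology(d))
```

For the same observed proportion of differences, the gamma distance is larger than the Jukes-Cantor distance. This shows how ignoring rate variation among sites leads to underestimated distances, and hence to biased branch lengths, even in cases where the topology is still recovered.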