WHEN IS IT SAFE TO USE AN OVERSIMPLIFIED SUBSTITUTION MODEL IN TREE-MAKING

Citation
A. Rzhetsky and T. Sitnikova, WHEN IS IT SAFE TO USE AN OVERSIMPLIFIED SUBSTITUTION MODEL IN TREE-MAKING, Molecular Biology and Evolution, 13(9), 1996, pp. 1255-1265
Number of citations
56
Subject Categories
Biology
ISSN journal
07374038
Volume
13
Issue
9
Year of publication
1996
Pages
1255 - 1265
Database
ISI
SICI code
0737-4038(1996)13:9<1255:WIISTU>2.0.ZU;2-2
Abstract
The choice of an "optimal" mathematical model for computing evolutionary distances from real sequences is not currently supported by easy-to-use software applicable to large data sets, and an investigator frequently selects one of the simplest models available. Here we study properties of the observed proportion of differences (p-distance) between sequences as an estimator of evolutionary distance for tree-making. We show that p-distances allow for consistent tree-making with any of the popular methods working with evolutionary distances if evolution of sequences obeys a "molecular clock" (more precisely, if it follows a stationary time-reversible Markov model of nucleotide substitution). Next, we show that p-distances seem to be efficient in recovering the correct tree topology under a "molecular clock," but produce "statistically supported" wrong trees when substitution rates vary among evolutionary lineages. Finally, we outline a practical approach for selecting an "optimal" model of nucleotide substitution in a real data analysis, and obtain a crude estimate of a "prior" distribution of the expected tree branch lengths under the Jukes-Cantor model. We conclude that the use of a model that is obviously oversimplified is inadvisable unless it is justified by a preliminary analysis of the real sequences.
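
To make the two quantities named in the abstract concrete, the following is a minimal sketch, not the authors' software. It computes the p-distance (observed proportion of differing sites) for a pair of aligned sequences and the standard Jukes-Cantor correction d = -(3/4) ln(1 - 4p/3). The function names `p_distance` and `jukes_cantor_distance` are illustrative, and the choice to skip sites containing gaps or ambiguous characters is an assumption not taken from the abstract.

```python
import math


def p_distance(seq_a, seq_b):
    """Observed proportion of differing sites between two aligned sequences.

    Assumption: sites where either sequence has a character other than
    A, C, G, or T (for example a gap) are excluded from the comparison.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = "ACGT"
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in valid and b in valid:
            compared += 1
            if a != b:
                differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def jukes_cantor_distance(p):
    """Jukes-Cantor corrected distance for a given p-distance.

    The correction is undefined for p >= 0.75 (saturation); infinity is
    returned in that case as a convention of this sketch.
    """
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    p = p_distance("ACGTACGT", "ACGTACGA")
    print(p, jukes_cantor_distance(p))
```

The corrected distance is always at least as large as the p-distance, because the correction accounts for multiple substitutions at the same site that the raw proportion of differences cannot see.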