Analytical molecular distance estimates can give inaccurate and biased
estimates of the total number of substitutions, not only when the model
of evolution they are based on is incorrect, but also when the method
of estimating the total is too simple. This arises because, when
different types of substitutions occur simultaneously, it can become
extremely difficult to estimate the number of the more quickly evolving
type, and the variance of this larger number can overwhelm the total
estimate. In this paper, extending earlier work with a simple
two-parameter model of evolution, more accurate analytical distances
are derived for models appropriate to a variety of known DNA types,
using generalized least squares principles of noise reduction. It is
shown that the new estimates can be applied to achieve more accurate
results for site-to-site rate variation, regions with biased nucleotide
frequencies, and synonymous sites in protein-coding regions. This study
also includes a methodology to obtain accurate distance estimates for
large numbers of sequence regions evolving in different manners.
(C) 1998 Academic Press.
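
The difficulty described above can be illustrated with a minimal sketch. It assumes that the "simple two-parameter model" is Kimura's two-parameter model, with transitions and transversions as the two substitution types; the abstract does not confirm this. The code shows only the classic analytical estimate, not the generalized least squares distances derived in the paper. The function names `expected_pq` and `k2p_components` are hypothetical.

```python
import math


def expected_pq(ts, tv):
    """Expected proportions of sites differing by a transition (p) and by a
    transversion (q) under Kimura's two-parameter model, given the true
    numbers of transitions (ts) and transversions (tv) per site."""
    p = 0.25 * (1.0 - 2.0 * math.exp(-(2.0 * ts + tv)) + math.exp(-2.0 * tv))
    q = 0.5 * (1.0 - math.exp(-2.0 * tv))
    return p, q


def k2p_components(p, q):
    """Classic analytical estimate (assumed baseline, not the paper's method).

    Returns (transitions, transversions, total) substitutions per site
    from the observed proportions p and q.
    """
    a = 1.0 - 2.0 * p - q
    b = 1.0 - 2.0 * q
    if a <= 0.0 or b <= 0.0:
        # The logarithm is undefined: the faster type has saturated.
        raise ValueError("saturated: log argument is not positive")
    tv = -0.5 * math.log(b)
    ts = -0.5 * math.log(a) - 0.5 * tv
    return ts, tv, ts + tv


# Round trip: recover the true values from their expected proportions.
p, q = expected_pq(0.3, 0.1)
ts_est, tv_est, total_est = k2p_components(p, q)
```

In this sketch the transition estimate depends on log(1 - 2p - q). As transitions accumulate, that argument approaches zero, so small sampling errors in p are strongly amplified in the transition estimate and hence in the total, which is the variance problem the abstract describes.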