J.A. Lake, Optimally recovering rate variation information from genomes and sequences: pattern filtering, Molecular Biology and Evolution, 15(9), 1998, pp. 1224-1231
Nucleotide substitution rates vary at different positions within genes
and genomes, but rates are difficult to estimate, because they are
masked by the stochastic nature of substitutions. In this paper, a
linear method, pattern filtering, is described which can optimally
separate the signals (related to substitution rates or to other
measures of sequence change) from stochastic noise. Pattern filtering
promises to be useful in both genomic and molecular evolution studies.
In an example using mitochondrial genomes, it is shown that pattern
filtering can reveal coding and noncoding regions without the need for
prior identification of reading frames or other knowledge of the
sequence and promises to be an important tool for genomic analysis. In
a second example, it is shown that pattern filtering allows one to
classify sites on the basis of an estimator of substitution rates.
Using elongation factor EF-1 alpha sequences, it is shown that the
fastest sites favor archaea as the sister taxon of eukaryotes, whereas
the slower sites support the eocyte prokaryotes as the sister taxon of
eukaryotes, suggesting that the former result is an artifact of "long
branch attraction."