OPTIMALLY RECOVERING RATE VARIATION INFORMATION FROM GENOMES AND SEQUENCES - PATTERN FILTERING

Authors
Citation
Ja. Lake, OPTIMALLY RECOVERING RATE VARIATION INFORMATION FROM GENOMES AND SEQUENCES - PATTERN FILTERING, Molecular biology and evolution, 15(9), 1998, pp. 1224-1231
Number of citations
28
Subject Categories
Biology Miscellaneous, Biology, Genetics & Heredity
ISSN journal
07374038
Volume
15
Issue
9
Year of publication
1998
Pages
1224 - 1231
Database
ISI
SICI code
0737-4038(1998)15:9<1224:ORRVIF>2.0.ZU;2-N
Abstract
Nucleotide substitution rates vary at different positions within genes and genomes, but rates are difficult to estimate, because they are masked by the stochastic nature of substitutions. In this paper, a linear method, pattern filtering, is described which can optimally separate the signals (related to substitution rates or to other measures of sequence change) from stochastic noise. Pattern filtering promises to be useful in both genomic and molecular evolution studies. In an example using mitochondrial genomes, it is shown that pattern filtering can reveal coding and noncoding regions without the need for prior identification of reading frames or other knowledge of the sequence and promises to be an important tool for genomic analysis. In a second example, it is shown that pattern filtering allows one to classify sites on the basis of an estimator of substitution rates. Using elongation factor EF-1 alpha sequences, it is shown that the fastest sites favor archaea as the sister taxon of eukaryotes, whereas the slower sites support the eocyte prokaryotes as the sister taxon of eukaryotes, suggesting that the former result is an artifact of "long branch attraction."
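
The abstract does not give the filter itself, so the following is only a rough illustration of the general idea of applying a linear filter to noisy per-site change scores and then classifying sites by the smoothed estimate. It is not the pattern filtering method of the paper: the moving average is a simple stand-in for the optimal linear filter, and the function names, the toy alignment, the window size, and the threshold are all hypothetical.

```python
# Illustrative sketch only: a moving average stands in for the paper's
# optimal linear filter, which the abstract does not specify.

def site_variability(alignment):
    """Crude per-site change score: number of distinct states minus one."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to equal length")
    return [len(set(column)) - 1 for column in zip(*alignment)]


def moving_average(values, window):
    """Linear smoothing filter; the window is truncated at both ends."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def classify_sites(smoothed, threshold):
    """Label each site 'fast' or 'slow' by its smoothed score."""
    return ["fast" if v > threshold else "slow" for v in smoothed]


# Hypothetical toy alignment (four sequences, eight aligned sites).
alignment = ["ACGTACGT", "ACGTTCGA", "ACGTGCGC", "ACGTCCGG"]
raw = site_variability(alignment)
smoothed = moving_average(raw, window=3)
labels = classify_sites(smoothed, threshold=0.5)
```

In this toy setting the smoothed scores, rather than the raw per-site counts, drive the fast/slow split, which loosely mirrors the abstract's use of a filtered rate estimator to partition sites before phylogenetic analysis.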