S. Karlin and J. Mrazek, COMPOSITIONAL DIFFERENCES WITHIN AND BETWEEN EUKARYOTIC GENOMES, Proceedings of the National Academy of Sciences of the United States of America, 94(19), 1997, pp. 10227-10232
Eukaryotic genome similarity relationships are inferred using sequence information derived from large aggregates of genomic sequences. Comparisons of sample sequences within and between species are based on the profile of dinucleotide relative abundance values. The profile is $\rho(XY) = f^*(XY) / [f^*(X)\, f^*(Y)]$ for all dinucleotides $XY$, where $f^*(X)$ denotes the frequency of the nucleotide $X$ and $f^*(XY)$ denotes the frequency of the dinucleotide $XY$, both computed from the sequence concatenated with its inverted complement. Previous studies of prokaryotes, together with this study, document two patterns. First, profiles of different DNA sequence samples (sample size at least 50 kb) from the same organism are generally much more similar to each other than they are to profiles from other organisms. Second, closely related organisms generally have more similar profiles than do distantly related organisms. On this basis we refer to the collection $\{\rho(XY)\}$ as the genome signature. This paper identifies $\rho(XY)$ extremes and compares genome signature differences for a diverse range of eukaryotic species. Interpretations of the mechanisms that maintain these profile differences center on genome-wide replication, repair, DNA structures, and context-dependent mutational biases. It is also observed that mitochondrial genome signature differences between species parallel the corresponding nuclear genome signature differences, despite large differences between corresponding mitochondrial and nuclear signatures. The genome signature differences also have implications for contrasts between rodents and other mammals and between monocot and dicot plants. They also provide evidence for similarities among fungi and for the diversity of protists.
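The relative-abundance profile defined in the abstract can be illustrated with a short Python sketch. This is not the authors' code. The names `relative_abundance`, `reverse_complement`, and `delta_star` are hypothetical, and several choices are assumptions. Each strand is scanned separately rather than literally concatenated, which differs only by the single artificial dinucleotide at the junction. Non-ACGT characters are skipped. The profile distance `delta_star`, the mean absolute difference over the 16 dinucleotides, is modeled on measures used in Karlin's related work and is an assumption here, because the abstract does not say how profiles are compared.

```python
import math
from collections import Counter
from itertools import product

BASES = "ACGT"
# Translation table for Watson-Crick complements; other characters pass through.
COMPLEMENT = str.maketrans("ACGT", "TGCA")
# The 16 dinucleotides AA, AC, ..., TT.
DINUCLEOTIDES = ["".join(p) for p in product(BASES, repeat=2)]


def reverse_complement(seq: str) -> str:
    """Return the inverted (reverse) complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def relative_abundance(seq: str) -> dict:
    """Symmetrized profile rho(XY) = f*(XY) / (f*(X) * f*(Y)) for all 16 XY.

    Mono- and dinucleotide counts are pooled over the sequence and its reverse
    complement. Each strand is scanned on its own, so no artificial dinucleotide
    spans the junction of the two strands. Non-ACGT characters are skipped
    (an assumption). A rho value is NaN when X or Y never occurs.
    """
    seq = seq.upper()
    mono = Counter()
    di = Counter()
    for strand in (seq, reverse_complement(seq)):
        mono.update(b for b in strand if b in BASES)
        di.update(
            strand[i:i + 2]
            for i in range(len(strand) - 1)
            if strand[i] in BASES and strand[i + 1] in BASES
        )
    n_mono = sum(mono.values())
    n_di = sum(di.values())
    if n_di == 0:
        raise ValueError("sequence has no valid dinucleotides")

    rho = {}
    for xy in DINUCLEOTIDES:
        f_x = mono[xy[0]] / n_mono
        f_y = mono[xy[1]] / n_mono
        if f_x == 0 or f_y == 0:
            rho[xy] = math.nan  # undefined: a component nucleotide is absent
        else:
            rho[xy] = (di[xy] / n_di) / (f_x * f_y)
    return rho


def delta_star(rho_f: dict, rho_g: dict) -> float:
    """Mean absolute difference between two profiles over the 16 dinucleotides.

    Assumed distance for illustration; the abstract does not define one.
    """
    return sum(abs(rho_f[xy] - rho_g[xy]) for xy in DINUCLEOTIDES) / len(DINUCLEOTIDES)
```

Because counts are pooled over both strands, the sketch gives every dinucleotide the same value as its reverse complement (for example, rho(AC) equals rho(GT)). According to the abstract, two samples of at least 50 kb from the same genome should generally yield a smaller `delta_star` than samples from different organisms. The sketch itself does not test that claim.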