D. Haring and J. Kypr, Correlations and anticorrelations among nucleotide distributions along the genomes of various organisms, J BIO STRUC, 17(2), 1999, pp. 267-273
We have analyzed correlations of nucleotide distributions along more than
50 megabases of the longest sequenced parts of the human, mouse, Drosophila,
Arabidopsis, yeast, E. coli and three kinds of viral genomes. The strongest
correlations were observed between the distributions of C and G, in
particular in the genome of Drosophila. This correlation was much weaker, though
still strong, in the human and E. coli genomes, which exhibited the same
level of this correlation. The C/G correlation is unlikely to originate from
isochores, because isochores have not been reported in the genomes of
Drosophila and E. coli. The genomic distribution curves of adenine and thymine
were also positively correlated in all analyzed organisms except for the
yeast where they were anticorrelated. Still stronger anticorrelations were,
however, observed between the genomic distributions of A and C and between
G and T. These genomic distributions were anticorrelated very strongly in
almost all cases. These anticorrelations are likely to originate from point
mutations resulting from unrepaired CA mispairing as a replication intermediate.
The C/A or G/T anticorrelation or compensation is a very strong and general
new phenomenon that shapes the genomic nucleotide sequences.
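
The kind of comparison described above can be illustrated with a minimal sketch. The abstract does not state how the nucleotide distributions were computed, so everything here is an assumption: counts are taken in non-overlapping fixed-size windows and compared with the Pearson correlation coefficient. The function names (window_counts, pearson, base_correlation), the window size and the toy sequence are hypothetical and are not taken from the paper.

```python
from math import sqrt


def window_counts(seq, base, window):
    """Count `base` in consecutive non-overlapping windows of length `window`.

    Assumption: a 'nucleotide distribution' is a per-window count profile.
    A trailing partial window is ignored.
    """
    seq = seq.upper()
    n_windows = len(seq) // window
    return [seq[i * window:(i + 1) * window].count(base) for i in range(n_windows)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two profiles of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant profile has no defined correlation")
    return cov / sqrt(var_x * var_y)


def base_correlation(seq, base_a, base_b, window):
    """Correlate the windowed count profiles of two nucleotides."""
    return pearson(window_counts(seq, base_a, window),
                   window_counts(seq, base_b, window))


# Hypothetical toy sequence (not real genomic data): C and G rise and fall
# together across windows, while A falls wherever C rises.
toy = "CGCGCGCGCG" + "ATATATATAT" + "CCGGCCGGAT" + "AATTAATTCG"

cg = base_correlation(toy, "C", "G", window=10)  # positive: C and G co-vary
ac = base_correlation(toy, "A", "C", window=10)  # negative: A and C compensate
print(f"C/G: {cg:+.2f}  A/C: {ac:+.2f}")
```

On real data the window would be far larger than in this toy example, and the result can depend on the window size chosen, so the values here only illustrate the sign of the relationship.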