S. Karlin et al., HETEROGENEITY OF GENOMES - MEASURES AND VALUES, Proceedings of the National Academy of Sciences of the United States of America, 91(26), 1994, pp. 12837-12841
Genomic homogeneity is investigated for a broad base of DNA sequences
in terms of dinucleotide relative abundance distances (abbreviated
delta-distances) and of oligonucleotide compositional extremes. It is
shown that delta-distances between different genomic sequences in the
same species are low, only about 2 or 3 times the distance found in
random DNA, and are generally smaller than the between-species
delta-distances. Extremes in short oligonucleotides include
underrepresentation of TpA and overrepresentation of GpC in most
temperate bacteriophage sequences; underrepresentation of CTAG in most
eubacterial genomes; underrepresentation of GATC in most bacteriophage;
CpG suppression in vertebrates, in all animal mitochondrial genomes,
and in many thermophilic bacterial sequences; and overrepresentation
of GpG/CpC in all animal mitochondrial sets and chloroplast genomes.
Interpretations center on DNA structures (dinucleotide stacking
energies, DNA curvature and superhelicity, nucleosome organization),
context-dependent mutational events, methylation effects, and
processes of replication and repair.
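
The abstract names the delta-distance but does not state its formula.
The sketch below is an illustration only, under an assumed definition
that is commonly given for this measure: the relative abundance of a
dinucleotide XY is rho_XY = f_XY / (f_X * f_Y), with frequencies
counted over the sequence together with its reverse complement, and
the delta-distance between two sequences is the mean absolute
difference of rho over the 16 dinucleotides. The function names
(relative_abundances, delta_distance) are hypothetical and are not
taken from the paper.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
DINUCS = ["".join(p) for p in product(BASES, repeat=2)]
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def relative_abundances(seq):
    """Return rho_XY = f_XY / (f_X * f_Y) for all 16 dinucleotides.

    Assumption: counts are pooled over the sequence and its reverse
    complement, so that rho is strand-symmetric. Characters other than
    A, C, G, T (for example N) are ignored.
    """
    forward = seq.upper()
    reverse = forward.translate(COMPLEMENT)[::-1]

    mono, di = Counter(), Counter()
    for strand in (forward, reverse):
        mono.update(strand)
        di.update(strand[i:i + 2] for i in range(len(strand) - 1))

    n_mono = sum(mono[b] for b in BASES)
    n_di = sum(di[d] for d in DINUCS)
    if n_mono == 0 or n_di == 0:
        raise ValueError("sequence has no usable A/C/G/T dinucleotides")

    rho = {}
    for d in DINUCS:
        fx = mono[d[0]] / n_mono
        fy = mono[d[1]] / n_mono
        fxy = di[d] / n_di
        # A base that never occurs gives an undefined ratio; report 0.0.
        rho[d] = fxy / (fx * fy) if fx > 0 and fy > 0 else 0.0
    return rho


def delta_distance(seq_a, seq_b):
    """Mean absolute difference of rho over the 16 dinucleotides."""
    rho_a = relative_abundances(seq_a)
    rho_b = relative_abundances(seq_b)
    return sum(abs(rho_a[d] - rho_b[d]) for d in DINUCS) / 16
```

Under this assumed definition, a rho value well below 1 corresponds to
underrepresentation (as reported above for TpA or CpG) and a value well
above 1 to overrepresentation. Meaningful values need sequences far
longer than a toy string; short inputs give noisy ratios.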