Understanding the complex organization of genomes and predicting the location of genes and the possible structure of gene products are among the most important problems in current molecular biology. Many statistical techniques are used to address these issues, and correlation functions play a central role among them. This paper is based on an analysis of the decay of the full 4 x 4 covariance matrix of DNA sequences. We apply this covariance analysis to human chromosomal regions, yeast DNA, and bacterial genomes, and interpret the three most pronounced statistical features (long-range correlations, a period of 3 basepairs, and a period of 10-11 basepairs) using known biological facts about the structure of genomes. For example, we relate the slowly decaying long-range G+C correlations to dispersed repeats and CpG islands. We show quantitatively that the 3-basepair periodicity is due to the nonuniformity of codon usage in protein coding segments. Finally, we show that periodicities of 10-11 basepairs in yeast DNA originate from an alternation of hydrophobic and hydrophilic amino acids in protein sequences. (C) 1998 Elsevier Science B.V. All rights reserved.
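
As an illustration of the kind of quantity analyzed here, the following is a minimal sketch of a distance-dependent 4 x 4 covariance matrix. The abstract does not state the estimator, so the definition C_ab(k) = P_ab(k) - p_a p_b (joint probability of base a at position i and base b at position i + k, minus the product of the marginal frequencies) is an assumption, as are the function name `covariance_matrices` and the toy codon list. The synthetic sequence is made up for demonstration and is not data from the paper.

```python
import numpy as np

ALPHABET = "ACGT"

def covariance_matrices(seq, max_lag):
    """Return an array of shape (max_lag, 4, 4); entry [k-1, a, b] is the
    assumed estimator C_ab(k) = P_ab(k) - p_a * p_b for lag k = 1..max_lag."""
    index = {base: i for i, base in enumerate(ALPHABET)}
    # Strict encoding: any symbol other than A, C, G, T raises a KeyError.
    codes = np.array([index[b] for b in seq.upper()])
    onehot = np.eye(4)[codes]                  # shape (L, 4), one row per base
    n = len(codes)
    cov = np.empty((max_lag, 4, 4))
    for k in range(1, max_lag + 1):
        left, right = onehot[:-k], onehot[k:]  # positions i and i + k
        joint = left.T @ right / (n - k)       # P_ab(k)
        # Marginals are taken over the same positions as the joint counts,
        # so every matrix sums to zero exactly.
        cov[k - 1] = joint - np.outer(left.mean(axis=0), right.mean(axis=0))
    return cov

# Toy "coding" sequence: codons drawn independently from a small,
# hypothetical codon set, so base frequencies differ by codon position.
rng = np.random.default_rng(0)
codons = ["GCT", "GAA", "GAT", "AAG", "CTG"]
seq = "".join(rng.choice(codons, size=10000))

cov = covariance_matrices(seq, max_lag=12)
g = ALPHABET.index("G")
for k in range(1, 13):
    print(k, round(float(cov[k - 1, g, g]), 4))
```

In this toy model the G-G covariance is positive at lags that are multiples of 3 and negative at the lags in between, which is the signature of position-dependent base usage within codons. Real sequences would additionally require handling of ambiguous symbols and noncoding regions, which this sketch omits.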