P. Lio and S. Ruffo, SEARCHING FOR GENOMIC CONSTRAINTS, Nuovo cimento della Societa italiana di fisica. D, Condensed matter, atomic, molecular and chemical physics, biophysics, 20(1), 1998, pp. 113-127
We have analyzed general properties of very long DNA sequences belonging to simple and complex organisms, using several correlation methods. We distinguish the base compositional rules that concern the entire genome, which we call "genomic constraints", from the rules that depend on the "external natural selection" acting on single genes, i.e. protein-centered constraints. We show that G+C content, purine/pyrimidine distributions and the biological complexity of the organism are the most important factors determining base compositional rules and genome complexity. Three main findings are reported here: bacteria with high G+C content have more restrictions on base composition than those with low G+C content; at constant G+C content, more complex organisms, ranging from prokaryotes to higher eukaryotes (e.g., human), display an increase in repeats 10-20 nucleotides long, which are also partly responsible for long-range correlations; word selection for lengths 3 to 10 is stronger in human and in bacteria, for two distinct reasons. Going beyond previous studies, we have also compared the genomic sequence of the archaeon Methanococcus jannaschii with those of bacteria and eukaryotes: it sometimes shows intermediate statistical behavior.
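
The abstract names several sequence statistics (G+C content, purine/pyrimidine distributions, counts of words of length 3 to 10) without saying how they were computed. The Python sketch below shows one simple way such quantities could be obtained for a DNA string. It is an illustrative assumption, not the authors' method: the function names gc_content, purine_pyrimidine and word_counts are hypothetical, and the correlation analyses mentioned in the abstract are not reproduced.

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases A, C, G, T."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        return 0.0
    return sum(b in "GC" for b in bases) / len(bases)


def purine_pyrimidine(seq: str) -> str:
    """Recode a sequence as purines (R: A, G) and pyrimidines (Y: C, T).

    Any other symbol is mapped to 'N' (an assumption made for this sketch).
    """
    mapping = {"A": "R", "G": "R", "C": "Y", "T": "Y"}
    return "".join(mapping.get(b, "N") for b in seq.upper())


def word_counts(seq: str, k: int) -> Counter:
    """Count all overlapping words (k-mers) of length k in the sequence."""
    s = seq.upper()
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))


if __name__ == "__main__":
    example = "ATGCGCGCTTATATAT"  # toy sequence, not real genomic data
    print(gc_content(example))
    print(purine_pyrimidine(example))
    print(word_counts(example, 3).most_common(3))
```

Comparing observed word counts with those expected from the base composition alone would be one possible way to look for over- or under-represented words, but the abstract does not state which null model was used.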