We study the length distribution functions for the 16 possible distinct
dimeric tandem repeats in DNA sequences of diverse taxonomic partitions of
GenBank (known human and mouse genomes, and complete genomes of Caenorhabditis
elegans and yeast). For coding DNA, we find that all 16 distribution
functions are exponential. For non-coding DNA, the distribution functions
for most of the dimeric repeats have surprisingly long tails that fit a
power-law function. We hypothesize that: (i) the exponential distributions
of dimeric repeats in protein-coding sequences indicate strong evolutionary
pressure against tandem repeat expansion in coding DNA sequences; and
(ii) the long tails in the distributions of dimeric repeats in non-coding
DNA may be a result of various mutational mechanisms. We further hypothesize
that these long, non-exponential tails reflect the higher tolerance of
non-coding DNA to mutations. By comparing genomes of
various phylogenetic types of organisms, we find that the shapes of the
distributions are not universal, but rather depend on the specific class of
species and the type of dimer. © 2000 Academic Press.
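
As an illustration of the quantity discussed above, the following minimal Python sketch tabulates, for each of the 16 dimers, how many maximal tandem runs of each length occur in a sequence. It is not the procedure used in the study: the function names (dimer_repeat_lengths, all_dimer_distributions), the regular-expression approach, and the convention of measuring length in repeat units are assumptions made here for illustration only. An exponential distribution would appear roughly linear on a semi-logarithmic plot of count versus length, and a power-law tail roughly linear on a log-log plot.

```python
import re
from collections import Counter
from itertools import product


def dimer_repeat_lengths(seq, dimer):
    """Count maximal tandem runs of `dimer` in `seq`.

    Keys are run lengths in repeat units (e.g. ACACAC -> 3), values are the
    number of runs of that length. Hypothetical helper, not from the paper.
    """
    # (?:XY)+ matches one or more consecutive copies of the dimer; finditer
    # scans left to right and returns non-overlapping, greedy matches.
    pattern = re.compile("(?:%s)+" % re.escape(dimer))
    counts = Counter()
    for match in pattern.finditer(seq):
        counts[len(match.group(0)) // 2] += 1
    return counts


def all_dimer_distributions(seq):
    """Return {dimer: Counter of run lengths} for all 16 dimers over ACGT."""
    dimers = ["".join(pair) for pair in product("ACGT", repeat=2)]
    return {d: dimer_repeat_lengths(seq, d) for d in dimers}


if __name__ == "__main__":
    # Toy sequence chosen for illustration; real input would be coding or
    # non-coding sequence extracted from a database such as GenBank.
    toy = "ACACACGGTTACAC"
    for dimer, dist in sorted(all_dimer_distributions(toy).items()):
        if dist:
            print(dimer, dict(dist))
```

Note that for homodimers such as AA, this sketch counts whole dimer units only, so a run of five A's is recorded as two repeat units; a different convention may be preferable depending on the analysis.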