Distributions of dimeric tandem repeats in non-coding and coding DNA sequences

Citation
N.V. Dokholyan et al., Distributions of dimeric tandem repeats in non-coding and coding DNA sequences, J. Theor. Biol., 202(4), 2000, pp. 273-282
Number of citations
45
Subject categories
Multidisciplinary
Journal title
JOURNAL OF THEORETICAL BIOLOGY
ISSN journal
0022-5193
Volume
202
Issue
4
Year of publication
2000
Pages
273 - 282
Database
ISI
SICI code
0022-5193(20000221)202:4<273:DODTRI>2.0.ZU;2-E
Abstract
We study the length distribution functions for the 16 possible distinct dimeric tandem repeats in DNA sequences of diverse taxonomic partitions of GenBank (known human and mouse genomes, and complete genomes of Caenorhabditis elegans and yeast). For coding DNA, we find that all 16 distribution functions are exponential. For non-coding DNA, the distribution functions for most of the dimeric repeats have surprisingly long tails, that fit a power-law function. We hypothesize that: (i) the exponential distributions of dimeric repeats in protein coding sequences indicate strong evolutionary pressure against tandem repeat expansion in coding DNA sequences; and (ii) long tails in the distributions of dimers in non-coding DNA may be a result of various mutational mechanisms. These long, non-exponential tails in the distribution of dimeric repeats in non-coding DNA are hypothesized to be due to the higher tolerance of non-coding DNA to mutations. By comparing genomes of various phylogenetic types of organisms, we find that the shapes of the distributions are not universal, but rather depend on the specific class of species and the type of a dimer. © 2000 Academic Press.
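
Illustrative sketch (not from the paper): the abstract does not state how repeats were counted, so the following Python fragment is only one plausible way to tabulate length distributions of dimeric tandem repeats. The function name `dimer_repeat_lengths`, the `min_copies` threshold, and the greedy left-to-right scan are all assumptions made for illustration. Here "length" means the number of consecutive copies of the same dimer.

```python
from collections import Counter, defaultdict


def dimer_repeat_lengths(seq, min_copies=2):
    """Histogram of tandem-repeat lengths for each of the 16 dimers.

    Hypothetical helper: a greedy left-to-right scan that records, for each
    dimer, how many runs consist of exactly n consecutive copies. Runs shorter
    than min_copies are skipped (an assumed cut-off, not from the paper).
    """
    seq = seq.upper()
    hist = defaultdict(Counter)
    i = 0
    while i + 1 < len(seq):
        dimer = seq[i:i + 2]
        copies = 1
        j = i + 2
        # Extend the run two bases at a time while the same dimer repeats.
        while seq[j:j + 2] == dimer:
            copies += 1
            j += 2
        # Ignore dimers containing ambiguous bases such as N.
        if copies >= min_copies and set(dimer) <= set("ACGT"):
            hist[dimer][copies] += 1
        # Skip past a recorded run; otherwise slide by one base so that
        # repeats starting in the other phase are not missed.
        i = j if copies > 1 else i + 1
    return hist


if __name__ == "__main__":
    counts = dimer_repeat_lengths("GACACACTTTTTG")
    for dimer, lengths in sorted(counts.items()):
        print(dimer, dict(lengths))
```

On such a histogram, an exponential distribution would appear roughly as a straight line when the logarithm of the count is plotted against repeat length, whereas a power-law tail would appear roughly straight when both axes are logarithmic.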