We examined the abundance of microsatellites with repeated unit lengths of
1-6 base pairs in several eukaryotic taxonomic groups: primates, rodents, other mammals, nonmammalian vertebrates, arthropods, Caenorhabditis elegans,
plants, yeast, and other fungi. The distribution of simple sequence repeats was compared among exons, introns, and intergenic regions. Tri- and hexanucleotide repeats prevail in the protein-coding exons of all taxa, whereas in introns and intergenic regions the dependence of repeat abundance on the length of the repeated unit shows a very different pattern, as well as taxon-specific variation. Although coding and noncoding regions are known to differ significantly in their microsatellite distribution, we were also able to
demonstrate characteristic differences between intergenic regions and introns. We observed a striking relative abundance of (CCG)n·(CGG)n trinucleotide repeats in the intergenic regions of all vertebrates, in contrast to the almost complete absence of this motif from introns. Taxon-specific variation
could also be detected in the frequency distributions of simple sequence motifs. Our results suggest that strand-slippage theories alone are insufficient to explain microsatellite distribution in the genome as a whole. Other
possible factors contributing to the observed divergence are discussed.
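The comparison summarized above rests on two operations: detecting simple sequence repeats with 1-6 bp units, and grouping motifs so that, for example, (CCG)n on one strand and (CGG)n on the complementary strand count as a single class. The Python sketch below illustrates both under stated assumptions. It is not the authors' search procedure: only perfect (uninterrupted) repeats are detected, the 12 bp minimum repeat length and the per-megabase normalization are hypothetical choices, and all function names are illustrative.

```python
from collections import Counter

# Complement table for A/C/G/T; other symbols (e.g. N) are left unchanged.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def canonical_motif(unit: str) -> str:
    """Lexicographically smallest rotation of the unit or of its reverse
    complement, so that CCG, CGC, GCC, CGG, GGC and GCG form one class."""
    rc = reverse_complement(unit)
    return min(u[i:] + u[:i] for u in (unit, rc) for i in range(len(u)))


def is_primitive(unit: str) -> bool:
    """False if the unit is itself a repeat of a shorter unit (e.g. 'ATAT')."""
    n = len(unit)
    return not any(n % k == 0 and unit[:k] * (n // k) == unit for k in range(1, n))


def find_microsatellites(seq: str, max_unit: int = 6, min_length: int = 12):
    """Return (start, unit, copies) for perfect tandem repeats with unit
    lengths 1..max_unit whose total length is at least min_length bp.
    min_length=12 is a hypothetical threshold chosen for illustration."""
    seq = seq.upper()
    hits = []
    for k in range(1, max_unit + 1):
        i = 0
        while i + k <= len(seq):
            unit = seq[i:i + k]
            j = i + k
            while seq[j:j + k] == unit:  # extend the run one whole unit at a time
                j += k
            copies = (j - i) // k
            if (copies * k >= min_length and set(unit) <= set("ACGT")
                    and is_primitive(unit)):
                hits.append((i, unit, copies))
                i = j  # skip past the reported repeat
            else:
                i += 1
    return hits


def tally(seq: str, **kwargs):
    """Count repeats by unit length and by canonical motif class."""
    by_length, by_motif = Counter(), Counter()
    for _, unit, _ in find_microsatellites(seq, **kwargs):
        by_length[len(unit)] += 1
        by_motif[canonical_motif(unit)] += 1
    return by_length, by_motif


def per_megabase(counts, total_bp: int):
    """Normalize counts to repeats per Mb of analyzed sequence, so that region
    classes of different total size (hypothetical choice) can be compared."""
    return {key: n * 1_000_000 / total_bp for key, n in counts.items()}


if __name__ == "__main__":
    seq = "GG" + "CCG" * 5 + "TTAG" + "CA" * 7 + "G" + "A" * 12 + "C"
    by_length, by_motif = tally(seq)
    print(dict(by_length))  # {1: 1, 2: 1, 3: 1}
    print(dict(by_motif))   # {'A': 1, 'AC': 1, 'CCG': 1}
```

In the toy sequence the trinucleotide run is reported in the GCC phase, yet it still falls into the CCG class, which is why canonical grouping matters when motif frequencies are compared. To mirror the exon/intron/intergenic comparison, one would run `tally` separately on each region class (extracted from a genome annotation) and compare the normalized profiles; imperfect repeats would require a different detection method.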