We have developed an algorithm that predicted 11,265 potentially polymorphic tandem repeats within transcribed sequences. We estimate that 22% (2,207/9,717) of the annotated clusters within UniGene contain at least one potentially polymorphic locus. We tested our predictions by allelotyping a panel of approximately 30 individuals for 5% of these regions, and confirmed polymorphism for more than half of the loci tested. Our study indicates that tandem-repeat polymorphisms in genes are more common than is generally believed. Approximately 8% of these loci lie within coding sequences and, if polymorphic, would result in frameshifts. Our catalogue of putative polymorphic repeats within transcribed sequences therefore comprises a large set of loci that may affect phenotype or cause disease. In addition, from the anomalous character of the repetitive sequences within unannotated clusters, we conclude that the UniGene cluster count substantially overestimates the number of genes in the human genome. We hypothesize that polymorphisms in repeated sequences occur with a baseline distribution that depends on repeat homogeneity, size, and sequence composition, and that deviations from that distribution indicate the nature of the selection pressure at that locus. We find evidence of selective maintenance of the ability of some genes to respond very rapidly, perhaps even on intragenerational timescales, to fluctuating selective pressures.
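The frameshift claim for coding-sequence repeats follows from simple arithmetic: gaining or losing one copy of a repeat changes the sequence length by the length of the repeat unit, so the downstream reading frame shifts whenever that unit length is not a multiple of 3. The sketch below illustrates this under stated assumptions. It is not the authors' prediction algorithm, which the abstract does not describe. It only scans for perfect (uninterrupted) tandem repeats, and the function names, the `TandemRepeat` type, and the thresholds `min_unit`, `max_unit`, and `min_copies` are hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class TandemRepeat:
    start: int   # 0-based start position in the sequence
    unit: str    # the repeated motif
    copies: int  # number of consecutive, perfect copies

    @property
    def length(self) -> int:
        return len(self.unit) * self.copies


def find_perfect_repeats(seq: str, min_unit: int = 1, max_unit: int = 6,
                         min_copies: int = 4) -> list:
    """Greedy left-to-right scan for perfect tandem repeats.

    Hypothetical thresholds; a simplified stand-in, not the published method.
    At each position, every unit length in [min_unit, max_unit] is tried and
    the one covering the most sequence wins (ties go to the shorter unit).
    """
    seq = seq.upper()
    found = []
    i = 0
    while i < len(seq):
        best = None
        for k in range(min_unit, max_unit + 1):
            unit = seq[i:i + k]
            if len(unit) < k:  # too close to the end for this unit length
                break
            copies = 1
            # Count how many times the unit repeats back-to-back.
            while seq[i + copies * k:i + (copies + 1) * k] == unit:
                copies += 1
            if copies >= min_copies and (best is None or copies * k > best.length):
                best = TandemRepeat(i, unit, copies)
        if best is not None:
            found.append(best)
            i += best.length  # skip past the repeat just recorded
        else:
            i += 1
    return found


def frameshift_on_copy_change(repeat: TandemRepeat, cds_start: int,
                              cds_end: int) -> bool:
    """True if gaining or losing ONE copy would shift the reading frame.

    The coding region is the half-open interval [cds_start, cds_end).
    """
    inside = cds_start <= repeat.start and repeat.start + repeat.length <= cds_end
    return inside and len(repeat.unit) % 3 != 0


if __name__ == "__main__":
    # Toy open reading frame: a dinucleotide repeat and a trinucleotide repeat.
    toy = "ATG" + "CA" * 6 + "GCT" + "CAG" * 5 + "TAA"
    for r in find_perfect_repeats(toy):
        print(r.unit, r.copies, frameshift_on_copy_change(r, 0, len(toy)))
    # prints "CA 6 True" then "CAG 5 False"
```

In the toy reading frame, a change in the number of `CA` copies alters the length by 2 bases and would shift the frame, whereas a change in the number of `CAG` copies adds or removes a whole codon and preserves it. Real repeat predictors also tolerate imperfect copies, which this greedy exact-match scan deliberately ignores for brevity.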