R. Bhandari and S.K. Brahmachari, ANALYSIS OF CAG/CTG TRIPLET REPEATS IN THE HUMAN GENOME - IMPLICATION IN TRANSCRIPTION FACTOR GENE-REGULATION, Journal of Biosciences, 20(5), 1995, pp. 613-627
Instability and polymorphism at several CAG/CTG trinucleotide repeat loci have been associated with human genetic disorders. In an attempt to identify novel sites that may be possible loci for expansion of CAG/CTG repeats, we searched all human sequences in the EMBL nucleotide sequence database for (CAG)5 and (CTG)5 repeats. We have identified 121 human DNA sequences of known and unknown function that contain stretches of five or more CAG or CTG repeats. Many repeat stretches were interrupted by variant triplets, a significant number of which differ from the repeat triplet by only a single base, suggesting that these evolved from the parent triplet by point mutations. A large number of human transcription factor genes were found to contain CAG repeats within their coding sequences. Analysis of the EMBL transcription factors database showed that many transcription factor genes of other eukaryotes, including genes involved in Drosophila embryo development, possess these repeats. Interestingly, CAG repeats are absent from prokaryotic transcription factors. Different sequence entries for the human TATA box binding protein showed a polymorphism in the length of the CAG repeat in this gene, suggesting that loci other than those already known to be associated with genetic diseases may be possible sites for repeat-instability-related disorders. On the basis of our findings in this database analysis, we propose a role for CAG repeats as cis-acting regulatory elements involved in fine-tuning gene expression.
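
The search criterion described above (stretches of five or more CAG or CTG units, with interrupting triplets compared against the repeat unit) can be illustrated with a minimal sketch. This is not the authors' procedure: the function names `find_repeat_stretches` and `is_single_base_variant`, the use of regular expressions, and the example sequence are all assumptions made for illustration, and the sketch scans a single strand of a plain nucleotide string rather than EMBL database entries.

```python
import re


def find_repeat_stretches(seq, min_units=5):
    """Return (start, unit, copies) for each uninterrupted CAG or CTG run.

    Hypothetical helper: start is a 0-based offset into seq, and copies is
    the number of tandem units in the run (at least min_units).
    """
    seq = seq.upper()
    hits = []
    for unit in ("CAG", "CTG"):
        # e.g. (?:CAG){5,} matches five or more tandem copies of the unit
        pattern = re.compile(r"(?:%s){%d,}" % (unit, min_units))
        for match in pattern.finditer(seq):
            hits.append((match.start(), unit, len(match.group()) // 3))
    return sorted(hits)


def is_single_base_variant(triplet, unit):
    """True if triplet differs from the repeat unit at exactly one position."""
    if len(triplet) != 3 or len(unit) != 3:
        return False
    return sum(a != b for a, b in zip(triplet.upper(), unit.upper())) == 1


# Made-up example sequence (not taken from the paper): a (CAG)6 run, an
# interrupting CAA triplet, two more CAG units, then a (CTG)5 run.
example = "TT" + "CAG" * 6 + "CAA" + "CAG" * 2 + "GG" + "CTG" * 5 + "A"
stretches = find_repeat_stretches(example)
```

In this made-up sequence the interrupting triplet CAA differs from CAG at one base, which is the kind of variant the abstract attributes to point mutation of the parent triplet. A fuller analysis would also need to merge runs separated by such variants and to consider reading frame, neither of which this sketch attempts.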