Background: Triplet repeat sequences are of considerable biological
importance, as the expansion of such tandem arrays can lead to the onset of a range
of human diseases. Such sequences can self-pair via mismatch alignments to
form higher order structures that have the potential to cause replication
blocks, followed by strand slippage and sequence expansion. The all-purine
d(GGA)(n) triplet repeat sequence is of particular interest because purines
can align via G.G, A.A and G.A mismatch formation.
Results: We have solved the structure of the uniformly 13C,15N-labeled
d(G1-G2-A3-G4-G5-A6-T7) sequence in 10 mM Na+ solution. This sequence adopts
a novel twofold-symmetric duplex fold where interlocked V-shaped arrowhead
motifs are aligned solely via interstrand G1.G4, G2.G5 and A3.A6 mismatch
formation. The tip of the arrowhead motif is centered about the p-A3-p step,
and symmetry-related local parallel-stranded duplex domains are formed by
the G1-G2-A3 and G4-G5-A6 segments of partner strands.
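
The interstrand pairing register described above can be made explicit with a short sketch. It is illustrative only: the offset of three residues is inferred from the three pairs named in the text, and the names SEQUENCE, OFFSET and interstrand_pairs are hypothetical helpers introduced here, not part of the original work.

```python
# Illustrative sketch: enumerate the interstrand mismatch pairs of the
# d(G1-G2-A3-G4-G5-A6-T7) duplex, assuming residue i of one strand
# aligns with residue i+3 of the partner strand (inferred from the text).
SEQUENCE = ["G1", "G2", "A3", "G4", "G5", "A6", "T7"]
OFFSET = 3   # assumed register between partner strands
N_PAIRS = 3  # G1-G2-A3 of one strand pairs with G4-G5-A6 of the other


def interstrand_pairs(seq, offset, n_pairs):
    """Return (residue, partner) tuples for the first n_pairs residues."""
    return [(seq[i], seq[i + offset]) for i in range(n_pairs)]


pairs = interstrand_pairs(SEQUENCE, OFFSET, N_PAIRS)
for residue, partner in pairs:
    print(f"{residue}.{partner}")
```

Because the duplex is twofold symmetric, the same three pairs occur a second time with the strand roles exchanged, and T7 is left unpaired in this scheme.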
Conclusions: The purine-rich (GGA)(n) triplet repeat sequence is dispersed
throughout the eukaryotic genome. Several features of the arrowhead duplex
motif for the (GGA)(2) triplet repeat provide a unique scaffold for
molecular recognition. These include the large localized bend in the
sugar-phosphate backbones, the segmental parallel-stranded alignment of strands and the
exposure of the Watson-Crick edges of several mismatched bases.