Wb. Valhmu et al., STRUCTURE OF THE HUMAN AGGRECAN GENE - EXON-INTRON ORGANIZATION AND ASSOCIATION WITH THE PROTEIN DOMAINS, Biochemical journal, 309, 1995, pp. 535-542
The complete exon-intron organization of the human aggrecan gene has
been defined, and the exon organization has been compared with the
individual domains of the protein core. A yeast artificial chromosome
containing the aggrecan gene was selected from the Centre d'Etude du
Polymorphisme Humaine yeast artificial chromosome library. A cosmid
sublibrary was created from this, and direct sequencing of individual
cosmids was used to provide the exon-intron organization. The human
aggrecan gene was found to be composed of 19 exons ranging in size
from 77 to 4224 bp. Exon 1 is non-coding, whereas exons 2-19 code for
a protein core of 2454 amino acids with a calculated mass of 254379
Da. Intron 1 of the gene is at least 13 kb. Overall, the sizes of the
18 introns range from 0.5 to greater than 13 kb. Each intron begins
with a GT and ends with an AG, thus obeying the GT/AG rule of
splice-junction sequences. The entire coding region is contained in
39.4 kb of the gene. The organization of exons is strongly related to
the specific domains of the protein core. The A loop of G1 and the
interglobular domain are encoded by exons 3 and 7 respectively. The B
and B' loops of G1 are encoded by exons 4-6, and those of G2 are
encoded by exons 8-10. These sets of exons, coding for the B and B'
loops, are identical in size and organization. This is supported by
the intron classes associated with these exons. Exon 11 codes for the
5' half of the keratan sulphate-rich region, and exon 12 codes for the
3' half of the keratan sulphate-rich region as well as the entire
chondroitin sulphate-rich region. G3 is encoded by exons 13-18,
including the alternatively spliced epidermal growth factor-like and
complement regulatory protein-like domains. The correspondence between
the exon organization and the protein domains argues strongly for
modular assembly of the aggrecan gene.
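
The exon-to-domain assignments and the GT/AG rule reported in the
abstract can be written down as a small lookup table and a check. This
is only an illustrative sketch: the names EXON_DOMAINS, domain_for_exon
and obeys_gt_ag are hypothetical, and exons 2 and 19 are left
unassigned because the abstract does not state which domain they
encode.

```python
TOTAL_EXONS = 19  # exon count reported in the abstract


def _build_exon_domains():
    """Exon number -> protein-core region, as stated in the abstract."""
    table = {1: "non-coding", 3: "G1 A loop", 7: "interglobular domain"}
    for exon in range(4, 7):    # exons 4-6
        table[exon] = "G1 B and B' loops"
    for exon in range(8, 11):   # exons 8-10
        table[exon] = "G2 B and B' loops"
    table[11] = "keratan sulphate-rich region (5' half)"
    table[12] = ("keratan sulphate-rich region (3' half) and "
                 "chondroitin sulphate-rich region")
    for exon in range(13, 19):  # exons 13-18
        table[exon] = "G3"
    return table


EXON_DOMAINS = _build_exon_domains()


def domain_for_exon(exon):
    """Return the region for an exon, or None if the abstract is silent."""
    if not 1 <= exon <= TOTAL_EXONS:
        raise ValueError(f"exon must be in 1..{TOTAL_EXONS}, got {exon}")
    return EXON_DOMAINS.get(exon)  # None for exons 2 and 19


def obeys_gt_ag(intron_sequence):
    """True if an intron sequence starts with GT and ends with AG."""
    seq = intron_sequence.upper()
    return len(seq) >= 4 and seq.startswith("GT") and seq.endswith("AG")
```

For example, domain_for_exon(7) returns "interglobular domain", and
obeys_gt_ag applied to a made-up toy sequence such as "GTAAGTCCAG"
returns True; no real aggrecan intron sequence is used here.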