Xl. Chu et al., LENGTH POLYMORPHISM OF THE HUMAN COMPLEMENT COMPONENT C4 GENE IS DUE TO AN ANCIENT RETROVIRAL INTEGRATION, Experimental and Clinical Immunogenetics, 12(2), 1995, pp. 74-81
The fourth component of the complement system, C4, is encoded by two highly homologous MHC-linked genes expressing the two isotypes C4A and C4B. A gene size polymorphism (either 22.5 or 16 kb) has been described that depends on the presence or absence of a 6.5-kb insertion in intron 9 of the C4 gene. By sequencing a C4A-specific lambda clone from a human genomic library containing the long intron 9, as well as PCR-amplified DNA containing the short intron, the DNA sequences of both introns were determined. The long and short introns have lengths of 6,787 bp and 415 bp, respectively. The sequence of the short intron is almost identical (96%) to the corresponding parts of the long intron. At position 282 of the short intron, a 6,372-bp insertion is present in the long intron, which has all the characteristics of a full-length endogenous retrovirus. The proviral DNA is flanked by two 6-bp target site repeats. The orientation of the proviral sequence is opposite to that of the C4 coding strand. Long terminal repeats (LTRs) of 548 bp were found at both ends of the provirus. A TATA box and an SV40 enhancer core as well as a polyadenylation signal are present in the LTR. A 5' primer binding site for lysine tRNA was identified. The strongest sequence homologies were found in comparison to human endogenous retrovirus K (HERV-K): 65 to 88% for the gag, pol and env genes. However, a search for open reading frames in these regions indicated the presence of multiple stop codons in all three reading frames. Thus it can be concluded that the retroviral genes are dysfunctional due to these mutations. It can be assumed that the integration of the retroviral sequence occurred prior to the separation of human and primate species, which can be dated to a period between 23 and 10 million years ago.
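The reported lengths can be cross-checked, and the reading-frame search illustrated, with a short Python sketch. This is not the authors' method: the function names are hypothetical, the toy sequence is made up for illustration, and the internal-length figure assumes that both 548-bp LTRs lie entirely within the 6,372-bp insertion. Only the numeric constants come from the abstract.

```python
# Lengths reported in the abstract, in base pairs.
SHORT_INTRON = 415
LONG_INTRON = 6787
INSERTION = 6372
LTR = 548

# Stop codons of the standard genetic code (DNA alphabet).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def internal_length(insertion=INSERTION, ltr=LTR):
    """Length between the two LTRs.

    Assumption (not stated in the abstract): both LTRs are counted
    inside the insertion length.
    """
    return insertion - 2 * ltr


def reverse_complement(seq):
    """Reverse complement of a DNA string.

    Relevant here because the provirus is oriented opposite to the
    C4 coding strand.
    """
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def stops_per_frame(seq):
    """Count stop codons in each of the three forward reading frames."""
    seq = seq.upper()
    counts = []
    for frame in range(3):
        # Step through complete codons only, starting at the frame offset.
        codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
        counts.append(sum(codon in STOP_CODONS for codon in codons))
    return counts


if __name__ == "__main__":
    # The short intron plus the insertion should equal the long intron.
    print(SHORT_INTRON + INSERTION == LONG_INTRON)
    print(internal_length())
    # Toy sequence, invented for illustration only.
    print(stops_per_frame("ATGTAAGGTTGACC"))
```

On a real proviral sequence, `stops_per_frame(reverse_complement(seq))` would scan the strand that carries the retroviral genes; a frame with many stops cannot encode a full-length gag, pol or env product.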