The human CD1 proteins belong to a lipid-glycolipid antigen-presenting gene
family and are related in structure and function to the MHC class I
molecules. Previous mapping and DNA hybridization studies have shown that five
linked genes located within a cluster on human chromosome 1q22-23 encode
the CD1 protein family. We have analyzed the complete genomic sequence of
the human CD1 gene cluster and found that the five active genes are
distributed over 175,600 nucleotides and separated by four expanded
intervening genomic regions (IGRs) ranging in length from 20 to 68 kb. The
IGRs are composed mostly of retroelements including five full-length L1 PA
sequences and various pseudogenes. Some L1 sequences have acted as receptors for other
subtypes or families of retroelements. Alu molecular clocks that have
evolved during primate history are found distributed within the HLA class I
duplicated segments (duplicons) but not within the duplicons of CD1. Phylogeny
of the alpha3 domain of the class I-like superfamily of proteins shows that
the CD1 cluster is well separated from HLA class I by a number of
superfamily members including MIC (PERB11), HFE, Zn-alpha2-GP, FcRn, and
MR1. Phylogenetically, the human CD1 sequences are interspersed with CD1
sequences from other mammalian species, whereas the human HLA class I
sequences cluster together and are separated from the other mammalian
sequences. Genomic and phylogenetic analyses support the view that the human
CD1 gene copies were duplicated prior to the evolution of primates and the
bulk of the HLA class
I genes found in humans. In contrast to the HLA class I genomic structure,
the human CD1 duplicons are smaller in size, they lack Alu clocks, and they
are interrupted by IGRs at least 4 to 14 times longer than the CD1 genes
themselves. The IGRs seem to have been created as "buffer zones" to protect
the CD1 genes from disruption by transposable elements.