CpG islands are found at the 5' end of approximately 60% of human genes and
so are important genomic landmarks. They are concentrated in
early-replicating, highly acetylated gene-rich regions. With respect to CpG
island content, human Chrs 18 and 22 are very different from each other: Chr 18 appears
to be CpG island poor, whereas Chr 22 appears to be CpG island rich. We have
constructed and validated CpG island libraries from flow-sorted Chrs 18
and 22 and used these to estimate the difference in number of CpG islands f
ound on these two chromosomes. These libraries contain normalized collectio
ns of sequences from the 5' end of genes. Clones from the libraries were se
quenced and compared with the sequence databases; one third matched ESTs, t
hus anchoring these ESTs at the 5' end of their gene. However, it was strik
ing that many clones either had no match or matched only existing CpG islan
d clones. This suggests that a significant proportion of 5' gene sequences
are absent from databases, presumably either because they are difficult to
clone or because the gene is poorly expressed and/or has a restricted
expression pattern. This point should be taken into account if the currently
available libraries are to be used to elucidate complete, as opposed to
partial, gene sequences. The Chr 18 and 22 CpG island libraries are a
sequence resource for the isolation of such 5' gene sequences from specific
human chromosomes.