In an attempt to understand the origin of CpG islands (CGIs) in mammalian g
enomes, we have studied their location and structure according to the expre
ssion pattern of genes and to the G + C content of isochores in which they
are embedded. We show that CGIs located over the transcription start site (
named start CGIs) are very different structurally from the others (named no
-start CGIs): (1) 61.6% of the no-start CGIs are due to repeated sequences
(79% are due to Alus), whereas only 5.6% of the start CGIs are due to such
repeats; (2) start CGIs are longer and display a higher CpGo/e ratio and G
+ C level than no-start CGIs. The frequency of tissue-specific genes associ
ated with a start CGI varies according to the genomic G + C content, from 25%
in G + C-poor isochores to 64% in G + C-rich isochores. Conversely, the fr
equency of housekeeping genes associated with a start CGI (90%) is independen
t of the isochore context. Interestingly, the structure of start CGIs is ve
ry similar for tissue-specific and housekeeping genes. Moreover, 93% of gen
es expressed in the early embryo are found to exhibit a CpG island over their t
ranscription start point. These observations are consistent with the hypoth
esis that the occurrence of these CGIs is the consequence of gene expressio
n at this stage, when the methylation pattern is installed.