We screened plant genome sequences, primarily from rice and Arabidopsis
thaliana, for CpG islands, and identified DNA segments rich in CpG
dinucleotides within these sequences. These CpG-rich clusters appeared in
the analysed sequences as discrete peaks and occurred at frequencies of one
per 4.7 kb in rice and one per 4.0 kb in A. thaliana. In rice and
A. thaliana, most of the CpG-rich clusters were associated with genes, which
suggests that these clusters are useful landmarks in genome sequences for
identifying genes in plants with small genomes. In contrast, in plants with
larger genomes, only a few of the clusters were associated with genes. These
plant CpG-rich clusters satisfied the criteria used for identifying human
CpG islands, which suggests that these CpG clusters may be regarded as plant
CpG islands. The position of each island relative to the 5'-end of its
associated gene varied considerably. Genes in the analysed sequences were
grouped into five classes according to the position of the CpG islands
within their associated genes. A large proportion of the genes belonged to
one of two classes, in which a CpG island occurred near the 5'-end of the
gene or covered the whole gene region. The position of a plant CpG island
within its associated gene appeared to be related to the extent of
tissue-specific expression of the gene; the CpG islands of most of the
widely expressed rice genes occurred near the 5'-end of the genes.
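
The screening for CpG-rich segments described above can be illustrated with a minimal sliding-window sketch. The abstract does not state which thresholds were applied, so this example assumes the commonly cited human CpG island criteria of Gardiner-Garden and Frommer (window of at least 200 bp, G+C content of at least 50%, and an observed/expected CpG ratio of at least 0.6). The function names `cpg_stats` and `find_cpg_rich_windows` and all default parameters are hypothetical and are not taken from the study.

```python
def cpg_stats(seq):
    """Return (G+C fraction, observed/expected CpG ratio) for one sequence."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0, 0.0
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_fraction = (c + g) / n
    # Expected CpG count if C and G were placed independently: (C * G) / N
    expected = c * g / n
    obs_exp = cpg / expected if expected > 0 else 0.0
    return gc_fraction, obs_exp


def find_cpg_rich_windows(seq, window=200, step=1, min_gc=0.5, min_obs_exp=0.6):
    """Return merged (start, end) intervals whose windows pass both thresholds.

    Thresholds are assumptions (Gardiner-Garden and Frommer style), not the
    values used in the study. Coordinates are 0-based, end-exclusive.
    """
    hits = []
    for start in range(0, len(seq) - window + 1, step):
        gc, obs_exp = cpg_stats(seq[start:start + window])
        if gc >= min_gc and obs_exp >= min_obs_exp:
            end = start + window
            if hits and start <= hits[-1][1]:
                # Overlapping or adjacent to the previous hit: extend it
                hits[-1] = (hits[-1][0], end)
            else:
                hits.append((start, end))
    return hits


if __name__ == "__main__":
    # Toy sequence: a CpG-rich core flanked by AT-rich DNA
    toy = "AT" * 200 + "CG" * 100 + "AT" * 200
    print(find_cpg_rich_windows(toy))
```

Because passing windows are merged, a reported interval can extend beyond the CpG-rich core itself: any window that straddles the core and still meets both thresholds contributes its full length. A stricter tool would trim the merged interval until the interval as a whole satisfies the criteria.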