L. Duret and N. Galtier, The covariation between TpA deficiency, CpG deficiency, and G + C content of human isochores is due to a mathematical artifact, MOL BIOL EV, 17(11), 2000, pp. 1620-1625
CpG and TpA dinucleotides are underrepresented in the human genome. The CpG
deficiency is due to the high mutation rate from C to T in methylated CpGs.
The TpA suppression was thought to reflect a counterselection against TpA's
destabilizing effect in RNA. Unexpectedly, the TpA and CpG deficiencies
vary according to the G+C contents of sequences. It has been proposed that
the variation in CpG suppression was correlated with a particular chromatin
organization in G+C-rich isochores. Here, we present an improved model of
dinucleotide evolution accounting for the overlap between successive
dinucleotides. We show that an increased mutation rate from CpG to TpG or CpA
induces both an apparent TpA deficiency and a correlation between CpG and TpA
deficiencies and G+C content. Moreover, this model shows that the ratio of
observed over expected CpG frequency underestimates the real CpG deficiency
in G+C-rich sequences. The predictions of our model fit well with observed
frequencies in human genomic data. This study suggests that previously
published selectionist interpretations of patterns of dinucleotide frequencies
should be taken with caution. Moreover, we propose new criteria to identify
unmethylated CpG islands taking into account this bias in the measure of
CpG depletion.
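
For readers unfamiliar with the measure under discussion, the following is a minimal sketch of the conventional observed-over-expected dinucleotide ratio, assuming the usual definition in which the expected frequency is the product of the two single-base frequencies. It illustrates the naive statistic that the abstract describes as biased; it is not the authors' improved model, and the function name `obs_exp_ratio` and the toy sequence are hypothetical.

```python
from collections import Counter


def obs_exp_ratio(seq: str, dinuc: str) -> float:
    """Observed/expected frequency ratio of a dinucleotide in one sequence.

    Assumption: expected frequency = f(first base) * f(second base),
    the conventional definition, not the corrected model of the paper.
    """
    seq = seq.upper()
    dinuc = dinuc.upper()
    if len(dinuc) != 2:
        raise ValueError("dinuc must be exactly two bases")
    n = len(seq)
    if n < 2:
        raise ValueError("sequence must contain at least two bases")

    # Single-base frequencies over all n positions.
    base_counts = Counter(seq)
    expected = (base_counts[dinuc[0]] / n) * (base_counts[dinuc[1]] / n)
    if expected == 0:
        raise ValueError("a base of the dinucleotide is absent from the sequence")

    # Overlapping dinucleotide count over the n - 1 adjacent pairs.
    hits = sum(1 for i in range(n - 1) if seq[i:i + 2] == dinuc)
    observed = hits / (n - 1)
    return observed / expected


# Toy sequence, for illustration only (not genomic data).
toy = "ACGTACGT"
print("CpG obs/exp:", obs_exp_ratio(toy, "CG"))
print("TpA obs/exp:", obs_exp_ratio(toy, "TA"))
```

A ratio below 1 is usually read as a deficiency of that dinucleotide. According to the abstract, this reading underestimates the real CpG deficiency in G+C-rich sequences, so the value should be compared across regions of differing G+C content with care.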