The concept of homogeneity of G + C content is always relative and
subjective. This point is emphasized and quantified in this paper using a
simple example of one sequence segmented into two subsequences. Whether the
sequence is homogeneous can be decided by asking whether the
two-subsequence model describes the DNA sequence better than the
one-sequence model. There are at least three equivalent ways of looking at
the 1-to-2 segmentation: the Jensen-Shannon divergence measure, the
log-likelihood ratio test, and model selection using the Bayesian
information criterion. Once a criterion is chosen, a DNA sequence can be
recursively segmented into multiple domains. We use one subjective
criterion, called segmentation strength, based on the Bayesian information
criterion. Whether or not a sequence is homogeneous, and how many domains
it has, depend on this criterion. We compare six different genome sequences
(yeast S. cerevisiae chromosomes III and IV, bacterium M. pneumoniae, human
major histocompatibility complex sequence, and the longest contigs in human
chromosomes 21 and 22) by recursive segmentation at different strength
criteria. Results from recursive segmentation confirm that yeast chromosome
IV is more homogeneous than yeast chromosome III, that human chromosome 21
is more homogeneous than human chromosome 22, and that bacterial genomes
may not be homogeneous due to short segments with distinct base
compositions. The recursive segmentation also provides a quantitative
criterion for identifying isochores in human sequences. Some features of
our recursive segmentation, such as the possibility of delineating domain
borders accurately, are superior to those of the moving-window approach
commonly used in such analyses. (C) 2001 Elsevier Science B.V. All rights
reserved.
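
The 1-to-2 segmentation and its recursive application can be illustrated with a minimal Python sketch. It is not the paper's implementation, and several details are assumptions made here for illustration: the sequence is reduced to two symbols (G/C versus A/T), the log-likelihood ratio is taken as 2N times the Jensen-Shannon divergence in nats, the two-subsequence model is assumed to add two free parameters (one extra composition and the border position), and segmentation strength is assumed to be the relative excess of that statistic over the BIC penalty. The names `best_split`, `strength`, `segment`, `to_gc`, and the threshold `s0` are hypothetical.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a two-symbol composition with G+C fraction p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)


def to_gc(dna):
    """Map a DNA string to 0/1 values: 1 for G or C, 0 for anything else."""
    return [1 if base in "GCgc" else 0 for base in dna]


def best_split(seq):
    """Return (cut index, 2*N*D_JS) for the cut that maximises the
    Jensen-Shannon divergence between the left and right subsequences.
    Returns (None, 0.0) if no cut gives a positive divergence."""
    n = len(seq)
    if n < 2:
        return None, 0.0
    total = sum(seq)
    h_all = entropy(total / n)
    best_i, best_stat = None, 0.0
    left = 0
    for i in range(1, n):
        left += seq[i - 1]
        right = total - left
        d_js = (h_all
                - (i / n) * entropy(left / i)
                - ((n - i) / n) * entropy(right / (n - i)))
        stat = 2.0 * n * d_js  # assumed equal to the log-likelihood ratio statistic
        if stat > best_stat:
            best_i, best_stat = i, stat
    return best_i, best_stat


def strength(stat, n, extra_params=2):
    """Assumed segmentation strength: relative excess of the likelihood-ratio
    statistic over the BIC penalty extra_params * ln(N)."""
    penalty = extra_params * math.log(n)
    return (stat - penalty) / penalty


def segment(seq, s0=0.0, offset=0):
    """Recursively cut seq wherever the best 1-to-2 segmentation has
    strength above s0; return the sorted list of border positions."""
    i, stat = best_split(seq)
    if i is None or strength(stat, len(seq)) <= s0:
        return []  # this stretch is homogeneous at threshold s0
    return (segment(seq[:i], s0, offset)
            + [offset + i]
            + segment(seq[i:], s0, offset + i))


if __name__ == "__main__":
    toy = to_gc("AT" * 100 + "GC" * 100)
    print(segment(toy))  # single border at position 200
```

Under these assumptions, raising `s0` demands stronger evidence for each cut, so the same sequence yields fewer domains and can eventually be declared homogeneous. This is the sense in which the answer depends on the chosen criterion.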