Delineating relative homogeneous G+C domains in DNA sequences

Authors
Citation
Wt. Li, Delineating relative homogeneous G+C domains in DNA sequences, GENE, 276(1-2), 2001, pp. 57-72
Number of citations
50
Subject Categories
Molecular Biology & Genetics
Journal title
GENE
ISSN journal
03781119
Volume
276
Issue
1-2
Year of publication
2001
Pages
57 - 72
Database
ISI
SICI code
0378-1119(20011003)276:1-2<57:DRHGDI>2.0.ZU;2-#
Abstract
The concept of homogeneity of G + C content is always relative and subjective. This point is emphasized and quantified in this paper using a simple example of one sequence segmented into two subsequences. Whether the sequence is homogeneous or not can be answered by whether the two-subsequence model describes the DNA sequence better than the one-sequence model. There are at least three equivalent ways of looking at the 1-to-2 segmentation: Jensen-Shannon divergence measure, log likelihood ratio test, and model selection using Bayesian information criterion. Once a criterion is chosen, a DNA sequence can be recursively segmented into multiple domains. We use one subjective criterion called segmentation strength based on the Bayesian information criterion. Whether or not a sequence is homogeneous and how many domains it has depend on this criterion. We compare six different genome sequences (yeast S. cerevisiae chromosome III and IV, bacterium M. pneumoniae, human major histocompatibility complex sequence, longest contigs in human chromosome 21 and 22) by recursive segmentations at different strength criteria. Results by recursive segmentation confirm that yeast chromosome IV is more homogeneous than yeast chromosome III, human chromosome 21 is more homogeneous than human chromosome 22, and bacterial genomes may not be homogeneous due to short segments with distinct base compositions. The recursive segmentation also provides a quantitative criterion for identifying isochores in human sequences. Some features of our recursive segmentation, such as the possibility of delineating domain borders accurately, are superior to those of the moving-window approach commonly used in such analyses. (C) 2001 Elsevier Science B.V. All rights reserved.
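The 1-to-2 segmentation and its recursive application described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation. Several choices here are assumptions made for illustration: bases are reduced to two symbols (G/C versus everything else), the two-segment model is charged two extra parameters (one composition plus the cut position), and the function names `best_split`, `strength`, and `segment` as well as the exact form of the strength ratio are hypothetical. The log likelihood ratio at a cut equals the sequence length times the length-weighted Jensen-Shannon divergence between the two sides, which is why the three views named in the abstract coincide.

```python
import math


def entropy(gc, n):
    """Shannon entropy (in nats) of a two-symbol composition with gc of n sites G/C."""
    if n == 0 or gc == 0 or gc == n:
        return 0.0
    p = gc / n
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))


def best_split(seq):
    """Return (cut, llr): the cut position maximizing the log likelihood ratio
    of the two-segment model over the one-segment model, or (None, 0.0)."""
    n = len(seq)
    is_gc = [1 if base in "GC" else 0 for base in seq.upper()]
    total = sum(is_gc)
    h_whole = entropy(total, n)
    best_llr, best_cut = 0.0, None
    left = 0
    for i in range(1, n):
        left += is_gc[i - 1]
        # llr = N * (length-weighted Jensen-Shannon divergence of the two sides)
        llr = n * h_whole - i * entropy(left, i) - (n - i) * entropy(total - left, n - i)
        if llr > best_llr:
            best_llr, best_cut = llr, i
    return best_cut, best_llr


def strength(llr, n, extra_params=2):
    """Assumed strength: relative excess of 2*llr over the BIC penalty.
    A value above 0 means BIC prefers the two-segment model."""
    penalty = extra_params * math.log(n)
    return (2 * llr - penalty) / penalty


def segment(seq, s0=0.0, offset=0):
    """Recursively split seq while the best cut has strength above s0.
    Returns the sorted list of domain borders (0-based cut positions)."""
    n = len(seq)
    if n < 2:
        return []
    cut, llr = best_split(seq)
    if cut is None or strength(llr, n) <= s0:
        return []
    return (segment(seq[:cut], s0, offset)
            + [offset + cut]
            + segment(seq[cut:], s0, offset + cut))


if __name__ == "__main__":
    toy = "AT" * 50 + "GC" * 50  # AT-only half followed by GC-only half
    print(segment(toy))
```

Raising the threshold `s0` yields fewer, more strongly supported borders, which mirrors the abstract's point that the number of domains depends on the chosen strength criterion.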