SEGMENT: identifying compositional domains in DNA sequences

Citation
J.L. Oliver et al., SEGMENT: identifying compositional domains in DNA sequences, BIOINFORMAT, 15(12), 1999, pp. 974-979
Number of citations
13
Subject categories
Multidisciplinary
Journal title
BIOINFORMATICS
ISSN journal
1367-4803
Volume
15
Issue
12
Year of publication
1999
Pages
974 - 979
Database
ISI
SICI code
1367-4803(199912)15:12<974:SICDID>2.0.ZU;2-W
Abstract
Motivation: DNA sequences are formed by patches or domains of different nucleotide composition. In a few simple sequences, domains can simply be identified by eye; however, most DNA sequences show a complex compositional heterogeneity (fractal structure), which cannot be properly detected by current methods. Recently, a computationally efficient segmentation method to analyse such nonstationary sequence structures, based on the Jensen-Shannon entropic divergence, has been described. Specific algorithms implementing this method are now needed.
Results: Here we describe a heuristic segmentation algorithm for DNA sequences, which was implemented as a Windows program (SEGMENT). The program divides a DNA sequence into compositionally homogeneous domains by iterating a local optimization procedure at a given statistical significance. Once a sequence is partitioned into domains, a global measure of sequence compositional complexity (SCC), accounting for both the sizes and compositional biases of all the domains in the sequence, is derived. SEGMENT computes SCC as a function of the significance level, which provides a multiscale view of sequence complexity.
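The abstract does not spell out the algorithm, so the following is only a minimal Python sketch of recursive Jensen-Shannon segmentation, not SEGMENT's actual code. It assumes the common recursive form: find the cut that maximizes the length-weighted Jensen-Shannon divergence D between the two parts, keep it if a test statistic exceeds a threshold, and recurse on each part. Assumptions to note: the names `entropy`, `best_cut`, `segment` and `scc` are hypothetical; the acceptance test compares the likelihood-ratio statistic 2N ln(2) D against a fixed, hypothetical `threshold`, a simplified stand-in for SEGMENT's significance level that ignores the correction for maximizing over all cut positions; and SCC is taken to be the entropy of the whole sequence minus the length-weighted entropies of its domains.

```python
import math
from collections import Counter


def entropy(counts, n):
    """Shannon entropy (bits) of a composition given symbol counts summing to n."""
    h = 0.0
    for c in counts.values():
        if c > 0:
            p = c / n
            h -= p * math.log2(p)
    return h


def best_cut(seq, min_len=1):
    """Return (i, D) for the cut seq[:i] | seq[i:] that maximizes the
    length-weighted Jensen-Shannon divergence D (bits).
    i is None if no cut gives D > 0."""
    n = len(seq)
    total = Counter(seq)
    h_total = entropy(total, n)
    left = Counter()
    best_i, best_d = None, 0.0
    for i in range(1, n):
        left[seq[i - 1]] += 1          # left now holds the counts of seq[:i]
        if i < min_len or n - i < min_len:
            continue
        right = total - left           # counts of seq[i:]
        d = (h_total
             - (i / n) * entropy(left, i)
             - ((n - i) / n) * entropy(right, n - i))
        if d > best_d:
            best_i, best_d = i, d
    return best_i, best_d


def segment(seq, threshold, min_len=1, start=0):
    """Recursively split seq and return domains as (start, end) pairs.
    A cut is kept only if 2 * N * ln(2) * D exceeds the (hypothetical) threshold."""
    n = len(seq)
    i, d = best_cut(seq, min_len)
    if i is None or 2 * n * math.log(2) * d <= threshold:
        return [(start, start + n)]
    return (segment(seq[:i], threshold, min_len, start)
            + segment(seq[i:], threshold, min_len, start + i))


def scc(seq, domains):
    """Assumed SCC: whole-sequence entropy minus the length-weighted
    entropies of the domains (both in bits)."""
    n = len(seq)
    h = entropy(Counter(seq), n)
    return h - sum((e - s) / n * entropy(Counter(seq[s:e]), e - s)
                   for s, e in domains)


if __name__ == "__main__":
    seq = "A" * 50 + "C" * 50
    domains = segment(seq, threshold=20.0)
    print(domains)            # [(0, 50), (50, 100)]
    print(scc(seq, domains))  # 1.0
```

To mimic the abstract's "SCC as a function of the significance level", one could call `segment` for a range of thresholds and compute `scc` for each partition: stricter thresholds accept fewer cuts and so yield lower SCC values.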