Motivation: DNA sequences are formed by patches or domains of different
nucleotide composition. In a few simple sequences, domains can simply be
identified by eye; however, most DNA sequences show a complex compositional
heterogeneity (fractal structure), which cannot be properly detected by
current methods. Recently, a computationally efficient segmentation method
to analyse such nonstationary sequence structures, based on the
Jensen-Shannon entropic divergence, has been described. Specific algorithms
implementing this method are now needed.
Results: Here we describe a heuristic segmentation algorithm for DNA
sequences, which was implemented as a Windows program (SEGMENT). The
program divides a DNA sequence into compositionally homogeneous domains by
iterating a local optimization procedure at a given statistical
significance. Once a sequence is partitioned into domains, a global measure
of sequence compositional complexity (SCC), accounting for both the sizes
and compositional biases of all the domains in the sequence, is derived.
SEGMENT computes SCC as a function of the significance level, which
provides a multiscale view of sequence complexity.
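
As an illustration of the kind of procedure described above, the following Python sketch finds the cut point that maximizes the length-weighted Jensen-Shannon divergence between the two resulting subsequences, and repeats the search recursively on each part. It is a minimal sketch and not the SEGMENT implementation: the function names are hypothetical, the fixed `threshold` on the divergence is an assumed stand-in for the statistical significance test, and the naive search recomputes entropies at every position, so it is suitable only for short sequences.

```python
from collections import Counter
from math import log2


def shannon_entropy(seq):
    """Shannon entropy (in bits) of the symbol composition of seq."""
    n = len(seq)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(seq).values())


def js_divergence(seq, cut):
    """Length-weighted Jensen-Shannon divergence between seq[:cut] and seq[cut:]."""
    n = len(seq)
    left, right = seq[:cut], seq[cut:]
    return (shannon_entropy(seq)
            - (len(left) / n) * shannon_entropy(left)
            - (len(right) / n) * shannon_entropy(right))


def best_cut(seq):
    """Return (position, divergence) for the cut that maximizes the divergence."""
    candidates = ((i, js_divergence(seq, i)) for i in range(1, len(seq)))
    return max(candidates, key=lambda pair: pair[1])


def segment(seq, threshold, start=0):
    """Recursively split seq and return the sorted list of domain boundaries.

    Assumption: a cut is accepted when its divergence exceeds `threshold`,
    a simple placeholder for a proper significance test.
    """
    if len(seq) < 2:
        return []
    cut, divergence = best_cut(seq)
    if divergence <= threshold:
        return []
    return (segment(seq[:cut], threshold, start)
            + [start + cut]
            + segment(seq[cut:], threshold, start + cut))
```

For example, `segment("AAAAAAAACCCCCCCC", 0.5)` places a single boundary at position 8, between the A-rich and C-rich halves. Lowering the threshold generally yields more, smaller domains, which is the sense in which varying the acceptance criterion gives a multiscale view.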