An algorithm called segment-based dynamic programming is described for
predicting gene structure from a sequence of genomic DNA, The algorit
hm explores the space of gene structures that satisfy junctional and f
rame constraints and finds the gene structure that optimizes the sum o
f junctional and segmental scoring functions. Junctional constraints s
pecify acceptable sites of initiation, termination, and splicing, wher
eas frame constraints ensure that the total exon length is a multiple
of three and that no in-frame stop codons occur within exons or at exo
n-exon junctions, By computing over segments, segment-based dynamic pr
ogramming differs from existing dynamic programming algorithms, which
compute over individual nucleotides or over clusters of exon candidate
s, Because segment-based dynamic programming maintains reading frame a
nd phase information for each segment, it can assemble exons in-frame
as well as score them in-frame, The algorithm is used to quantify the
computational power of constraints, Experimental results show that fra
me constraints reduce the size of the search space by several orders o
f magnitude and that cardinality constraints place an asymptotic limit
on the size of the search space, The algorithm is also used to compar
e the accuracy of different methods for assembly and scoring, A scorin
g scheme based on fifth-order Markov hexamer frequencies is presented
and used in three objective functions, corresponding to in-frame, fram
e-independent, and frame-maximal scoring strategies, Experimental resu
lts show that in-frame assembly improves specificity only slightly ove
r frame-independent assembly, whereas in-frame scoring improves specif
icity substantially over frame-independent and frame-maximal scoring.