Y. Xu et al., CONSTRUCTING GENE MODELS FROM ACCURATELY PREDICTED EXONS - AN APPLICATION OF DYNAMIC-PROGRAMMING, Computer applications in the biosciences, 10(6), 1994, pp. 613-623
This paper presents a computationally efficient algorithm, the Gene As
sembly Program III (GAP III), for constructing gene models from a set
of accurately-predicted 'exons'. The input to the algorithm is a set o
f clusters of exon candidates, generated by a new version of the GRAIL
coding region recognition system. The exon candidates of a cluster di
ffer in their presumed edges and occasionally in their reading frames.
Each exon candidate has a numerical score representing its 'probabili
ty' of being an actual exon. GAP III uses a dynamic programming algori
thm to construct a gene model, complete or partial, by optimizing a pr
edefined objective function. The optimal gene models constructed by GA
P III correspond very well with the structures of genes which have bee
n determined experimentally and reported in the Genome Sequence Databa
se (GSDB). On a test set of 137 human and mouse DNA sequences consisti
ng of 954 true exons, GAP III constructed 137 gene models using 892 ex
ons, among which 859 (859/954 = 90%) are true exons and 33 (33/892 = 3
%) are false positive. Among the 859 true positives, 635 (74%) match t
he actual exons exactly, and 838 (98%) have at least one edge correct.
GAP III is computationally efficient. If we use E and C to represent
the total number of exon candidates in all clusters and the number of
clusters, respectively, the running time of GAP III is proportional to
(E x C).