CONSTRUCTING GENE MODELS FROM ACCURATELY PREDICTED EXONS - AN APPLICATION OF DYNAMIC-PROGRAMMING

Citation
Y. Xu et al., CONSTRUCTING GENE MODELS FROM ACCURATELY PREDICTED EXONS - AN APPLICATION OF DYNAMIC-PROGRAMMING, Computer applications in the biosciences, 10(6), 1994, pp. 613-623
Citations number
10
Categorie Soggetti
Mathematical Methods, Biology & Medicine","Computer Sciences, Special Topics","Computer Science Interdisciplinary Applications","Biology Miscellaneous
ISSN journal
02667061
Volume
10
Issue
6
Year of publication
1994
Pages
613 - 623
Database
ISI
SICI code
0266-7061(1994)10:6<613:CGMFAP>2.0.ZU;2-9
Abstract
This paper presents a computationally efficient algorithm, the Gene As sembly Program III (GAP III), for constructing gene models from a set of accurately-predicted 'exons'. The input to the algorithm is a set o f clusters of exon candidates, generated by a new version of the GRAIL coding region recognition system. The exon candidates of a cluster di ffer in their presumed edges and occasionally in their reading frames. Each exon candidate has a numerical score representing its 'probabili ty' of being an actual exon. GAP III uses a dynamic programming algori thm to construct a gene model, complete or partial, by optimizing a pr edefined objective function. The optimal gene models constructed by GA P III correspond very well with the structures of genes which have bee n determined experimentally and reported in the Genome Sequence Databa se (GSDB). On a test set of 137 human and mouse DNA sequences consisti ng of 954 true exons, GAP III constructed 137 gene models using 892 ex ons, among which 859 (859/954 = 90%) are true exons and 33 (33/892 = 3 %) are false positive. Among the 859 true positives, 635 (74%) match t he actual exons exactly, and 838 (98%) have at least one edge correct. GAP III is computationally efficient. If we use E and C to represent the total number of exon candidates in all clusters and the number of clusters, respectively, the running time of GAP III is proportional to (E x C).