ITA
ENG

Assembling genes from predicted exons in linear time with dynamic programming

Authors

Guigo, R

Citation

R. Guigo, Assembling genes from predicted exons in linear time with dynamic programming, J COMPUT BI, 5(4), 1998, pp. 681-702

Citations number

Categorie Soggetti

Biochemistry & Biophysics

Journal title

JOURNAL OF COMPUTATIONAL BIOLOGY

ISSN journal

10665277 → ACNP

Volume

Issue

Year of publication

1998

Pages

681 - 702

Database

ISI

SICI code

1066-5277(199824)5:4<681:AGFPEI>2.0.ZU;2-L

Abstract

In a number of programs for gene structure prediction in higher eukaryotic genomic sequences, exon prediction is decoupled from gene assembly: a large pool of candidate exons is predicted and scored from features located in t he query DNA sequence, and candidate genes are assembled from such a pool a s sequences of nonoverlapping frame-compatible exons. Genes are scored as a function of the scores of the assembled exons, and the highest scoring can didate gene is assumed to be the most likely gene encoded by the query DNA sequence. Considering additive gene scoring functions, currently available algorithms to determine such a highest scoring candidate gene run in time p roportional to the square of the number of predicted exons. Here, we presen t an algorithm whose running time grows only linearly with the size of the set of predicted exons. Polynomial algorithms rely on the fact that, while scanning the set of predicted exons, the highest scoring gene ending in a g iven exon can be obtained by appending the exon to the highest scoring amon g the highest scoring genes ending at each compatible preceding exon. The a lgorithm here relies on the simple fact that such highest scoring gene can be stored and updated. This requires scanning the set of predicted exons si multaneously by increasing acceptor and donor position. On the other hand, the algorithm described here does not assume an underlying gene structure m odel. Indeed, the definition of valid gene structures is externally defined in the so-called Gene Model. The Gene Model specifies simply which gene fe atures are allowed immediately upstream which other gene features in valid gene structures. This allows for great flexibility in formulating the gene identification problem. In particular it allows for multiple-gene two-stran d predictions and for considering gene features other than coding exons (su ch as promoter elements) in valid gene structures.