GENEGENERATOR - A FLEXIBLE ALGORITHM FOR GENE PREDICTION AND ITS APPLICATION TO MAIZE SEQUENCES

Citation
J. Kleffe et al., GENEGENERATOR - A FLEXIBLE ALGORITHM FOR GENE PREDICTION AND ITS APPLICATION TO MAIZE SEQUENCES, BIOINFORMATICS, 14(3), 1998, pp. 232-243
Number of citations
15
Subject categories
Computer Science Interdisciplinary Applications; Biology Miscellaneous; Biochemical Research Methods
Journal title
Bioinformatics
ISSN journal
13674803
Volume
14
Issue
3
Year of publication
1998
Pages
232 - 243
Database
ISI
SICI code
1367-4803(1998)14:3<232:G-AFAF>2.0.ZU;2-K
Abstract
Motivation: We developed GeneGenerator because we needed a tool that predicts gene structure without knowing in advance how to score potential exons and introns to obtain the best results. This is particularly relevant for less well-studied organisms, for which suitable training sets are small. GeneGenerator is a very flexible algorithm which, for a given genomic sequence, generates a number of feasible gene structures satisfying user-defined constraints. The specific implementation described in detail requires minimum scores for translation start sites and for donor and acceptor splice sites, according to previously trained logitlinear models. In addition, potential exons and introns are required to exceed specified minimal lengths and threshold scores for coding or non-coding potential, derived as log-likelihood ratios of appropriate Markov sequence models.
Results: A database of 46 non-redundant genomic sequences from maize is used for illustration. It is shown that the correct gene structures do not always maximize the target function considered. However, in most cases, the correct or nearly correct structures are found in a small set of high-scoring structures. A critical review of the generated structures sometimes allows the choices to be narrowed by considering additional variables, such as predicted splice site strength or local optimality of splice site scores. Summary statistics for prediction accuracy over all 46 maize genes are derived under cross-validation and non-cross-validation training conditions for the Markov sequence models. The algorithm achieved an exon sensitivity of 0.81 and a specificity of 0.75 on an independent set of 14 novel maize genomic segments.
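The coding and non-coding potentials named in the abstract are log-likelihood ratios between two Markov sequence models. The following is a minimal sketch of that idea, assuming a homogeneous first-order Markov chain for each class with additive smoothing; the models used in the paper may differ in order and may be frame-dependent for coding regions. The function names, the pseudocount, and the toy training strings are hypothetical illustrations, not maize data.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"

def train_markov(seqs, order=1, pseudocount=1.0):
    """Estimate P(base | preceding `order` bases) from training sequences,
    with additive smoothing so unseen transitions keep a nonzero probability."""
    counts = defaultdict(lambda: defaultdict(float))
    for s in seqs:
        for i in range(order, len(s)):
            counts[s[i - order:i]][s[i]] += 1.0

    def prob(context, base):
        ctx = counts.get(context, {})  # unseen context -> uniform distribution
        total = sum(ctx.values()) + pseudocount * len(ALPHABET)
        return (ctx.get(base, 0.0) + pseudocount) / total

    return prob

def log_likelihood(seq, prob, order=1):
    """Sum of log transition probabilities; the first `order` bases are not scored."""
    return sum(math.log(prob(seq[i - order:i], seq[i])) for i in range(order, len(seq)))

def coding_potential(seq, coding, noncoding, order=1):
    """Log-likelihood ratio: > 0 favours the coding model, < 0 the non-coding one."""
    return log_likelihood(seq, coding, order) - log_likelihood(seq, noncoding, order)

# Toy usage (hypothetical training data).
coding = train_markov(["ATGATGATGATG"])
noncoding = train_markov(["AAAAAAAATTTTTTTT"])
print(coding_potential("ATGATGATG", coding, noncoding) > 0)  # True
```

Under this reading, a candidate exon would be kept only if its ratio exceeds a user-chosen threshold, while an intron would be tested against the reversed ratio (non-coding over coding).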
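The abstract does not spell out how the feasible structures are enumerated. A minimal brute-force sketch of the general idea might look like this. It assumes that candidate start, donor, acceptor and stop sites have already been scored (for instance by site models) and passed in as position-to-score dictionaries, that exon and intron scores come from user-supplied callables, and that a structure's total score is simply the sum of its parts. The names `generate_structures` and `Constraints`, the coordinate convention, and the default thresholds are all hypothetical. The sketch also ignores premature in-frame stop codons and reverse-strand genes.

```python
import heapq
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [begin, end), 0-based coordinates

@dataclass(frozen=True)
class Constraints:                  # hypothetical user-defined constraints
    min_site_score: float = 0.0     # applies to start, donor, acceptor and stop sites
    min_exon_len: int = 3
    min_intron_len: int = 20
    min_exon_score: float = 0.0     # threshold on coding potential
    min_intron_score: float = 0.0   # threshold on non-coding potential

def generate_structures(starts: Dict[int, float], donors: Dict[int, float],
                        acceptors: Dict[int, float], stops: Dict[int, float],
                        exon_score: Callable[[int, int], float],
                        intron_score: Callable[[int, int], float],
                        c: Constraints, top_k: int = 5) -> List[Tuple[float, List[Interval]]]:
    """Enumerate start -> (exon, intron)* -> stop paths that satisfy all constraints
    and return the top_k (total score, exon list) pairs, best first.
    Positions: start = first base of ATG, donor = first intron base,
    acceptor = first base after the intron, stop = first base of the stop codon."""
    def passing(sites):
        return {p: s for p, s in sites.items() if s >= c.min_site_score}

    starts, donors, acceptors, stops = map(passing, (starts, donors, acceptors, stops))
    found: List[Tuple[float, List[Interval]]] = []

    def exon_ok(b: int, e: int) -> bool:
        return e - b >= c.min_exon_len and exon_score(b, e) >= c.min_exon_score

    def extend(begin: int, exons: List[Interval], score: float) -> None:
        coded = sum(e - b for b, e in exons)  # coding bases in earlier exons
        # Option 1: close the current exon with a stop codon, keeping the reading frame.
        for t, t_score in stops.items():
            end = t + 3  # the terminal exon includes the stop codon
            if t >= begin and (coded + end - begin) % 3 == 0 and exon_ok(begin, end):
                found.append((score + exon_score(begin, end) + t_score, exons + [(begin, end)]))
        # Option 2: close the current exon at a donor and resume at a downstream acceptor.
        for d, d_score in donors.items():
            if not exon_ok(begin, d):
                continue
            for a, a_score in acceptors.items():
                if a - d >= c.min_intron_len and intron_score(d, a) >= c.min_intron_score:
                    extend(a, exons + [(begin, d)],
                           score + exon_score(begin, d) + d_score + intron_score(d, a) + a_score)

    for s, s_score in starts.items():
        extend(s, [], s_score)
    return heapq.nlargest(top_k, found, key=lambda r: r[0])

# Toy usage: one start, one donor/acceptor pair, two stop codons, flat region scores.
res = generate_structures({0: 1.0}, {10: 1.0}, {40: 1.0}, {9: 0.5, 45: 2.0},
                          lambda b, e: 0.0, lambda b, e: 0.0, Constraints())
print(res[0])  # (5.0, [(0, 10), (40, 48)])
```

Returning a short ranked list instead of a single optimum mirrors the abstract's observation that the correct structure is often among a few high-scoring candidates rather than always the maximum of the target function.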
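Exon sensitivity and specificity, as reported in the abstract, are commonly defined as the fraction of true exons that are predicted exactly (both boundaries correct) and the fraction of predicted exons that are exactly correct. The abstract does not say whether the figures were pooled over genes or averaged per gene; the minimal sketch below assumes exact-match counting pooled over all genes, and the function name `exon_accuracy` and the toy coordinates are hypothetical.

```python
from typing import Iterable, List, Tuple

Exon = Tuple[int, int]  # (begin, end) boundaries of one exon

def exon_accuracy(genes: Iterable[Tuple[List[Exon], List[Exon]]]) -> Tuple[float, float]:
    """genes: (true_exons, predicted_exons) per gene. An exon counts as correct only if
    both boundaries match exactly; counts are pooled over all genes before dividing."""
    correct = actual = predicted = 0
    for true_exons, pred_exons in genes:
        t, p = set(true_exons), set(pred_exons)
        correct += len(t & p)
        actual += len(t)
        predicted += len(p)
    sensitivity = correct / actual if actual else 0.0         # share of true exons recovered
    specificity = correct / predicted if predicted else 0.0   # share of predictions that are right
    return sensitivity, specificity

genes = [
    ([(0, 10), (40, 48)], [(0, 10), (40, 50)]),  # one exon boundary wrong
    ([(5, 20)], [(5, 20)]),                      # exact prediction
    ([(0, 30)], [(0, 30), (50, 60)]),            # one spurious predicted exon
]
print(exon_accuracy(genes))  # (0.75, 0.6)
```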