J. Kleffe et al., GENEGENERATOR - A FLEXIBLE ALGORITHM FOR GENE PREDICTION AND ITS APPLICATION TO MAIZE SEQUENCES, BIOINFORMATICS, 14(3), 1998, pp. 232-243
Number of citations
15
Subject Categories
Computer Science Interdisciplinary Applications; Biology Miscellaneous; Biochemical Research Methods
Motivation: We developed GeneGenerator because of the need for a tool
to predict gene structure without knowing in advance how to score
potential exons and introns in order to obtain the best results,
pertinent in particular to less well-studied organisms for which
suitable training sets are small. GeneGenerator is a very flexible
algorithm which for a given genomic sequence generates a number of
feasible gene structures satisfying user-defined constraints. The
specific implementation described in detail requires minimum scoring
for translation start and donor and acceptor splice sites according to
previously trained logitlinear models. In addition, potential exons
and introns are required to exceed specified minimal lengths and
threshold scores for coding or non-coding potential derived as
log-likelihood ratios of appropriate Markov sequence models.
Results: A database of 46 non-redundant genomic sequences from maize
is used for illustration. It is shown that the correct gene structures
do not always maximize the considered target function. However, in
most cases, the correct or nearly correct structures are found in a
small set of high-scoring structures. A critical review of the
generated structures sometimes allows the choices to be narrowed by
considering additional variables such as predicted splice site
strength or local optimality of splice site scores. Summary statistics
for prediction accuracy over all 46 maize genes are derived under
cross-validation and non-cross-validation training conditions for the
Markov sequence models. The algorithm achieved exon sensitivity of
0.81 and specificity of 0.75 on an independent set of 14 novel maize
genomic segments.
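
The length and log-likelihood-ratio constraints mentioned in the abstract can be illustrated with a minimal sketch. This is not the GeneGenerator implementation: the first-order Markov chain, the add-one pseudocount, the toy training sequences, and the names train_markov, log_likelihood_ratio and passes_constraints are all assumptions made for illustration, and the abstract does not state the model order or thresholds actually used.

```python
import math

ALPHABET = "ACGT"

def train_markov(seqs, pseudocount=1.0):
    """Estimate first-order transition probabilities P(b | a).

    Hypothetical helper: a pseudocount keeps every probability non-zero,
    which matters when the training set is small.
    """
    counts = {a: {b: pseudocount for b in ALPHABET} for a in ALPHABET}
    for s in seqs:
        for a, b in zip(s, s[1:]):
            if a in counts and b in counts[a]:
                counts[a][b] += 1
    probs = {}
    for a in ALPHABET:
        total = sum(counts[a].values())
        probs[a] = {b: counts[a][b] / total for b in ALPHABET}
    return probs

def log_likelihood_ratio(seq, coding, noncoding):
    """Sum log(P_coding(b|a) / P_noncoding(b|a)) over adjacent base pairs.

    Pairs containing a symbol outside ACGT (for example N) are skipped.
    """
    score = 0.0
    for a, b in zip(seq, seq[1:]):
        if a in coding and b in coding[a]:
            score += math.log(coding[a][b] / noncoding[a][b])
    return score

def passes_constraints(seq, coding, noncoding, min_len, min_score):
    """Accept a candidate only if it meets both the length and score thresholds."""
    if len(seq) < min_len:
        return False
    return log_likelihood_ratio(seq, coding, noncoding) >= min_score

# Toy training data (assumed, not from the paper).
coding_model = train_markov(["GCGCGCGC"])
noncoding_model = train_markov(["ATATATAT"])
```

With these toy models, a call such as passes_constraints("GCGC", coding_model, noncoding_model, min_len=4, min_score=0.0) accepts the candidate, while a candidate resembling the non-coding training data, or one shorter than min_len, is rejected. A positive ratio means the coding model explains the candidate better than the non-coding model.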