GENEMARK.HMM - NEW SOLUTIONS FOR GENE FINDING

Citation
A.V. Lukashin and M. Borodovsky, GENEMARK.HMM - NEW SOLUTIONS FOR GENE FINDING, Nucleic Acids Research, 26(4), 1998, pp. 1107-1115
Number of citations
31
Subject categories
Biology
Journal title
Nucleic Acids Research
Journal ISSN
0305-1048
Volume
26
Issue
4
Year of publication
1998
Pages
1107 - 1115
Database
ISI
SICI code
0305-1048(1998)26:4<1107:G-NSFG>2.0.ZU;2-C
Abstract
The number of completely sequenced bacterial genomes has been growing fast. There are computer methods available for finding genes but yet there is a need for more accurate algorithms. The GeneMark.hmm algorithm presented here was designed to improve the gene prediction quality in terms of finding exact gene boundaries. The idea was to embed the GeneMark models into naturally derived hidden Markov model framework with gene boundaries modeled as transitions between hidden states. We also used the specially derived ribosome binding site pattern to refine predictions of translation initiation codons. The algorithm was evaluated on several test sets including 10 complete bacterial genomes. It was shown that the new algorithm is significantly more accurate than GeneMark in exact gene prediction. Interestingly, the high gene finding accuracy was observed even in the case when Markov models of order zero, one and two were used. We present the analysis of false positive and false negative predictions with the caution that these categories are not precisely defined if the public database annotation is used as a control.
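The abstract's central idea, gene boundaries read off as transitions between hidden states, can be illustrated with a toy example. The sketch below is not the GeneMark.hmm model: it assumes a hypothetical two-state HMM ("noncoding" and "coding") with zero-order emissions, and every probability in it is a made-up illustrative value rather than a trained parameter. It decodes the most probable state path with the standard Viterbi algorithm and reports the positions where the state changes.

```python
import math

STATES = ("noncoding", "coding")

# Assumption: all probabilities are invented for illustration only.
START = {"noncoding": 0.9, "coding": 0.1}
TRANS = {
    "noncoding": {"noncoding": 0.9, "coding": 0.1},
    "coding": {"noncoding": 0.1, "coding": 0.9},
}
EMIT = {
    "noncoding": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
    "coding": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
}


def viterbi(seq):
    """Return the most probable hidden-state path for seq (log-space Viterbi)."""
    if not seq:
        return []
    # Initialise with the start distribution and the first emission.
    score = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    back = []  # back[i][s] = best predecessor of state s at position i + 1
    for ch in seq[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            pointers[s] = prev
            new_score[s] = (
                score[prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][ch])
            )
        back.append(pointers)
        score = new_score
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def boundaries(path):
    """Return 0-based positions at which the decoded state changes."""
    return [i for i in range(1, len(path)) if path[i] != path[i - 1]]
```

For an input such as `"ATATATAT" + "GCGCGCGCGC" + "ATATATAT"`, calling `boundaries(viterbi(seq))` gives the two positions where the decoded path enters and leaves the "coding" state, which is the sense in which boundaries are modeled as hidden-state transitions.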