DETECTION OF NEW GENES IN A BACTERIAL GENOME USING MARKOV-MODELS FOR 3 GENE CLASSES

Citation
M. Borodovsky et al., DETECTION OF NEW GENES IN A BACTERIAL GENOME USING MARKOV-MODELS FOR 3 GENE CLASSES, Nucleic acids research, 23(17), 1995, pp. 3554-3562
Number of citations
31
Subject Categories
Biology
Journal title
Nucleic Acids Research
Journal ISSN
0305-1048
Volume
23
Issue
17
Year of publication
1995
Pages
3554 - 3562
Database
ISI
SICI code
0305-1048(1995)23:17<3554:DONGIA>2.0.ZU;2-T
Abstract
We further investigated the statistical features of the three classes of Escherichia coli genes that have been previously delineated by factorial correspondence analysis and dynamic clustering methods. A phased Markov model for a nucleotide sequence of each gene class was developed and employed for gene prediction using the GeneMark program. The protein-coding region prediction accuracy was determined for class-specific Markov models of different orders when the programs implementing these models were applied to gene sequences from the same or other classes. It is shown that at least two training sets and two program versions derived for different classes of E. coli genes are necessary in order to achieve a high accuracy of coding region prediction for uncharacterized sequences. Some annotated E. coli genes from Class I and Class III are shown to be spurious, whereas many open reading frames (ORFs) that have not been annotated in GenBank as genes are predicted to encode proteins. The amino acid sequences of the putative products of these ORFs initially did not show similarity to already known proteins. However, conserved regions have been identified in several of them by screening the latest entries in protein sequence databases and applying methods for motif search, while some other of these new genes have been identified in independent experiments.
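Illustrative note: the abstract refers to a "phased" Markov model, that is, one whose transition probabilities depend on the codon position of the nucleotide being emitted. The sketch below is a minimal illustration of that general idea only and is not the GeneMark implementation. The function names (train_phased_model, log_prob, best_frame), the add-one pseudocount, and the default order of 2 are all assumptions made for this example. It scores a sequence in each of the three forward reading frames and returns the most likely one.

```python
import math
from collections import Counter

ALPHABET = "ACGT"

def train_phased_model(coding_seqs, order=2, pseudocount=1.0):
    """Count (context, next base) pairs separately for each codon phase.

    Assumes every training sequence starts at codon position 0
    (hypothetical input convention for this sketch).
    """
    counts = [Counter() for _ in range(3)]
    for seq in coding_seqs:
        for i in range(order, len(seq)):
            context = seq[i - order:i]
            counts[i % 3][(context, seq[i])] += 1
    return {"order": order, "pseudocount": pseudocount, "counts": counts}

def log_prob(model, seq, frame=0):
    """Log-likelihood of seq, assuming seq[frame] is at codon position 0."""
    order = model["order"]
    pc = model["pseudocount"]
    counts = model["counts"]
    total = 0.0
    for i in range(order, len(seq)):
        phase = (i - frame) % 3
        context = seq[i - order:i]
        # Smoothed estimate of P(base | context, phase).
        num = counts[phase][(context, seq[i])] + pc
        den = sum(counts[phase][(context, b)] for b in ALPHABET)
        den += pc * len(ALPHABET)
        total += math.log(num / den)
    return total

def best_frame(model, seq):
    """Return the frame offset (0, 1, or 2) with the highest log-likelihood."""
    return max(range(3), key=lambda f: log_prob(model, seq, f))
```

In this sketch, a class-specific model would simply be one trained on the sequences of a single gene class; a practical gene finder would also compare each frame against a non-coding model and scan both strands, which this example leaves out.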