SELF-IDENTIFICATION OF PROTEIN-CODING REGIONS IN MICROBIAL GENOMES

Citation
S. Audic and J.M. Claverie, SELF-IDENTIFICATION OF PROTEIN-CODING REGIONS IN MICROBIAL GENOMES, Proceedings of the National Academy of Sciences of the United States of America, 95(17), 1998, pp. 10026-10031
Number of citations
30
Subject Categories
Multidisciplinary Sciences
Journal ISSN
0027-8424
Volume
95
Issue
17
Year of publication
1998
Pages
10026 - 10031
Database
ISI
SICI code
0027-8424(1998)95:17<10026:SOPRIM>2.0.ZU;2-4
Abstract
A new method for predicting protein-coding regions in microbial genomic DNA sequences is presented. It uses an ab initio iterative Markov modeling procedure to automatically perform the partition of genomic sequences into three subsets shown to correspond to coding, coding on the opposite strand, and noncoding segments. In contrast to current methods, such as GENEMARK [Borodovsky, M. & McIninch, J. D. (1993) Comput. Chem. 17, 123-133], no training set or prior knowledge of the statistical properties of the studied genome is required. This new method tolerates error rates of 1-2% and can process unassembled sequences. It is thus ideal for the analysis of genome survey and/or fragmented sequence data from uncharacterized microorganisms. The method was validated on 10 complete bacterial genomes (from four major phylogenetic lineages). The results show that protein-coding regions can be identified with an accuracy of up to 90% with a totally automated and objective procedure.
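
The iterative partition described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the fixed window length, the Markov order, the random initial assignment, the add-one smoothing, the stopping rule, and the function names (`train_model`, `log_likelihood`, `partition`) are all assumptions chosen for illustration. The sketch only shows the general idea of alternately training one Markov model per subset and reassigning each window to the model that scores it best, without any training set.

```python
import math
import random
from itertools import product

ALPHABET = "ACGT"


def train_model(windows, order):
    """Estimate log transition probabilities of an order-`order` Markov model.

    Every (context, base) count starts at 1 (add-one smoothing, an assumption),
    so a class with few or no windows still yields a valid model.
    """
    counts = {"".join(ctx): {b: 1.0 for b in ALPHABET}
              for ctx in product(ALPHABET, repeat=order)}
    for w in windows:
        for i in range(order, len(w)):
            ctx, base = w[i - order:i], w[i]
            # Skip positions containing ambiguous symbols such as N.
            if ctx in counts and base in counts[ctx]:
                counts[ctx][base] += 1.0
    model = {}
    for ctx, row in counts.items():
        total = sum(row.values())
        model[ctx] = {b: math.log(c / total) for b, c in row.items()}
    return model


def log_likelihood(window, model, order):
    """Sum the log transition probabilities of `window` under `model`."""
    score = 0.0
    for i in range(order, len(window)):
        ctx, base = window[i - order:i], window[i]
        if ctx in model and base in model[ctx]:
            score += model[ctx][base]
    return score


def partition(sequence, window=60, order=2, n_classes=3, max_iter=50, seed=0):
    """Split `sequence` into non-overlapping windows and cluster them.

    Labels start random; each iteration retrains one model per class and
    moves every window to its best-scoring class, until labels stop changing
    or `max_iter` is reached.
    """
    rng = random.Random(seed)
    windows = [sequence[i:i + window]
               for i in range(0, len(sequence) - window + 1, window)]
    labels = [rng.randrange(n_classes) for _ in windows]
    for _ in range(max_iter):
        models = [train_model([w for w, l in zip(windows, labels) if l == k], order)
                  for k in range(n_classes)]
        new_labels = [max(range(n_classes),
                          key=lambda k: log_likelihood(w, models[k], order))
                      for w in windows]
        if new_labels == labels:
            break
        labels = new_labels
    return windows, labels
```

Calling `partition(genome_string)` returns the windows and an integer label per window. The labels are arbitrary cluster indices: deciding which cluster corresponds to coding, coding on the opposite strand, or noncoding sequence is a separate step that this sketch does not attempt.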