S. Audic et Jm. Claverie, SELF-IDENTIFICATION OF PROTEIN-CODING REGIONS IN MICROBIAL GENOMES, Proceedings of the National Academy of Sciences of the United States of America, 95(17), 1998, pp. 10026-10031
A new method for predicting protein-coding regions in microbial genomic
DNA sequences is presented. It uses an ab initio iterative Markov
modeling procedure to automatically partition genomic sequences into
three subsets, shown to correspond to coding, coding on the opposite
strand, and noncoding segments. In contrast to current methods, such as
GENEMARK [Borodovsky, M. & McIninch, J. D. (1993) Comput. Chem. 17,
123-133], no training set or prior knowledge of the statistical
properties of the studied genome is required. This new method tolerates
error rates of 1-2% and can process unassembled sequences. It is thus
ideal for the analysis of genome survey and/or fragmented sequence data
from uncharacterized microorganisms. The method was validated on 10
complete bacterial genomes (from four major phylogenetic lineages). The
results show that protein-coding regions can be identified with an
accuracy of up to 90% with a totally automated and objective procedure.
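
The general idea of an iterative Markov partition can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the window size, the model order, the initialization, or the stopping rule, so the values below (96-base windows, order-2 models, random initial labels, stop when labels no longer change) and all function names are assumptions chosen for illustration. The sketch alternates between fitting one Markov model per subset and reassigning each window to the model under which it is most likely.

```python
import math
import random
from collections import defaultdict

IDX = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumes input contains only A, C, G, T


def windows(seq, size):
    """Split seq into non-overlapping windows of `size` bases (short tail dropped)."""
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, size)]


def train(wins, order):
    """Count next-base occurrences per context, starting at 1 (add-one smoothing)."""
    counts = defaultdict(lambda: [1, 1, 1, 1])
    for w in wins:
        for i in range(order, len(w)):
            counts[w[i - order:i]][IDX[w[i]]] += 1
    return counts


def log_likelihood(w, counts, order):
    """Log-probability of window w under a fixed-order Markov model."""
    total = 0.0
    for i in range(order, len(w)):
        # Unseen contexts fall back to a uniform distribution.
        row = counts.get(w[i - order:i], [1, 1, 1, 1])
        total += math.log(row[IDX[w[i]]] / sum(row))
    return total


def partition(seq, size=96, order=2, n_models=3, max_iter=50, seed=0):
    """Iteratively assign windows to n_models Markov models (hypothetical defaults)."""
    rng = random.Random(seed)
    wins = windows(seq, size)
    # Assumption: start from random labels, since no training set is available.
    labels = [rng.randrange(n_models) for _ in wins]
    for _ in range(max_iter):
        # Fit one model per subset from the windows currently assigned to it.
        models = [
            train([w for w, lab in zip(wins, labels) if lab == k], order)
            for k in range(n_models)
        ]
        # Reassign every window to the model that scores it highest.
        new_labels = [
            max(range(n_models), key=lambda k: log_likelihood(w, models[k], order))
            for w in wins
        ]
        if new_labels == labels:  # converged: the partition is stable
            break
        labels = new_labels
    return labels
```

Calling `partition(genome_string)` returns one label (0, 1, or 2) per window. Which label, if any, corresponds to coding, opposite-strand coding, or noncoding sequence is not determined by this sketch and would have to be established separately, as the abstract indicates the authors did for their three subsets.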