The performance of the GeneScan algorithm for gene identification has been
improved by incorporation of a directed iterative scanning procedure.
Application is made here to the cases of bacterial and organellar genomes. The
sensitivity of gene identification was 100% in the Plasmodium falciparum
plastid-like genome (35 kb) and 98% in both the Mycoplasma genitalium genome
(approximately 580 kb) and the Haemophilus influenzae Rd genome
(approximately 1.8 Mb).
Sensitivity improved both for the open reading frames (ORFs) that have been
identified as genes (by homology or by other methods) and for those that are
classified as hypothetical. False positive assignments (at the
nucleotide level) were 0.25% in the H. influenzae genome and 0.3% in the
M. genitalium genome. There were no false positive assignments in the
plastid-like genome. The agreement between the GeneScan and GeneMark
predictions of putative ORFs was 97% in the M. genitalium genome and 86% in
the H. influenzae genome. In terms of an exact match between predicted
genes/ORFs and the annotation in the databank, GeneScan performance was
between 72% and 90% in the different genomes. We predict five putative ORFs
that were not previously annotated in the GenBank files for both the
M. genitalium and H. influenzae genomes. Our preliminary analysis of the
newly sequenced G + C-rich genome of Mycobacterium tuberculosis H37Rv also
shows comparable sensitivity (99%). (C) 1999 Elsevier Science Ltd. All
rights reserved.
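
The evaluation measures quoted above (nucleotide-level sensitivity, false
positive assignments, and exact matches to the annotation) can be
illustrated with a minimal sketch. The abstract does not give the exact
formulas, so the definitions below are assumptions: sensitivity is taken as
the fraction of annotated coding nucleotides that are also predicted, the
false positive fraction is taken relative to genome length, and an exact
match requires identical start and end coordinates. All function names and
the toy coordinates are hypothetical and are not taken from GeneScan.

```python
def to_positions(intervals):
    """Expand 1-based inclusive (start, end) intervals into a set of positions."""
    covered = set()
    for start, end in intervals:
        covered.update(range(start, end + 1))
    return covered


def nucleotide_level_metrics(annotated, predicted, genome_length):
    """Return (sensitivity, false_positive_fraction) at the nucleotide level.

    Assumed definitions (not confirmed by the source):
      sensitivity             = TP / (TP + FN)
      false_positive_fraction = FP / genome_length
    """
    ann = to_positions(annotated)
    pred = to_positions(predicted)
    tp = len(ann & pred)   # annotated nucleotides that were predicted
    fn = len(ann - pred)   # annotated nucleotides that were missed
    fp = len(pred - ann)   # predicted nucleotides with no annotation
    sensitivity = tp / (tp + fn) if ann else 0.0
    false_positive_fraction = fp / genome_length
    return sensitivity, false_positive_fraction


def exact_match_fraction(annotated, predicted):
    """Fraction of annotated genes whose (start, end) is predicted exactly."""
    predicted_set = set(predicted)
    hits = sum(1 for gene in annotated if gene in predicted_set)
    return hits / len(annotated) if annotated else 0.0


# Toy data: a 1000-nt genome with two annotated genes and three predictions.
annotated_genes = [(1, 90), (201, 300)]
predicted_genes = [(1, 90), (211, 300), (401, 410)]

sens, fp_frac = nucleotide_level_metrics(annotated_genes, predicted_genes, 1000)
exact = exact_match_fraction(annotated_genes, predicted_genes)
print(f"sensitivity={sens:.3f} false_positive_fraction={fp_frac:.3f} exact={exact:.2f}")
```

In this toy case the second prediction overlaps its annotated gene but
starts 10 nt late, so it counts toward nucleotide-level sensitivity while
failing the exact-match criterion. This is one plausible reason an
exact-match score can be lower than the sensitivity for the same genome.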