The EcoGene database provides a set of gene and protein sequences derived from the genome sequence of Escherichia coli K-12. EcoGene is a source of re-annotated sequences for the SWISS-PROT and Colibri databases, and it is used for genetic and physical map compilations in collaboration with the Coli Genetic Stock Center. The EcoGene12 release includes 4293 genes. EcoGene12 differs from the GenBank annotation of the complete genome sequence in several ways, including (i) the revision of 706 predicted or confirmed gene start sites, (ii) the correction or hypothetical reconstruction of 61 frameshifts caused by either sequence error or mutation, (iii) the reconstruction of 14 protein sequences interrupted by the insertion of IS elements, and (iv) predictions that 92 genes are partially deleted gene fragments. A literature survey identified 717 proteins whose N-terminal amino acids have been verified by sequencing. The release provides 12 446 cross-references to 6835 literature citations and abstracts. EcoGene is accessible at a new website: http://bmb.med.miami.edu/EcoGene/EcoWeb. Users can search and retrieve individual EcoGene GenePages, or they can download large datasets for incorporation into database management systems, facilitating various genome-scale computational and functional analyses.
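
As an illustration of incorporating a downloaded dataset into a database management system, the following minimal sketch loads a tab-delimited gene table into an in-memory SQLite database and queries it. The file layout, the column names (`gene_id`, `gene_name`, `start_pos`, `end_pos`, `strand`), the helper functions, and the sample rows are all hypothetical placeholders chosen for the example; they do not describe the actual EcoGene download format or real E. coli genes.

```python
import csv
import io
import sqlite3

# Hypothetical tab-delimited export. Column names and rows are placeholders,
# NOT the real EcoGene download format and NOT real E. coli annotations.
SAMPLE_TSV = (
    "gene_id\tgene_name\tstart_pos\tend_pos\tstrand\n"
    "EX0001\texA\t100\t400\t+\n"
    "EX0002\texB\t450\t900\t-\n"
    "EX0003\texC\t1000\t1600\t+\n"
)


def load_genes(conn, tsv_text):
    """Create a `genes` table and bulk-insert rows parsed from TSV text.

    Returns the number of rows inserted.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS genes ("
        "gene_id TEXT PRIMARY KEY, gene_name TEXT, "
        "start_pos INTEGER, end_pos INTEGER, strand TEXT)"
    )
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    rows = [
        (
            r["gene_id"],
            r["gene_name"],
            int(r["start_pos"]),
            int(r["end_pos"]),
            r["strand"],
        )
        for r in reader
    ]
    conn.executemany("INSERT INTO genes VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def genes_on_strand(conn, strand):
    """Return gene names on the given strand, sorted alphabetically."""
    cur = conn.execute(
        "SELECT gene_name FROM genes WHERE strand = ? ORDER BY gene_name",
        (strand,),
    )
    return [name for (name,) in cur]


conn = sqlite3.connect(":memory:")
n_loaded = load_genes(conn, SAMPLE_TSV)
plus_genes = genes_on_strand(conn, "+")
```

For a real download, `SAMPLE_TSV` would be replaced by the contents of the retrieved file, and the table schema would need to be adjusted to whatever columns that file actually contains.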