Predicted highly expressed (PHX) genes are characterized for the completely
sequenced genomes of the four fast-growing bacteria Escherichia coli,
Haemophilus influenzae, Vibrio cholerae, and Bacillus subtilis. Our approach
to ascertaining gene expression levels relates to codon usage differences
among certain gene classes: the collection of all genes (average gene), the
ensemble of ribosomal protein genes, major translation/transcription
processing factors, and genes for polypeptides of chaperone/degradation
complexes.
A gene is predicted highly expressed (PHX) if its codon frequencies are
close to those of the ribosomal protein, major translation/transcription
processing factor, and chaperone/degradation standards but strongly deviant
from the average gene codon frequencies. PHX genes identified by their codon
usage frequencies among prokaryotic genomes commonly include those for
ribosomal proteins, major transcription/translation processing factors
(several occurring in multiple copies), and major chaperone/degradation
proteins. PHX genes also generally include those encoding enzymes of the
essential energy metabolism pathways of glycolysis, pyruvate oxidation, and
respiration (aerobic and anaerobic), genes of fatty acid biosynthesis, and
the principal genes of amino acid and nucleotide biosyntheses. Gene classes
generally not PHX include most repair protein genes, virtually all vitamin
biosynthesis genes, genes of two-component sensor systems, most regulatory
genes, and most genes expressed in stationary phase or during starvation.
The set of PHX aminoacyl-tRNA synthetase genes differs sharply between
genomes.
There are also subtle differences in the PHX energy metabolism genes between
E. coli and B. subtilis, particularly with respect to genes of the
tricarboxylic acid cycle. PHX genes of E. coli and B. subtilis are verified
to agree well with high protein abundances, as assessed by two-dimensional
gel determination. Relationships of PHX genes with stoichiometry,
multifunctionality, and operon structures are also examined. The spatial
distribution of PHX genes within each genome reveals clusters and
significantly long regions without PHX genes.
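
The PHX criterion described above (codon usage close to the ribosomal protein, processing factor, and chaperone standards, yet far from the average gene) can be illustrated with a small sketch. This is a simplified illustration under stated assumptions, not the exact procedure of the study: the function names `codon_frequencies`, `codon_bias`, and `expression_score`, the absolute-difference distance weighted by amino acid usage, and the equal weighting of the standard classes are all choices made here for illustration. Any cutoff applied to the resulting ratio would likewise be an assumption.

```python
from collections import Counter

# Standard genetic code, codons enumerated in TCAG order ("*" marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codon_frequencies(seqs):
    """Return (codon frequencies normalized within each amino acid,
    amino acid frequencies) for a list of in-frame coding sequences."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(0, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            # Skip stop codons and anything containing ambiguous bases.
            if CODON_TABLE.get(codon, "*") != "*":
                counts[codon] += 1
    aa_totals = Counter()
    for codon, n in counts.items():
        aa_totals[CODON_TABLE[codon]] += n
    total = sum(aa_totals.values())
    codon_freq = {c: n / aa_totals[CODON_TABLE[c]] for c, n in counts.items()}
    aa_freq = {a: n / total for a, n in aa_totals.items()}
    return codon_freq, aa_freq


def codon_bias(gene_set, reference_set):
    """Assumed distance: amino-acid-weighted sum of absolute differences
    between the codon frequencies of gene_set and reference_set."""
    f, p = codon_frequencies(gene_set)
    g, _ = codon_frequencies(reference_set)
    bias = 0.0
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue
        bias += p.get(aa, 0.0) * abs(f.get(codon, 0.0) - g.get(codon, 0.0))
    return bias


def expression_score(gene, all_genes, standards):
    """Ratio of the distance from the average gene to the mean distance
    from the standard classes (equal weights are an assumption).
    Values above 1 mean the gene resembles the standards more closely."""
    far = codon_bias([gene], all_genes)
    near = sum(codon_bias([gene], s) for s in standards) / len(standards)
    return far / near if near > 0 else float("inf")


# Toy sequences, invented purely for illustration.
toy_all_genes = ["GCCGCAGCGGCTAAGAAA"]
toy_standard = ["GCTGCTGCTAAAAAA"]
standard_like = expression_score("GCTGCTGCCAAA", toy_all_genes, [toy_standard])
average_like = expression_score("GCCGCAAAGAAA", toy_all_genes, [toy_standard])
```

In this toy setting the gene whose codon choices resemble the standard class receives a ratio above 1 and the gene resembling the average receives a ratio below 1. A real analysis would use the full gene complement of a genome and separate standard sets for each of the three classes.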