R.L. Tatusov et al., Metabolism and evolution of Haemophilus influenzae deduced from a whole-genome comparison with Escherichia coli, Current Biology, 6(3), 1996, pp. 279-291
Background: The 1.83 megabase (Mb) sequence of the Haemophilus
influenzae chromosome, the first completed genome sequence of a
cellular life form, has recently been reported. Approximately 75% of
the 4.7 Mb genome sequence of Escherichia coli is also available. The
lifestyles of the two bacteria are very different: H. influenzae is an
obligate parasite that lives in human upper respiratory mucosa and can
be cultivated only on rich media, whereas E. coli is a saprophyte that
can grow on minimal media. A detailed comparison of the protein
products encoded by these two genomes is expected to provide valuable
insights into bacterial cell physiology and genome evolution.

Results: We describe the results of computer analysis of the
amino-acid sequences of 1703 putative proteins encoded by the complete
genome of H. influenzae. We detected sequence similarity to proteins
in current databases for 92% of the H. influenzae protein sequences,
and at least a general functional prediction was possible for 83%. A
comparison of the H. influenzae protein sequences with those of 3010
proteins encoded by the sequenced 75% of the E. coli genome revealed
1128 pairs of apparent orthologs, with an average of 59% identity. In
contrast to the high similarity between orthologs, the genome
organization and the functional repertoire of genes in the two
bacteria were remarkably different. The smaller genome size of
H. influenzae is explained, to a large extent, by a reduction in the
number of paralogous genes. There was no long-range colinearity
between the E. coli and H. influenzae gene orders, but over 70% of the
orthologous genes were found in short conserved strings, only about
half of which were operons in E. coli. Superposition of the
H. influenzae enzyme repertoire upon the known E. coli metabolic
pathways allowed us to reconstruct similar and alternative pathways in
H. influenzae and provides an explanation for the known nutritional
requirements.

Conclusions: By comparing proteins encoded by the two bacterial
genomes, we have shown that extensive gene shuffling and variation in
the extent of gene paralogy are major trends in bacterial evolution;
this comparison has also allowed us to deduce crucial aspects of the
largely uncharacterized metabolism of H. influenzae.