Y.I. Wolf et al., Genome alignment, evolution of prokaryotic genome organization, and prediction of gene function using genomic context, GENOME RES, 11(3), 2001, pp. 356-372
Gene order in prokaryotes is conserved to a much lesser extent than protein
sequences. Only several operons, primarily those that code for physically
interacting proteins, are conserved in all or most of the bacterial and
archaeal genomes. Nevertheless, even the limited conservation of operon
organization that is observed can provide valuable evolutionary and
functional clues through multiple genome comparisons. A program for
constructing gapped
local alignments of conserved gene strings in two genomes was developed.
The statistical significance of the local alignments was assessed using
Monte Carlo simulations. Sets of local alignments were generated for all
pairs
of completely sequenced bacterial and archaeal genomes, and for each genome
a template-anchored multiple alignment was constructed. In most pairwise
genome comparisons, <10% of the genes in each genome belonged to conserved
gene strings. When closely related pairs of species (i.e., two mycoplasmas)
were excluded, the total coverage of genomes by conserved gene strings
ranged from <5% for the cyanobacterium Synechocystis sp. to 24% for the
minimal genome of Mycoplasma genitalium and 23% for Thermotoga maritima.
The coverage of the archaeal genomes was only slightly lower than that of
bacterial genomes. The majority of the conserved gene strings are known
operons, with the ribosomal superoperon being the top-scoring string in
most genome comparisons. However, in some of the bacterial-archaeal pairs,
the superoperon is
rearranged to the extent that other operons, primarily those subject to
horizontal transfer, such as the archaeal-type H+-ATPase operon or ABC-type
transport cassettes, show the greatest level of conservation. The level of
gene order conservation among prokaryotic genomes was compared to the
cooccurrence of genomes in clusters of orthologous genes (COGs) and to the
conservation of protein sequences themselves. Only limited correlation was
observed between these evolutionary variables. Gene order conservation
shows a much lower variance than the cooccurrence of genomes in COGs,
which indicates
that intragenome homogenization via recombination occurs in evolution much
faster than intergenome homogenization via horizontal gene transfer and
lineage-specific gene loss. The potential of using template-anchored
multiple-genome alignments for predicting functions of uncharacterized
genes was quantitatively assessed. Functions were predicted or
significantly clarified
for ~90 COGs (~4% of the total of 2414 analyzed COGs). The most
significant predictions were obtained for the poorly characterized
archaeal genomes; these include a previously uncharacterized
restriction-modification system, a nuclease-helicase combination
implicated in DNA repair, and the probable archaeal counterpart of the
eukaryotic exosome. Multiple
genome alignments are a resource for studies on operon rearrangement and
disruption, which is central to our understanding of the evolution of
prokaryotic genomes. Because of the rapid evolution of gene order, the
potential of genome alignment for prediction of gene functions is limited;
nevertheless, such predictions significantly complement the results
obtained through protein sequence and structure analysis.
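
The gapped local alignment of gene strings and the Monte Carlo significance
test mentioned in the abstract can be illustrated with a minimal sketch. This
is not the authors' program: it assumes each genome is given as an ordered
list of ortholog labels, uses a generic Smith-Waterman-style recurrence with
hypothetical scores (match=2, mismatch=-1, gap=-1), and takes random
shuffling of one gene order as the null model. All function names and the toy
gene labels are invented for illustration.

```python
import random


def local_align_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best gapped local alignment score between two ordered gene lists.

    Smith-Waterman-style dynamic programming; the scoring values are
    illustrative assumptions, not parameters taken from the paper.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            score = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            cur.append(score)
            best = max(best, score)
        prev = cur
    return best


def monte_carlo_p_value(a, b, trials=200, seed=0):
    """Estimate how often a shuffled gene order scores at least as high.

    Null model (an assumption): the gene order of b is randomly permuted.
    Returns (observed score, empirical p-value with add-one correction).
    """
    rng = random.Random(seed)
    observed = local_align_score(a, b)
    shuffled = list(b)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if local_align_score(a, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (trials + 1)


# Toy gene orders: r1..r5 stand for a conserved gene string, the x* and y*
# labels for unrelated genes; y3 interrupts the string in genome_b.
genome_a = ["x1", "r1", "r2", "r3", "r4", "r5", "x2", "x3"]
genome_b = ["y1", "y2", "r1", "r2", "y3", "r3", "r4", "r5", "y4"]

score, p_value = monte_carlo_p_value(genome_a, genome_b)
print(score, p_value)
```

In the toy input the five shared genes align with a single gap for the
inserted gene, so the observed score is 5 matches minus one gap penalty. The
empirical p-value depends on the number of trials and the seed, and a real
analysis would need a null model that reflects genome size and gene content.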