Genome alignment, evolution of prokaryotic genome organization, and prediction of gene function using genomic context

Citation
Y.I. Wolf et al., Genome alignment, evolution of prokaryotic genome organization, and prediction of gene function using genomic context, GENOME RES, 11(3), 2001, pp. 356-372
Citations number
44
Subject categories
Molecular Biology & Genetics
Journal title
GENOME RESEARCH
ISSN journal
1088-9051
Volume
11
Issue
3
Year of publication
2001
Pages
356 - 372
Database
ISI
SICI code
1088-9051(200103)11:3<356:GAEOPG>2.0.ZU;2-Z
Abstract
Gene order in prokaryotes is conserved to a much lesser extent than protein sequences. Only several operons, primarily those that code for physically interacting proteins, are conserved in all or most of the bacterial and archaeal genomes. Nevertheless, even the limited conservation of operon organization that is observed can provide valuable evolutionary and functional clues through multiple genome comparisons. A program for constructing gapped local alignments of conserved gene strings in two genomes was developed. The statistical significance of the local alignments was assessed using Monte Carlo simulations. Sets of local alignments were generated for all pairs of completely sequenced bacterial and archaeal genomes, and for each genome a template-anchored multiple alignment was constructed. In most pairwise genome comparisons, <10% of the genes in each genome belonged to conserved gene strings. When closely related pairs of species (i.e., two mycoplasmas) are excluded, the total coverage of genomes by conserved gene strings ranged from <5% for the cyanobacterium Synechocystis sp. to 24% for the minimal genome of Mycoplasma genitalium and 23% for Thermotoga maritima. The coverage of the archaeal genomes was only slightly lower than that of bacterial genomes. The majority of the conserved gene strings are known operons, with the ribosomal superoperon being the top-scoring string in most genome comparisons. However, in some of the bacterial-archaeal pairs, the superoperon is rearranged to the extent that other operons, primarily those subject to horizontal transfer, show the greatest level of conservation, such as the archaeal-type H+-ATPase operon or ABC-type transport cassettes. The level of gene order conservation among prokaryotic genomes was compared to the co-occurrence of genomes in clusters of orthologous genes (COGs) and to the conservation of protein sequences themselves.
Only limited correlation was observed between these evolutionary variables. Gene order conservation shows a much lower variance than the co-occurrence of genomes in COGs, which indicates that intragenome homogenization via recombination occurs in evolution much faster than intergenome homogenization via horizontal gene transfer and lineage-specific gene loss. The potential of using template-anchored multiple-genome alignments for predicting functions of uncharacterized genes was quantitatively assessed. Functions were predicted or significantly clarified for ~90 COGs (~4% of the total of 2414 analyzed COGs). The most significant predictions were obtained for the poorly characterized archaeal genomes; these include a previously uncharacterized restriction-modification system, a nuclease-helicase combination implicated in DNA repair, and the probable archaeal counterpart of the eukaryotic exosome. Multiple genome alignments are a resource for studies on operon rearrangement and disruption, which is central to our understanding of the evolution of prokaryotic genomes. Because of the rapid evolution of the gene order, the potential of genome alignment for prediction of gene functions is limited, but nevertheless, such predictions significantly complement the results obtained through protein sequence and structure analysis.
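
Illustrative sketch (not part of the original record, and not the authors' program): the abstract mentions gapped local alignment of conserved gene strings with significance assessed by Monte Carlo simulation, but gives no algorithmic detail. The Python below shows one generic way such an analysis could look, assuming each genome is reduced to an ordered list of ortholog labels (for example COG identifiers), a Smith-Waterman-style scoring scheme, and a null model built by shuffling gene order. The function names, the scoring parameters (match, mismatch, gap), and the shuffling null model are all assumptions made for illustration. Gene orientation is ignored.

```python
import random


def local_align_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best gapped local alignment score between two gene strings.

    a, b: sequences of ortholog labels (hypothetical input format).
    Scoring values are illustrative assumptions, not the paper's.
    """
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming table
    best = 0
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            # Local alignment: scores are floored at zero.
            score = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            cur.append(score)
            best = max(best, score)
        prev = cur
    return best


def monte_carlo_p_value(a, b, trials=200, seed=0):
    """Empirical p-value of the observed score under random gene order.

    Null model (an assumption): the gene order of b is shuffled uniformly.
    Returns (observed_score, p_value), using the add-one estimator.
    """
    rng = random.Random(seed)
    observed = local_align_score(a, b)
    shuffled = list(b)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if local_align_score(a, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (trials + 1)


if __name__ == "__main__":
    # Toy gene strings with made-up labels; one unrelated gene is inserted in g2.
    g1 = ["COG0051", "COG0090", "COG0185", "COG0091"]
    g2 = ["COG9999", "COG0051", "COG0090", "COG8888", "COG0185", "COG0091"]
    print(monte_carlo_p_value(g1, g2))
```

In this toy case the four shared genes align with a single gap spanning the inserted gene. A real analysis would also need to handle paralogs, strand, and circular chromosomes, none of which this sketch covers.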