COMPARISON OF ARCHAEAL AND BACTERIAL GENOMES - COMPUTER-ANALYSIS OF PROTEIN SEQUENCES PREDICTS NOVEL FUNCTIONS AND SUGGESTS A CHIMERIC ORIGIN FOR THE ARCHAEA

Citation
E.V. Koonin et al., COMPARISON OF ARCHAEAL AND BACTERIAL GENOMES - COMPUTER-ANALYSIS OF PROTEIN SEQUENCES PREDICTS NOVEL FUNCTIONS AND SUGGESTS A CHIMERIC ORIGIN FOR THE ARCHAEA, Molecular microbiology, 25(4), 1997, pp. 619-637
Number of citations
80
Subject Categories
Biology,Microbiology
Journal title
Molecular microbiology
ISSN journal
0950382X
Volume
25
Issue
4
Year of publication
1997
Pages
619 - 637
Database
ISI
SICI code
0950-382X(1997)25:4<619:COAABG>2.0.ZU;2-#
Abstract
Protein sequences encoded in three complete bacterial genomes, those of Haemophilus influenzae, Mycoplasma genitalium and Synechocystis sp., and the first available archaeal genome sequence, that of Methanococcus jannaschii, were analysed using the BLAST2 algorithm and methods for amino acid motif detection. Between 75% and 90% of the predicted proteins encoded in each of the bacterial genomes and 73% of the M. jannaschii proteins showed significant sequence similarity to proteins from other species. The fraction of bacterial and archaeal proteins containing regions conserved over long phylogenetic distances is nearly the same and close to 70%. Functions of 70-85% of the bacterial proteins and about 70% of the archaeal proteins were predicted with varying precision. This contrasts with the previous report that more than half of the archaeal proteins have no homologues and shows that, with more sensitive methods and detailed analysis of conserved motifs, archaeal genomes become as amenable to meaningful interpretation by computer as bacterial genomes. The analysis of conserved motifs resulted in the prediction of a number of previously undetected functions of bacterial and archaeal proteins and in the identification of novel protein families. In spite of the generally high conservation of protein sequences, orthologues of 25% or less of the M. jannaschii genes were detected in each individual completely sequenced genome, supporting the uniqueness of archaea as a distinct domain of life. About 53% of the M. jannaschii proteins belong to families of paralogues, a fraction similar to that in bacteria with larger genomes, such as Synechocystis sp. and Escherichia coli, but higher than that in H. influenzae, which has approximately the same number of genes as M. jannaschii. Certain groups of proteins, e.g. molecular chaperones and DNA repair enzymes, thought to be ubiquitous and represented in the minimal gene set derived by bacterial genome comparison, are missing in M. jannaschii, indicating massive non-orthologous displacement of genes responsible for essential functions. An unexpectedly large fraction of the M. jannaschii gene products, 44%, shows significantly higher similarity to bacterial than to eukaryotic proteins, compared with 13% that have eukaryotic proteins as their closest homologues (the rest of the proteins show approximately the same level of similarity to bacterial and eukaryotic homologues or have no homologues). Proteins involved in translation, transcription, replication and protein secretion are most closely related to eukaryotic proteins, whereas metabolic enzymes, metabolite uptake systems, enzymes for cell wall biosynthesis and many uncharacterized proteins appear to be 'bacterial'. A similar prevalence of proteins of apparent bacterial origin was observed among the currently available sequences from the distantly related archaeal genus, Sulfolobus. It is likely that the evolution of archaea included at least one major merger between ancestral cells from the bacterial lineage and the lineage leading to the eukaryotic nucleocytoplasm.