COMPARISON OF ARCHAEAL AND BACTERIAL GENOMES - COMPUTER-ANALYSIS OF PROTEIN SEQUENCES PREDICTS NOVEL FUNCTIONS AND SUGGESTS A CHIMERIC ORIGIN FOR THE ARCHAEA
E.V. Koonin et al., COMPARISON OF ARCHAEAL AND BACTERIAL GENOMES - COMPUTER-ANALYSIS OF PROTEIN SEQUENCES PREDICTS NOVEL FUNCTIONS AND SUGGESTS A CHIMERIC ORIGIN FOR THE ARCHAEA, Molecular Microbiology, 25(4), 1997, pp. 619-637
Protein sequences encoded in three complete bacterial genomes, those of
Haemophilus influenzae, Mycoplasma genitalium and Synechocystis sp., and
the first available archaeal genome sequence, that of Methanococcus
jannaschii, were analysed using the BLAST2 algorithm and methods for
amino acid motif detection. Between 75% and 90% of the predicted proteins
encoded in each of the bacterial genomes and 73% of the M. jannaschii
proteins showed significant sequence similarity to proteins from other
species. The fraction of bacterial and archaeal proteins containing
regions conserved over long phylogenetic distances is nearly the same and
close to 70%. Functions of 70-85% of the bacterial proteins and about 70%
of the archaeal proteins were predicted with varying precision. This
contrasts with the previous report that more than half of the archaeal
proteins have no homologues and shows that, with more sensitive methods
and detailed analysis of conserved motifs, archaeal genomes become as
amenable to meaningful interpretation by computer as bacterial genomes.
The analysis of conserved motifs resulted in the prediction of a number
of previously undetected functions of bacterial and archaeal proteins and
in the identification of novel protein families. In spite of the
generally high conservation of protein sequences, orthologues of 25% or
less of the M. jannaschii genes were detected in each individual
completely sequenced genome, supporting the uniqueness of archaea as a
distinct domain of life. About 53% of the M. jannaschii proteins belong
to families of paralogues, a fraction similar to that in bacteria with
larger genomes, such as Synechocystis sp. and Escherichia coli, but
higher than that in H. influenzae, which has approximately the same
number of genes as M. jannaschii. Certain groups of proteins, e.g.
molecular chaperones and DNA repair enzymes, thought to be ubiquitous and
represented in the minimal gene set derived by bacterial genome
comparison, are missing in M. jannaschii, indicating massive
non-orthologous displacement of genes responsible for essential
functions. An unexpectedly large fraction of the M. jannaschii gene
products, 44%, shows significantly higher similarity to bacterial than to
eukaryotic proteins, compared with 13% that have eukaryotic proteins as
their closest homologues (the rest of the proteins show approximately the
same level of similarity to bacterial and eukaryotic homologues or have
no homologues). Proteins involved in translation, transcription,
replication and protein secretion are most closely related to eukaryotic
proteins, whereas metabolic enzymes, metabolite uptake systems, enzymes
for cell wall biosynthesis and many uncharacterized proteins appear to be
'bacterial'. A similar prevalence of proteins of apparent bacterial
origin was observed among the currently available sequences from the
distantly related archaeal genus, Sulfolobus. It is likely that the
evolution of archaea included at least one major merger between ancestral
cells from the bacterial lineage and the lineage leading to the
eukaryotic nucleocytoplasm.