A. R. Mushegian and E. V. Koonin, A minimal gene set for cellular life derived by comparison of complete bacterial genomes, Proceedings of the National Academy of Sciences of the United States of America, 93(19), 1996, pp. 10268-10273
The recently sequenced genome of the parasitic bacterium Mycoplasma genitalium contains only 468 identified protein-coding genes, which have been dubbed a minimal gene complement [Fraser, C. M., Gocayne, J. D., White, O., Adams, M. D., Clayton, R. A., et al. (1995) Science 270, 397-403]. Although the M. genitalium gene complement is indeed the smallest among known cellular life forms, there is no evidence that it is the minimal self-sufficient gene set. To derive such a set, we compared the 468 predicted M. genitalium protein sequences with the 1703 protein sequences encoded by the other completely sequenced small bacterial genome, that of Haemophilus influenzae. M. genitalium and H. influenzae belong to two ancient bacterial lineages, Gram-positive and Gram-negative bacteria, respectively. Therefore, the genes that are conserved in these two bacteria are almost certainly essential for cellular function, and this category of genes is the one most likely to approximate the minimal gene set. We found that 240 M. genitalium genes have orthologs among the genes of H. influenzae. This collection of genes falls short of the minimal set because some enzymes responsible for intermediate steps in essential pathways are missing. The apparent reason is a phenomenon that we call nonorthologous gene displacement, in which the same function is fulfilled by nonorthologous proteins in two organisms. We identified 22 nonorthologous displacements and supplemented the set of orthologs with the respective M. genitalium genes. After examining the resulting list of 262 genes for possible functional redundancy and for the presence of apparently parasite-specific genes, we removed 6 genes. We suggest that the remaining 256 genes are close to the minimal gene set that is necessary and sufficient to sustain the existence of a modern-type cell. Most of the proteins encoded by the genes of the minimal set have eukaryotic or archaeal homologs, but seven key proteins of DNA replication do not. We speculate that the last common ancestor of the three primary kingdoms had an RNA genome. We also explore possibilities for further reducing the minimal set to model a primitive cell that might have existed at a very early stage of the evolution of life.
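The gene counts in the derivation (240 orthologs, plus 22 nonorthologous displacements, minus 6 removed genes) amount to simple set bookkeeping. The sketch below illustrates only that arithmetic. It assumes the ortholog and displacement sets are disjoint, and it uses hypothetical placeholder gene identifiers. Which 6 genes were removed is not stated here, so the removed set is an arbitrary illustrative choice. This is not the authors' comparison procedure.

```python
"""Hypothetical sketch of the gene-set bookkeeping described in the abstract.

Gene identifiers are placeholders, not real M. genitalium locus tags; only the
counts (240, 22, 6) come from the text.
"""


def derive_minimal_set(orthologs: set[str], displacements: set[str],
                       removed: set[str]) -> tuple[set[str], set[str]]:
    """Return (candidate_set, minimal_set).

    candidate_set: genes with orthologs in the other genome, plus the genes
    added to cover nonorthologous gene displacements.
    minimal_set: candidate_set minus genes judged redundant or parasite-specific.
    """
    candidate = orthologs | displacements
    minimal = candidate - removed
    return candidate, minimal


# Hypothetical placeholder IDs; the two sets are assumed to be disjoint.
orthologs = {f"ortholog_{i}" for i in range(240)}
displacements = {f"displacement_{i}" for i in range(22)}
# Arbitrary choice of 6 genes to remove (the actual genes are not listed here).
removed = {f"ortholog_{i}" for i in range(6)}

candidate, minimal = derive_minimal_set(orthologs, displacements, removed)
print(len(candidate), len(minimal))  # 262 256
```

With real data, the sets would hold gene identifiers produced by the sequence comparison. The union and difference steps would stay the same.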