We introduce a computational method for identifying subcellular locations
of proteins from the phylogenetic distribution of the homologs of
organellar proteins. This method is based on the observation that proteins
localized to a given organelle by experiments tend to share a
characteristic phylogenetic distribution of their homologs (a phylogenetic
profile). Therefore, any other protein can be localized by its phylogenetic
profile. Application of this method to mitochondrial proteins reveals that
nucleus-encoded proteins previously known to be destined for mitochondria
fall into three groups: prokaryote-derived, eukaryote-derived, and
organism-specific (i.e., found only in the organism under study).
Prokaryote-derived mitochondrial proteins can be identified effectively by
their phylogenetic profiles. In the yeast Saccharomyces cerevisiae, 361
nucleus-encoded mitochondrial proteins can be identified at 50% accuracy
with 58% coverage. From these values and the proportion of conserved
mitochondrial genes, it can be inferred that approximately 630 genes, or
10% of the nuclear genome, are devoted to mitochondrial function. In the
worm Caenorhabditis elegans, we estimate that there are approximately 660
nucleus-encoded mitochondrial genes, or 4% of its genome, with
approximately 400 of these genes contributed by the prokaryotic
mitochondrial ancestor. The large fraction of organism-specific and
eukaryote-derived genes suggests that mitochondria perform specialized
roles absent from prokaryotic mitochondrial ancestors. We observe
measurably distinct phylogenetic profiles among proteins from different
subcellular compartments, allowing the general use of prokaryotic genomes
in learning features of eukaryotic proteins.
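
As an illustration of the idea of localizing a protein by its phylogenetic profile, the following is a minimal sketch, not the classifier used in this work. It assumes each profile is a binary vector (1 if a homolog is detected in a given reference genome, 0 otherwise) and assigns a query protein the majority label among its nearest training profiles by Hamming distance. The protein names, profiles, labels, and the functions `hamming` and `predict_location` are all hypothetical.

```python
from collections import Counter

# Hypothetical toy training set: name -> (profile, experimentally known location).
# Each profile position corresponds to one reference genome
# (1 = homolog detected in that genome, 0 = no homolog detected).
TRAINING = {
    "mito_1": ((1, 1, 1, 0, 0, 1), "mitochondrion"),
    "mito_2": ((1, 1, 0, 0, 0, 1), "mitochondrion"),
    "mito_3": ((1, 1, 1, 0, 1, 1), "mitochondrion"),
    "cyto_1": ((0, 0, 0, 1, 1, 0), "cytoplasm"),
    "cyto_2": ((0, 0, 1, 1, 1, 0), "cytoplasm"),
    "cyto_3": ((0, 0, 0, 1, 0, 0), "cytoplasm"),
}


def hamming(p, q):
    """Number of genomes in which two profiles disagree."""
    return sum(a != b for a, b in zip(p, q))


def predict_location(profile, training, k=3):
    """Majority location among the k training profiles closest to `profile`.

    Assumption: a simple k-nearest-neighbor vote stands in for whatever
    discriminating function is actually applied to the profiles.
    """
    ranked = sorted(training.values(), key=lambda item: hamming(profile, item[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    query = (1, 1, 1, 0, 0, 0)  # hypothetical uncharacterized protein
    print(predict_location(query, TRAINING))
```

In this toy setting, a query whose homologs occur mostly in the same genomes as the known mitochondrial proteins is assigned to the mitochondrion. In practice, the choice of reference genomes, homology threshold, and distance measure would all affect accuracy and coverage.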