J. Lin and M. Gerstein, Whole-genome trees based on the occurrence of folds and orthologs: Implications for comparing genomes on different levels, GENOME RES, 10(6), 2000, pp. 808-818
We built whole-genome trees based on the presence or absence of particular
molecular features, either orthologs or folds, in the genomes of a number
of recently sequenced microorganisms. To put these genomic trees into
perspective, we compared them to the traditional ribosomal phylogeny and also to
trees based on the sequence similarity of individual orthologous proteins.
We found that our genomic trees based on the overall occurrence of
orthologs did not agree well with the traditional tree. This discrepancy,
however, vanished when one restricted the tree to proteins involved in
transcription and translation, not including problematic proteins involved
in metabolism. Protein folds unite superficially unrelated sequence families
and represent a most fundamental molecular unit described by genomes. We
found that our genomic occurrence tree based on folds agreed fairly well
with the traditional ribosomal phylogeny. Surprisingly, despite this overall
agreement, certain classes of folds, particularly all-beta ones, had a
somewhat different phylogenetic distribution. We also compared our
occurrence trees to whole-genome clusters based on the composition of amino
acids and dinucleotides. Finally, we analyzed some technical aspects of
genomic trees, e.g., comparing parsimony versus distance-based approaches
and examining the effects of increasing numbers of organisms. Additional
information (e.g. clickable trees) is available from
http://bioinfo.mbb.yale.edu/genome/trees.
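
The distance-based approach mentioned above can be illustrated with a minimal sketch. The abstract does not state which distance measure the authors used, so the Jaccard distance here is an assumption, and the genome and fold names are hypothetical toy data rather than results from the paper. The sketch only shows how presence/absence profiles might be turned into pairwise distances that a tree-building method could then consume.

```python
from itertools import combinations

# Hypothetical toy data: which folds occur in which genome (not from the paper).
occurrence = {
    "genome_A": {"fold1", "fold2", "fold3"},
    "genome_B": {"fold1", "fold2", "fold4"},
    "genome_C": {"fold5", "fold6"},
}


def jaccard_distance(a, b):
    """Return 1 - |A & B| / |A | B|, or 0.0 when both sets are empty.

    Assumption: Jaccard is used purely for illustration; the source
    does not specify its distance measure.
    """
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def distance_matrix(profiles):
    """Build a symmetric pairwise distance table keyed by (name, name)."""
    names = sorted(profiles)
    dist = {(n, n): 0.0 for n in names}
    for x, y in combinations(names, 2):
        d = jaccard_distance(profiles[x], profiles[y])
        dist[(x, y)] = dist[(y, x)] = d
    return dist


distances = distance_matrix(occurrence)
```

Genomes sharing more features end up closer together; a clustering or neighbor-joining step, not shown here, would turn such a table into a tree.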