In order to extract the maximum amount of information from the rapidly
accumulating genome sequences, all conserved genes need to be classified
according to their homologous relationships. Comparison of proteins encoded
in seven complete genomes from five major phylogenetic lineages and
elucidation of consistent patterns of sequence similarities allowed the
delineation of 720 clusters of orthologous groups (COGs). Each COG consists
of individual orthologous proteins or orthologous sets
of paralogs from at least three lineages. Orthologs typically have the
same function, allowing transfer of functional information from one
member to an entire COG. This relation automatically yields a number of
functional predictions for poorly characterized genomes. The COGs comprise
a framework for functional and evolutionary genome analysis.
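
As an illustration only, the two ideas stated above (a COG must span at least three lineages, and functional information can be transferred from one member to the whole COG) can be sketched in Python. This is a minimal sketch, not the procedure used to build the COGs: the function names, the protein identifiers, the lineage labels, and the rule of skipping transfer when known annotations conflict are all assumptions introduced for the example.

```python
def spans_enough_lineages(members, min_lineages=3):
    """Check the lineage criterion for a candidate COG.

    `members` is a list of (protein_id, lineage) pairs. The identifiers and
    lineage labels used below are hypothetical placeholders.
    """
    lineages = {lineage for _, lineage in members}
    return len(lineages) >= min_lineages


def transfer_annotation(members, annotations):
    """Propose a function for every unannotated member of a COG.

    `annotations` maps protein_id -> known function. As a conservative
    assumption of this sketch, nothing is transferred when the COG has no
    annotated member or when annotated members disagree.
    """
    known = {annotations[pid] for pid, _ in members if pid in annotations}
    if len(known) != 1:
        return {}
    function = next(iter(known))
    return {pid: function for pid, _ in members if pid not in annotations}


# Hypothetical candidate COG: four proteins from three lineages, where
# "p2" and "p3" stand for paralogs from the same lineage.
candidate = [
    ("p1", "lineage_A"),
    ("p2", "lineage_B"),
    ("p3", "lineage_B"),
    ("p4", "lineage_C"),
]
known_functions = {"p1": "example function"}

if spans_enough_lineages(candidate):
    predictions = transfer_annotation(candidate, known_functions)
    for protein_id, function in sorted(predictions.items()):
        print(protein_id, "->", function)
```

Counting distinct lineages rather than distinct proteins reflects the statement that paralogs from a single lineage form one orthologous set and do not, on their own, satisfy the three-lineage requirement.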