R.L. Tatusov et al., The COG database: new developments in phylogenetic classification of proteins from complete genomes, Nucleic Acids Res., 29(1), 2001, pp. 22-28
The database of Clusters of Orthologous Groups of proteins (COGs), which
represents an attempt at a phylogenetic classification of the proteins
encoded in complete genomes, currently consists of 2791 COGs including
45 350 proteins from 30 genomes of bacteria, archaea and the yeast
Saccharomyces cerevisiae (http://www.ncbi.nlm.nih.gov/COG). In addition, a
supplement to the COGs is available, which includes proteins that are
encoded in the genomes of two multicellular eukaryotes, the nematode
Caenorhabditis elegans and the fruit fly Drosophila melanogaster, and are
shared with bacteria and/or archaea. The new features added to the COG
database include information pages with structural and functional details
on each COG and literature references, improvements to the COGNITOR
program, which is used to fit new proteins into the COGs, and
classifications of genomes and COGs constructed using principal component
analysis.
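
The abstract does not say what data the principal component analysis is
run on. As a minimal sketch only, assume the input is a binary
presence/absence table (genomes in rows, COGs in columns). The genome
labels, the toy table and the function name first_principal_component
below are all hypothetical and are not taken from the COG database. The
first principal component is found by power iteration on the covariance
matrix, using only the Python standard library.

```python
def first_principal_component(rows, iterations=200):
    """Return (axis, scores) for the first principal component of rows."""
    n, m = len(rows), len(rows[0])
    # Center each column (one hypothetical COG) on its mean across genomes.
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - means[j] for j in range(m)] for r in rows]
    # Sample covariance matrix of the columns (m x m).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(m)]
           for i in range(m)]
    # Power iteration: repeated multiplication by the covariance matrix
    # converges to its dominant eigenvector. The start vector is arbitrary
    # and fixes the (otherwise arbitrary) sign of the resulting axis.
    axis = [1.0 + 0.1 * k for k in range(m)]
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * axis[j] for j in range(m)) for i in range(m)]
        norm = sum(x * x for x in nxt) ** 0.5
        axis = [x / norm for x in nxt]
    # Project each centered row (genome) onto the axis.
    scores = [sum(c[j] * axis[j] for j in range(m)) for c in centered]
    return axis, scores


# Hypothetical toy data: 1 = the genome has a member in that COG, 0 = it does not.
genomes = ["bacterium_A", "bacterium_B", "archaeon_A", "archaeon_B"]
patterns = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
]

axis, scores = first_principal_component(patterns)
for name, score in zip(genomes, scores):
    print(f"{name}: {score:+.3f}")
```

In this toy table the two hypothetical bacterial rows receive scores of one
sign and the two archaeal rows scores of the opposite sign, which is the
kind of grouping a PCA-based classification of genomes would rely on.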