SEQUENCE SIMILARITY ANALYSIS OF ESCHERICHIA-COLI PROTEINS - FUNCTIONAL AND EVOLUTIONARY IMPLICATIONS

Authors

KOONIN EV TATUSOV RL RUDD KE

Citation

Ev. Koonin et al., SEQUENCE SIMILARITY ANALYSIS OF ESCHERICHIA-COLI PROTEINS - FUNCTIONAL AND EVOLUTIONARY IMPLICATIONS, Proceedings of the National Academy of Sciences of the United States of America, 92(25), 1995, pp. 11921-11925

Subject Categories

Multidisciplinary Sciences

Journal title

Proceedings of the National Academy of Sciences of the United States of America

ISSN journal

0027-8424

Volume

92

Issue

25

Year of publication

1995

Pages

11921 - 11925

Database

ISI

SICI code

0027-8424(1995)92:25<11921:SSAOEP>2.0.ZU;2-F

Abstract

A computer analysis of 2328 protein sequences comprising about 60% of the Escherichia coli gene products was performed using methods for database screening with individual sequences and alignment blocks. A high fraction of E. coli proteins (86%) shows significant sequence similarity to other proteins in current databases; about 70% show conservation at least at the level of distantly related bacteria, and about 40% contain ancient conserved regions (ACRs) shared with eukaryotic or Archaeal proteins. For >90% of the E. coli proteins, either functional information or sequence similarity, or both, are available. Forty-six percent of the E. coli proteins belong to 299 clusters of paralogs (intraspecies homologs) defined on the basis of pairwise similarity. Another 10% could be included in 70 superclusters using motif detection methods. The majority of the clusters contain only two to four members. In contrast, nearly 25% of all E. coli proteins belong to the four largest superclusters, namely permeases, ATPases and GTPases with the conserved "Walker-type" motif, helix-turn-helix regulatory proteins, and NAD(FAD)-binding proteins. We conclude that bacterial protein sequences generally are highly conserved in evolution, with about 50% of all ACR-containing protein families represented among the E. coli gene products. With the current sequence databases and methods of their screening, computer analysis yields useful information on the functions and evolutionary relationships of the vast majority of genes in a bacterial genome. Sequence similarity with E. coli proteins allows the prediction of functions for a number of important eukaryotic genes, including several whose products are implicated in human diseases.