Distribution of protein folds in the three superkingdoms of life

Citation
Y.I. Wolf et al., Distribution of protein folds in the three superkingdoms of life, GENOME RES, 9(1), 1999, pp. 17-26
Number of citations
49
Subject categories
Molecular Biology & Genetics
Journal title
GENOME RESEARCH
ISSN journal
1088-9051
Volume
9
Issue
1
Year of publication
1999
Pages
17 - 26
Database
ISI
SICI code
1054-9803(199901)9:1<17:DOPFIT>2.0.ZU;2-8
Abstract
A sensitive protein-fold recognition procedure was developed on the basis of iterative database search using the PSI-BLAST program. A collection of 1193 position-dependent weight matrices that can be used as fold identifiers was produced. In the completely sequenced genomes, folds could be automatically identified for 20%-30% of the proteins, with 3%-6% more detectable by additional analysis of conserved motifs. The distribution of the most common folds is very similar in bacteria and archaea but distinct in eukaryotes. Within the bacteria, this distribution differs between parasitic and free-living species. In all analyzed genomes, the P-loop NTPases are the most abundant fold. In bacteria and archaea, the next most common folds are ferredoxin-like domains, TIM-barrels, and methyltransferases, whereas in eukaryotes, the second to fourth places belong to protein kinases, beta-propellers and TIM-barrels. The observed diversity of protein folds in different proteomes is approximately twice as high as it would be expected from a simple stochastic model describing a proteome as a finite sample from an infinite pool of proteins with an exponential distribution of the fold fractions. Distribution of the number of domains with different folds in one protein fits the geometric model, which is compatible with the evolution of multidomain proteins by random combination of domains.