A STRUCTURAL CENSUS OF THE CURRENT POPULATION OF PROTEIN SEQUENCES

Citation
M. Gerstein and M. Levitt, A STRUCTURAL CENSUS OF THE CURRENT POPULATION OF PROTEIN SEQUENCES, Proceedings of the National Academy of Sciences of the United States of America, 94(22), 1997, pp. 11911-11916
Number of citations
53
Subject Categories
Multidisciplinary Sciences
Journal ISSN
00278424
Volume
94
Issue
22
Year of publication
1997
Pages
11911 - 11916
Database
ISI
SICI code
0027-8424(1997)94:22<11911:ASCOTC>2.0.ZU;2-G
Abstract
We examine the occurrence of the approximately 300 known protein folds in different groups of organisms. To do this, we characterize a large fraction of the currently known protein sequences (approximately 140,000) in structural terms, by matching them to known structures via sequence comparison (or by secondary-structure class prediction for those without structural homologues). Overall, we find that an appreciable fraction of the known folds are present in each of the major groups of organisms (e.g., bacteria and eukaryotes share 156 of 275 folds), and most of the common folds are associated with many families of nonhomologous sequences (i.e., >10 sequence families for each common fold). However, different groups of organisms have characteristically distinct distributions of folds. So, for instance, some of the most common folds in vertebrates, such as globins or zinc fingers, are rare or absent in bacteria. Many of these differences in fold usage are biologically reasonable, such as the folds of metabolic enzymes being common in bacteria and those associated with extracellular transport and communication being common in animals. They also have important implications for database-based methods for fold recognition, suggesting that an unknown sequence from a plant is more likely to have a certain fold (e.g., a TIM barrel) than an unknown sequence from an animal.