DICTIONARY OF RECURRENT DOMAINS IN PROTEIN STRUCTURES

Authors
L. Holm, C. Sander
Citation
L. Holm and C. Sander, DICTIONARY OF RECURRENT DOMAINS IN PROTEIN STRUCTURES, Proteins, 33(1), 1998, pp. 88-96
Number of citations
27
Subject categories
Biology, Genetics & Heredity
Journal title
Proteins
Journal ISSN
0887-3585
Volume
33
Issue
1
Year of publication
1998
Pages
88 - 96
Database
ISI
SICI code
0887-3585(1998)33:1<88:DORDIP>2.0.ZU;2-F
Abstract
The rapid growth in the number of experimentally determined three-dimensional protein structures has sharpened the need for comprehensive and up-to-date surveys of known structures. Classic work on protein structure classification has made it clear that a structural survey is best carried out at the level of domains, i.e., substructures that recur in evolution as functional units in different protein contexts. We present a method for automated domain identification from the atomic coordinates of protein structures, based on quantitative measures of compactness and, as the new element, recurrence. Compactness criteria are used to recursively divide a protein into a series of successively smaller substructures. Recurrence criteria are used to select an optimal size level of these substructures, so that many of the chosen substructures are common to different proteins at a high level of statistical significance. The joint application of these criteria automatically yields consistent domain definitions between remote homologs, a result difficult to achieve using compactness criteria alone. The method is applied to a representative set of 1,137 sequence-unique protein families covering 6,500 known structures. Clustering of the resulting set of domains (substructures) yields 594 distinct fold classes (types of substructures). The Dali Domain Dictionary (http://www.embl-ebi.ac.uk/dali/) provides not only a global structural classification but also a comprehensive description of families of protein sequences grouped around representative proteins of known structure. The classification will be continuously updated and can serve as a basis for improving our understanding of protein evolution and function and for evolving optimal strategies to complete the map of all natural protein structures. (C) 1998 Wiley-Liss, Inc.
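The two-stage idea in the abstract (compactness builds a hierarchy of ever smaller substructures, recurrence picks the level at which to cut it) can be illustrated with a minimal Python sketch. The simplifications here are assumptions of this sketch, not the paper's: a protein is reduced to a set of residue contact pairs, substructures are contiguous segments only, compactness is scored as the number of contacts crossing a cut per residue of the smaller half, and recurrence is a caller-supplied predicate (the hypothetical `is_recurrent`) standing in for a statistically significant structural match to another protein. All names are illustrative; this is not the Dali implementation.

```python
from typing import Callable, List, Set, Tuple

Contact = Tuple[int, int]  # pair of residue indices in contact (assumed input format)


class Node:
    """A contiguous substructure [start, end) and its two sub-splits, if any."""

    def __init__(self, start: int, end: int) -> None:
        self.start = start
        self.end = end
        self.children: List["Node"] = []


def crossing_density(contacts: Set[Contact], start: int, split: int, end: int) -> float:
    """Contacts between [start, split) and [split, end), per residue of the smaller half.

    Assumed compactness score: lower means the two halves are more self-contained.
    """
    crossing = sum(
        1
        for i, j in contacts
        if start <= i < split <= j < end or start <= j < split <= i < end
    )
    return crossing / min(split - start, end - split)


def split_recursively(contacts: Set[Contact], start: int, end: int, min_size: int) -> Node:
    """Bisect [start, end) at its least-connected point until pieces get too small."""
    node = Node(start, end)
    if end - start < 2 * min_size:
        return node  # too short to yield two pieces of at least min_size residues
    best = min(
        range(start + min_size, end - min_size + 1),
        key=lambda k: crossing_density(contacts, start, k, end),
    )
    node.children = [
        split_recursively(contacts, start, best, min_size),
        split_recursively(contacts, best, end, min_size),
    ]
    return node


def cut_at_depth(node: Node, depth: int) -> List[Node]:
    """Substructures obtained by cutting the hierarchy at a given depth."""
    if depth == 0 or not node.children:
        return [node]
    return [n for child in node.children for n in cut_at_depth(child, depth - 1)]


def choose_level(
    root: Node, is_recurrent: Callable[[Node], bool], max_depth: int
) -> Tuple[int, float]:
    """Pick the depth whose cut has the largest fraction of recurrent pieces.

    Ties go to the coarser (shallower) level.
    """
    scores = []
    for d in range(max_depth + 1):
        pieces = cut_at_depth(root, d)
        scores.append((sum(map(is_recurrent, pieces)) / len(pieces), -d))
    frac, neg_d = max(scores)
    return -neg_d, frac


# Toy protein: residues 0-9 and 10-19 form two dense blocks joined by one contact.
contacts = {
    (i, j) for lo in (0, 10) for i in range(lo, lo + 10) for j in range(i + 2, lo + 10)
}
contacts.add((5, 15))
root = split_recursively(contacts, 0, 20, min_size=5)
# Hypothetical stand-in for "significantly matches a substructure of another protein".
recurrent_spans = {(0, 10), (10, 20)}
depth, frac = choose_level(root, lambda n: (n.start, n.end) in recurrent_spans, max_depth=3)
print(depth, frac)  # 1 1.0
```

In the toy run, compactness alone produces three candidate levels (the whole chain, two halves, four quarters); only the recurrence predicate singles out the two-block level. This mirrors, in miniature, the abstract's point that compactness criteria alone do not fix the domain level.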