DICTIONARY OF RECURRENT DOMAINS IN PROTEIN STRUCTURES

Authors

Holm L; Sander C

Citation

L. Holm and C. Sander, DICTIONARY OF RECURRENT DOMAINS IN PROTEIN STRUCTURES, Proteins, 33(1), 1998, pp. 88-96

Citations number

Subject Categories

Biology; Genetics & Heredity

Journal title

Proteins

ISSN journal

08873585

Volume

33

Issue

1

Year of publication

1998

Pages

88 - 96

Database

ISI

SICI code

0887-3585(1998)33:1<88:DORDIP>2.0.ZU;2-F

Abstract

The rapid growth in the number of experimentally determined three-dimensional protein structures has sharpened the need for comprehensive and up-to-date surveys of known structures. Classic work on protein structure classification has made it clear that a structural survey is best carried out at the level of domains, i.e., substructures that recur in evolution as functional units in different protein contexts. We present a method for automated domain identification from protein structure atomic coordinates based on quantitative measures of compactness and, as the new element, recurrence. Compactness criteria are used to recursively divide a protein into a series of successively smaller and smaller substructures. Recurrence criteria are used to select an optimal size level of these substructures, so that many of the chosen substructures are common to different proteins at a high level of statistical significance. The joint application of these criteria automatically yields consistent domain definitions between remote homologs, a result difficult to achieve using compactness criteria alone. The method is applied to a representative set of 1,137 sequence-unique protein families covering 6,500 known structures. Clustering of the resulting set of domains (substructures) yields 594 distinct fold classes (types of substructures). The Dali Domain Dictionary (http://www.embl-ebi.ac.uk/dali/) not only provides a global structural classification, but also a comprehensive description of families of protein sequences grouped around representative proteins of known structure. The classification will be continuously updated and can serve as a basis for improving our understanding of protein evolution and function and for evolving optimal strategies to complete the map of all natural protein structures. (C) 1998 Wiley-Liss, Inc.
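
Illustrative note

The two-stage idea in the abstract (recursive division into smaller substructures, then choosing a size level by recurrence) can be pictured with a toy sketch. This is not the authors' algorithm. It assumes the recursive division has already produced a tree of units, and that each unit carries a single numeric recurrence score. The names Unit, recurrence and select_domains are hypothetical, and the rule "keep a unit if its score is at least the best total of its sub-units" is a simplifying assumption made only for this example.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Unit:
    """A node in a hypothetical division tree: a substructure and its sub-units."""
    name: str
    recurrence: float  # assumed score: how strongly this unit recurs in other proteins
    children: List["Unit"] = field(default_factory=list)


def select_domains(unit: Unit) -> Tuple[float, List[str]]:
    """Return (total score, chosen unit names) for the best cut through the tree.

    At each node, either keep the unit whole or descend into its sub-units,
    whichever gives the higher total recurrence score (ties keep the larger unit).
    """
    if not unit.children:
        return unit.recurrence, [unit.name]

    child_score = 0.0
    child_names: List[str] = []
    for child in unit.children:
        score, names = select_domains(child)
        child_score += score
        child_names.extend(names)

    if unit.recurrence >= child_score:
        return unit.recurrence, [unit.name]
    return child_score, child_names


# Made-up example tree; the scores are invented purely for illustration.
chain = Unit("chain", 1.0, [
    Unit("N-half", 5.0, [Unit("N1", 1.0), Unit("N2", 1.5)]),
    Unit("C-half", 0.5, [Unit("C1", 2.0), Unit("C2", 2.0)]),
])
score, domains = select_domains(chain)
print(score, domains)  # 9.0 ['N-half', 'C1', 'C2']
```

In this made-up tree the N-terminal half is kept whole because it scores higher than its two parts combined, while the C-terminal half is split further. That mirrors, in a very simplified way, the point that the chosen size level can differ from one part of a protein to another.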