Protein structural domains: analysis of the 3Dee domains database

Citation
U. Dengler et al., Protein structural domains: analysis of the 3Dee domains database, PROTEINS, 42(3), 2001, pp. 332-344
Number of citations
67
Subject Categories
Biochemistry & Biophysics
Journal title
PROTEINS-STRUCTURE FUNCTION AND GENETICS
Journal ISSN
0887-3585
Volume
42
Issue
3
Year of publication
2001
Pages
332 - 344
Database
ISI
SICI code
0887-3585(20010215)42:3<332:PSDAOT>2.0.ZU;2-I
Abstract
The 3Dee database of domain definitions was developed as a comprehensive collection of domain definitions for all three-dimensional structures in the Protein Data Bank (PDB). The database includes definitions for complex, multiple-segment and multiple-chain domains as well as simple sequential domains, organized in a structural hierarchy. Two different snapshots of the 3Dee database were analyzed at September 1996 and November 1999. For the November 1999 release, 7,995 PDB entries contained 13,767 protein chains and gave rise to 18,896 domains. The domain sequences clustered into 1,715 domain sequence families, which were further clustered into a conservative 1,199 domain structure families (families with similar folds). The proportion of different domain structure families per domain sequence family increases from 84% for domains 1-100 residues long to 100% for domains greater than 600 residues. This is in keeping with the idea that longer chains will have more alternative folds available to them. Of the representative domains from the domain sequence families, 49% are in the range of 51-150 residues, whereas 64% of the representative chains over 200 residues have more than 1 domain. Of the representative chains, 8.5% are part of multichain domains. The largest multichain domain in the database has 14 chains and 1,400 residues, whereas the largest single-chain domain has 907 residues. The largest number of domains found in a protein is 13. The analysis shows that over the history of the PDB, new domain folds have been discovered at a slower rate than by random selection of all known folds. Between 1992 and 1997, a constant 1 in 11 new domains deposited in the PDB has shown no sequence similarity to a previously known domain sequence family, and only 1 in 15 new domain structures has had a fold that has not been seen previously.
A comparison of the September 1996 release of 3Dee to the Structural Classification of Proteins (SCOP) showed that the domain definitions agreed for 80% of the representative protein chains. However, 3Dee provided explicit domain boundaries for more proteins. 3Dee is accessible on the World Wide Web at http://barton.ebi.ac.uk/servers/3Dee.html. Proteins 2001; 42:332-344. (C) 2000 Wiley-Liss, Inc.
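
The counts quoted in the abstract for the November 1999 release imply a few simple averages that may help when reading the figures. The sketch below only divides the reported totals; the variable names and the `summary` dictionary are illustrative assumptions, and the resulting means are not statistics reported by the authors.

```python
# Counts as quoted in the abstract (November 1999 release of 3Dee).
pdb_entries = 7995
protein_chains = 13767
domains = 18896
sequence_families = 1715

# Plain averages derived from the totals above. These are back-of-the-envelope
# figures, not values stated in the paper.
summary = {
    "chains_per_entry": protein_chains / pdb_entries,
    "domains_per_chain": domains / protein_chains,
    "domains_per_sequence_family": domains / sequence_families,
    # "1 in 11" and "1 in 15" from the abstract, expressed as fractions.
    "new_sequence_family_rate": 1 / 11,
    "new_fold_rate": 1 / 15,
}

for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

On these totals, an average chain carries roughly 1.4 domains, and the "1 in 11" and "1 in 15" rates correspond to about 9% and 7% of newly deposited domains, respectively.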