A fully automatic evolutionary classification of protein folds: Dali Domain Dictionary version 3

Citation
S. Dietmann et al., A fully automatic evolutionary classification of protein folds: Dali Domain Dictionary version 3, NUCL ACID R, 29(1), 2001, pp. 55-57
Number of citations
13
Subject Categories
Biochemistry & Biophysics
Journal title
NUCLEIC ACIDS RESEARCH
Journal ISSN
0305-1048
Volume
29
Issue
1
Year of publication
2001
Pages
55 - 57
Database
ISI
SICI code
0305-1048(20010101)29:1<55:AFAECO>2.0.ZU;2-X
Abstract
The Dali Domain Dictionary (http://www.ebi.ac.uk/dali/domain) is a numerical taxonomy of all known structures in the Protein Data Bank (PDB). The taxonomy is derived fully automatically from measurements of structural, functional and sequence similarities. Here, we report the extension of the classification to match the traditional four hierarchical levels corresponding to: (i) supersecondary structural motifs (attractors in fold space), (ii) the topology of globular domains (fold types), (iii) remote homologues (functional families) and (iv) homologues with sequence identity above 25% (sequence families). The computational definitions of attractors and functional families are new. In September 2000, the Dali classification contained 10 531 PDB entries comprising 17 101 chains, which were partitioned into five attractor regions, 1375 fold types, 2582 functional families and 3724 domain sequence families. Sequence families were further associated with 99 582 unique homologous sequences in the HSSP database, which increases the number of effectively known structures several-fold. The resulting database contains the description of protein domain architecture, the definition of structural neighbours around each known structure, the definition of structurally conserved cores and a comprehensive library of explicit multiple alignments of distantly related protein families.
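The four hierarchical levels named in the abstract can be pictured as a nested label attached to each domain. The following is a minimal Python sketch under stated assumptions: the class name `DomainClass`, its field names, the integer identifiers and the helper `shared_level` are all hypothetical and chosen for illustration; they are not the identifiers or file format used by the Dali Domain Dictionary itself.

```python
from dataclasses import dataclass, astuple


# Hypothetical record for one classified domain. The four fields mirror the
# four levels listed in the abstract; the integer IDs are illustrative only.
@dataclass(frozen=True)
class DomainClass:
    attractor: int          # level (i): attractor region in fold space
    fold_type: int          # level (ii): topology of the globular domain
    functional_family: int  # level (iii): remote homologues
    sequence_family: int    # level (iv): homologues above 25% sequence identity


def shared_level(a: DomainClass, b: DomainClass) -> int:
    """Return how many levels (0 to 4) two domains share, counted from the top.

    Counting stops at the first level where the labels differ, so a match at
    a lower level is ignored if a higher level already disagrees.
    """
    depth = 0
    for label_a, label_b in zip(astuple(a), astuple(b)):
        if label_a != label_b:
            break
        depth += 1
    return depth


if __name__ == "__main__":
    d1 = DomainClass(attractor=1, fold_type=2, functional_family=3, sequence_family=4)
    d2 = DomainClass(attractor=1, fold_type=2, functional_family=5, sequence_family=6)
    # d1 and d2 share an attractor and a fold type but not a functional family.
    print(shared_level(d1, d2))
```

In this sketch, two domains that share only the first two labels would be read as having the same fold type without being classified as remote homologues.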