S. Dietmann et al., A fully automatic evolutionary classification of protein folds: Dali Domain Dictionary version 3, NUCL ACID R, 29(1), 2001, pp. 55-57
The Dali Domain Dictionary (http://www.ebi.ac.uk/dali/domain) is a
numerical taxonomy of all known structures in the Protein Data Bank (PDB).
The taxonomy is derived fully automatically from measurements of
structural, functional and sequence similarities. Here, we report the
extension of the classification to match the traditional four hierarchical
levels corresponding to: (i) supersecondary structural motifs (attractors
in fold space), (ii) the topology of globular domains (fold types), (iii)
remote homologues (functional families) and (iv) homologues with sequence
identity above 25% (sequence families). The computational definitions of
attractors and functional families are new. In September 2000, the Dali
classification contained 10 531 PDB entries comprising 17 101 chains, which
were partitioned into five attractor regions, 1375 fold types, 2582
functional families and 3724 domain sequence families. Sequence families
were further associated with 99 582 unique homologous sequences in the HSSP
database, which increases the number of effectively known structures
several-fold. The resulting database contains the description of protein
domain architecture, the definition of structural neighbours around each
known structure, the definition of structurally conserved cores and a
comprehensive library of explicit multiple alignments of distantly related
protein families.
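
The four-level hierarchy described in the abstract can be pictured as a four-part classification key attached to each domain. The following is only a minimal sketch under stated assumptions: the class name `DomainClass`, the integer labels and the `shared_depth` helper are hypothetical illustrations, not the actual Dali data format or API.

```python
from dataclasses import dataclass

# Level names follow the abstract; the identifiers themselves are hypothetical.
LEVELS = ("attractor", "fold_type", "functional_family", "sequence_family")


@dataclass(frozen=True)
class DomainClass:
    """Hypothetical four-part key: one integer label per hierarchy level."""
    attractor: int
    fold_type: int
    functional_family: int
    sequence_family: int

    def shared_depth(self, other: "DomainClass") -> int:
        """Count the leading hierarchy levels on which two domains agree."""
        depth = 0
        for level in LEVELS:
            if getattr(self, level) != getattr(other, level):
                break
            depth += 1
        return depth


# Two made-up domains that share an attractor and a fold type only.
a = DomainClass(attractor=1, fold_type=12, functional_family=3, sequence_family=7)
b = DomainClass(attractor=1, fold_type=12, functional_family=5, sequence_family=1)
print(a.shared_depth(b))
```

Under this reading, a shared depth of 2 would mean two domains have the same fold type but belong to different functional families, and a depth of 4 would place them in the same sequence family.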