Background: Several methods of structural classification have been developed to introduce some order to the large amount of data present in the Protein Data Bank. Such methods facilitate structural comparisons and provide a greater understanding of structure and function. The most widely used and comprehensive databases are SCOP, CATH and FSSP, which represent three unique methods of classifying protein structures: purely manual, a combination of manual and automated, and purely automated, respectively. To develop reliable template libraries and benchmarks for protein-fold recognition, a systematic comparison of these databases has been carried out to determine their overall agreement in classifying protein structures.
Results: Approximately two-thirds of the protein chains in each database are common to all three databases. Despite employing different methods, and basing their systems on different rules of protein structure and taxonomy, SCOP, CATH and FSSP agree on the majority of their classifications. Discrepancies and inconsistencies are accounted for by a small number of explanations. Other interesting features have been identified, and various differences between manual and automatic classification methods are presented.
Conclusions: Using these databases requires an understanding of the rules upon which they are based; each method offers certain advantages depending on the biological requirements and knowledge of the user. The degree of discrepancy between the systems also has an impact on the reliability of prediction methods that employ these schemes as benchmarks. To generate accurate fold templates for threading, we extract information from a consensus database, encompassing agreements between SCOP, CATH and FSSP.