A rapid classification protocol for the CATH Domain Database to support structural genomics

Authors

Pearl, FMG; Martin, N; Bray, JE; Buchan, DWA; Harrison, AP; Lee, D; Reeves, GA; Shepherd, AJ; Sillitoe, I; Todd, AE; Thornton, JM; Orengo, CA

Citation

F.M.G. Pearl et al., A rapid classification protocol for the CATH Domain Database to support structural genomics, Nucleic Acids Res., 29(1), 2001, pp. 223-227

Citations number

Subject categories

Biochemistry & Biophysics

Journal title

NUCLEIC ACIDS RESEARCH

ISSN journal

0305-1048

Volume

29

Issue

1

Year of publication

2001

Pages

223 - 227

Database

ISI

SICI code

0305-1048(20010101)29:1<223:ARCPFT>2.0.ZU;2-X

Abstract

In order to support the structural genomic initiatives, both by rapidly classifying newly determined structures and by suggesting suitable targets for structure determination, we have recently developed several new protocols for classifying structures in the CATH domain database (http://www.biochem.ucl.ac.uk/bsm/cath). These aim to increase the speed of classification of new structures using fast algorithms for structure comparison (GRATH) and to improve the sensitivity in recognising distant structural relatives by incorporating sequence information from relatives in the genomes (DomainFinder). In order to ensure the integrity of the database given the expected increase in data, the CATH Protein Family Database (CATH-PFDB), which currently includes 25 320 structural domains and a further 160 000 sequence relatives, has now been installed in a relational ORACLE database. This was essential for developing more rigorous validation procedures and for allowing efficient querying of the database, particularly for genome analysis. The associated Dictionary of Homologous Superfamilies [Bray,J.E., Todd,A.E., Pearl,F.M.G., Thornton,J.M. and Orengo,C.A. (2000) Protein Eng., 13, 153-165], which provides multiple structural alignments and functional information to assist in assigning new relatives, has also been expanded recently and now includes information for 903 homologous superfamilies. In order to improve coverage of known structures, preliminary classification levels are now provided for new structures at interim stages in the classification protocol. Since a large proportion of new structures can be rapidly classified using profile-based sequence analysis [e.g. PSI-BLAST: Altschul,S.F., Madden,T.L., Schaffer,A.A., Zhang,J., Zhang,Z., Miller,W. and Lipman,D.J. (1997) Nucleic Acids Res., 25, 3389-3402], this provides preliminary classification for easily recognisable homologues, which in the latest release of CATH (version 1.7) represented nearly three-quarters of the non-identical structures.