Evaluation and improvement of multiple sequence methods for protein secondary structure prediction

Citation
J.A. Cuff and G.J. Barton, Evaluation and improvement of multiple sequence methods for protein secondary structure prediction, PROTEINS, 34(4), 1999, pp. 508-519
Number of citations
83
Subject Categories
Biochemistry & Biophysics
Journal title
PROTEINS-STRUCTURE FUNCTION AND GENETICS
ISSN journal
0887-3585
Volume
34
Issue
4
Year of publication
1999
Pages
508 - 519
Database
ISI
SICI code
0887-3585(19990301)34:4<508:EAIOMS>2.0.ZU;2-U
Abstract
A new dataset of 396 protein domains is developed and used to evaluate the performance of the protein secondary structure prediction algorithms DSC, PHD, NNSSP, and PREDATOR. The maximum theoretical Q(3) accuracy for combination of these methods is shown to be 78%. A simple consensus prediction on the 396 domains, with automatically generated multiple sequence alignments, gives an average Q(3) prediction accuracy of 72.9%. This is a 1% improvement over PHD, which was the best single method evaluated. Segment Overlap Accuracy (SOV) is 75.4% for the consensus method on the 396-protein set. The secondary structure definition method DSSP defines 8 states, but these are reduced by most authors to 3 for prediction. Application of the different published 8- to 3-state reduction methods shows variation of over 3% on apparent prediction accuracy. This suggests that care should be taken to compare methods by the same reduction method. Two new sequence datasets (CB513 and CB251) are derived which are suitable for cross-validation of secondary structure prediction methods without artifacts due to internal homology. A fully automatic World Wide Web service that predicts protein secondary structure by a combination of methods is available via http://barton.ebi.ac.uk/. Proteins 1999;34:508-519. © 1999 Wiley-Liss, Inc.
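The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The two reduction mappings, the tie-break rule, and all function names are assumptions made for illustration; the abstract does not specify which published reductions were compared or how the consensus resolves ties. The sketch shows how the choice of 8- to 3-state reduction alone can change the apparent Q(3) of a fixed prediction, and how a per-residue majority vote can combine several predictions.

```python
from collections import Counter

# Two illustrative 8- to 3-state reductions (assumed for this sketch,
# not taken from the paper). Any DSSP state not listed becomes coil "C".
REDUCTION_A = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}
REDUCTION_B = {"H": "H", "E": "E"}


def reduce_states(dssp, mapping):
    """Map a DSSP 8-state string to a 3-state string (H, E, C)."""
    return "".join(mapping.get(state, "C") for state in dssp)


def q3(predicted, observed):
    """Percentage of residues whose predicted state equals the observed state."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


def consensus(predictions, tie_break="C"):
    """Per-residue majority vote; ties fall back to tie_break (assumed rule)."""
    result = []
    for column in zip(*predictions):
        counts = Counter(column).most_common()
        if len(counts) > 1 and counts[0][1] == counts[1][1]:
            result.append(tie_break)  # no single most frequent state
        else:
            result.append(counts[0][0])
    return "".join(result)


# Toy data, invented for illustration only.
dssp = "HHHGGEEB-T"
prediction = "HHHHHEECCC"

observed_a = reduce_states(dssp, REDUCTION_A)
observed_b = reduce_states(dssp, REDUCTION_B)
print(observed_a, q3(prediction, observed_a))
print(observed_b, q3(prediction, observed_b))
print(consensus(["HHHEC", "HHCEC", "HCCEE"]))
```

On this toy input the same prediction scores differently under the two reductions, which is the effect the abstract warns about when comparing methods evaluated with different reduction schemes.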