J.A. Cuff and G.J. Barton, Evaluation and improvement of multiple sequence methods for protein secondary structure prediction, PROTEINS, 34(4), 1999, pp. 508-519
A new dataset of 396 protein domains is developed and used to evaluate the
performance of the protein secondary structure prediction algorithms DSC,
PHD, NNSSP, and PREDATOR. The maximum theoretical Q(3) accuracy for
combination of these methods is shown to be 78%. A simple consensus prediction on
the 396 domains, with automatically generated multiple sequence alignments,
gives an average Q(3) prediction accuracy of 72.9%. This is a 1% improvement
over PHD, which was the best single method evaluated. Segment Overlap
Accuracy (SOV) is 75.4% for the consensus method on the 396-protein set. The
secondary structure definition method DSSP defines 8 states, but these are
reduced by most authors to 3 for prediction. Application of the different
published 8- to 3-state reduction methods shows variation of over 3% on
apparent prediction accuracy. This suggests that care should be taken to compare
methods by the same reduction method. Two new sequence datasets (CB513 and
CB251) are derived which are suitable for cross-validation of secondary
structure prediction methods without artifacts due to internal homology. A
fully automatic World Wide Web service that predicts protein secondary
structure by a combination of methods is available via http://barton.ebi.ac.uk/.
Proteins 1999;34:508-519. (C) 1999 Wiley-Liss, Inc.
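
The abstract's three central quantities (an 8- to 3-state reduction, per-residue Q(3) accuracy, and a simple consensus over several methods) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the two mappings `BROAD` and `STRICT` are commonly seen conventions and are not claimed to be the reduction methods compared in the paper, the majority vote with a coil tie-break is one plausible reading of "simple consensus" rather than the authors' procedure, and the function names and toy strings are hypothetical.

```python
from collections import Counter

# Two illustrative 8- to 3-state mappings (assumptions, not taken from the paper).
# DSSP states that are not listed fall through to coil (C).
BROAD = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}
STRICT = {"H": "H", "E": "E"}


def reduce_states(dssp, mapping):
    """Map a DSSP 8-state string to the three states H, E and C."""
    return "".join(mapping.get(state, "C") for state in dssp)


def q3(predicted, observed):
    """Percentage of residues whose predicted state equals the observed state."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


def consensus(predictions, tie_break="C"):
    """Per-residue majority vote over several 3-state predictions.

    Ties are resolved to `tie_break` (coil by default), which is an
    assumption of this sketch.
    """
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("need at least one prediction, all of equal length")
    result = []
    for column in zip(*predictions):
        counts = Counter(column).most_common()
        best = counts[0][1]
        winners = [state for state, n in counts if n == best]
        result.append(winners[0] if len(winners) == 1 else tie_break)
    return "".join(result)


if __name__ == "__main__":
    observed_dssp = "HHHHGGGTTEEEEB-S"   # toy 8-state assignment
    prediction = "HHHHHHHCCEEEECCC"      # toy 3-state prediction
    for name, mapping in (("broad", BROAD), ("strict", STRICT)):
        reference = reduce_states(observed_dssp, mapping)
        print(name, reference, q3(prediction, reference))
    print(consensus(["HHEC", "HHCC", "HECC"]))
```

On this toy input the same prediction scores 93.75 against the broad reduction and 81.25 against the strict one, which shows in miniature why accuracies are only comparable when the same reduction is applied. The numbers are artifacts of the made-up strings and say nothing about the published results.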