J. Boberg et al., ACCURATE PREDICTION OF PROTEIN SECONDARY STRUCTURAL CLASS WITH FUZZY STRUCTURAL VECTORS, Protein engineering, 8(6), 1995, pp. 505-512
The prerequisites for accurate prediction of protein secondary
structural class (all-alpha, all-beta, alpha+beta, alpha/beta or
multidomain) were studied, and a new similarity-based method is
presented for predicting the secondary structural class of a protein
from its sequence. The new method uses representatives of nuclear
families as a learning set. For the sequence to be predicted, the
method produces a vector of certainty factors called a fuzzy
structural vector. Validation with independent test sets shows that
the prediction accuracy of the proposed method depends clearly on how
representative the learning set is. The representatives obtained from
the nuclear families of the Brookhaven Protein Data Bank (PDB) were
shown to give accurate predictions for PDB proteins, whilst the amino
acid composition-based methods used previously achieve their maximum
predictability with relatively limited learning sets and remain
inaccurate even with highly representative learning sets. The
usability of the new method is increased further by the fuzzy
structural vectors, which substantially reduce the risk of
misclassification and realistically describe vague secondary
structural tendencies.
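
The abstract does not state how the certainty factors are computed, so
the following Python sketch is only an illustration of the general
idea of a per-class certainty vector, not the authors' formula. It
assumes that a similarity score between the query sequence and each
learning-set representative is already available; the function names,
the max-then-normalise rule and the rejection threshold are all
hypothetical choices made for this example.

```python
# Illustrative sketch only: the scoring rule is an assumption, not the
# method published by Boberg et al.

CLASSES = ("all-alpha", "all-beta", "alpha+beta", "alpha/beta", "multidomain")


def fuzzy_structural_vector(similarities):
    """Build a vector of per-class certainty factors for one query.

    `similarities` is an iterable of (class_label, score) pairs, one per
    learning-set representative, where `score` is a non-negative
    similarity between the query and that representative (hypothetical
    input; how it is computed is outside this sketch).
    """
    best = {c: 0.0 for c in CLASSES}
    for label, score in similarities:
        if label not in best:
            raise ValueError(f"unknown structural class: {label!r}")
        if score < 0:
            raise ValueError("similarity scores must be non-negative")
        # Keep the strongest evidence seen for each class.
        best[label] = max(best[label], score)

    total = sum(best.values())
    if total == 0:
        # No evidence for any class: every certainty factor stays at zero.
        return dict(best)
    # Normalise so that the certainty factors sum to 1.
    return {c: best[c] / total for c in CLASSES}


def classify(vector, threshold=0.5):
    """Return the top class, or None when no class reaches `threshold`."""
    label = max(vector, key=vector.get)
    return label if vector[label] >= threshold else None


if __name__ == "__main__":
    hits = [("all-alpha", 0.8), ("all-alpha", 0.3), ("alpha/beta", 0.2)]
    vector = fuzzy_structural_vector(hits)
    print(vector)
    print(classify(vector))
```

Returning the whole vector rather than a single label is what lets a
caller decline to classify a query whose evidence is split between
classes, which corresponds to the reduced misclassification risk the
abstract attributes to fuzzy structural vectors.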