Motivation: Protein fold recognition is an important approach to structure
discovery without relying on sequence similarity. We studied this approach
with new multi-class classification methods and examined many issues
important for a practical recognition system.
Results: Most current discriminative methods for protein fold prediction use
the one-against-others method, which has the well-known 'False Positives'
problem. We investigated two new methods: the unique one-against-others and
the all-against-all methods. Both improve prediction accuracy by 14-110%
on a dataset containing 27 SCOP folds. We used the Support Vector Machine
(SVM) and the Neural Network (NN) learning methods as base classifiers. SVMs
converge fast and lead to high accuracy. When scores of multiple parameter
datasets are combined, majority voting reduces noise and increases
recognition accuracy. We examined many issues involved with a large number of
classes, including dependencies of prediction accuracy on the number of folds
and on the number of representatives in a fold. Overall, recognition systems
achieve 56% fold prediction accuracy on a protein test dataset, where most
of the proteins have below 25% sequence identity with the proteins used in
training.