Multi-class protein fold recognition using support vector machines and neural networks

Citation
Chq. Ding and I. Dubchak, Multi-class protein fold recognition using support vector machines and neural networks, BIOINFORMAT, 17(4), 2001, pp. 349-358
Number of citations
23
Subject categories
Multidisciplinary
Journal title
BIOINFORMATICS
Journal ISSN
1367-4803
Volume
17
Issue
4
Year of publication
2001
Pages
349 - 358
Database
ISI
SICI code
1367-4803(200104)17:4<349:MPFRUS>2.0.ZU;2-D
Abstract
Motivation: Protein fold recognition is an important approach to structure discovery without relying on sequence similarity. We study this approach with new multi-class classification methods and examine many issues important for a practical recognition system.
Results: Most current discriminative methods for protein fold prediction use the one-against-others method, which has the well-known 'False Positives' problem. We investigated two new methods: the unique one-against-others and the all-against-all methods. Both improve prediction accuracy by 14-110% on a dataset containing 27 SCOP folds. We used the Support Vector Machine (SVM) and the Neural Network (NN) learning methods as base classifiers. SVMs converge fast and lead to high accuracy. When scores of multiple parameter datasets are combined, majority voting reduces noise and increases recognition accuracy. We examined many issues involved with a large number of classes, including dependencies of prediction accuracy on the number of folds and on the number of representatives in a fold. Overall, recognition systems achieve 56% fold prediction accuracy on a protein test dataset, where most of the proteins have below 25% sequence identity with the proteins used in training.
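The all-against-all idea named in the abstract can be sketched as follows. This is a minimal illustration and not the authors' implementation: a nearest-centroid rule stands in for the SVM or NN base classifier, and the function names, fold labels, and toy feature vectors are all hypothetical. One binary classifier is trained per pair of folds, each casts one vote for a query protein, and the fold with the most votes is predicted.

```python
from collections import Counter
from itertools import combinations


def centroid(points):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_all_against_all(samples):
    """Build one binary classifier per pair of folds.

    `samples` maps a fold label to its training vectors. Each pairwise
    "classifier" is just the two class centroids (an assumed stand-in
    for an SVM or neural network).
    """
    cents = {label: centroid(vecs) for label, vecs in samples.items()}
    return {(a, b): (cents[a], cents[b])
            for a, b in combinations(sorted(cents), 2)}


def predict(pairwise, x):
    """Let every pairwise classifier vote; return the fold with most votes."""
    votes = Counter()
    for (a, b), (ca, cb) in pairwise.items():
        votes[a if sq_dist(x, ca) <= sq_dist(x, cb) else b] += 1
    # Ties are broken by label order so the result is deterministic.
    return min(votes, key=lambda label: (-votes[label], label))


# Hypothetical toy data: three folds, two 2-D feature vectors each.
toy = {
    "fold_a": [[0.0, 0.0], [0.2, 0.1]],
    "fold_b": [[1.0, 1.0], [0.9, 1.2]],
    "fold_c": [[2.0, 0.0], [2.1, 0.2]],
}
model = train_all_against_all(toy)
print(len(model))                  # number of pairwise classifiers
print(predict(model, [0.1, 0.0]))  # predicted fold for a query vector
```

With K folds this scheme trains K(K-1)/2 pairwise classifiers, so the 27 folds mentioned in the abstract would require 351 of them. Because every query receives exactly one winning label, the scheme avoids the situation in plain one-against-others where several classifiers claim the same protein.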