
Multi-class protein fold recognition using support vector machines and neural networks

Authors

Ding, CHQ; Dubchak, I

Citation

C.H.Q. Ding and I. Dubchak, Multi-class protein fold recognition using support vector machines and neural networks, Bioinformatics, 17(4), 2001, pp. 349-358

Citations number

Subject categories

Multidisciplinary

Journal title

BIOINFORMATICS

ISSN journal

1367-4803

Volume

17

Issue

4

Year of publication

2001

Pages

349 - 358

Database

ISI

SICI code

1367-4803(200104)17:4<349:MPFRUS>2.0.ZU;2-D

Abstract

Motivation: Protein fold recognition is an important approach to structure discovery that does not rely on sequence similarity. We studied this approach with new multi-class classification methods and examined many issues important for a practical recognition system.

Results: Most current discriminative methods for protein fold prediction use the one-against-others method, which has the well-known 'False Positives' problem. We investigated two new methods: the unique one-against-others and the all-against-all methods. Both improve prediction accuracy by 14-110% on a dataset containing 27 SCOP folds. We used the Support Vector Machine (SVM) and the Neural Network (NN) learning methods as base classifiers. SVMs converge fast and lead to high accuracy. When scores of multiple parameter datasets are combined, majority voting reduces noise and increases recognition accuracy. We examined many issues involved with a large number of classes, including the dependence of prediction accuracy on the number of folds and on the number of representatives in a fold. Overall, the recognition systems achieve 56% fold prediction accuracy on a protein test dataset in which most of the proteins have below 25% sequence identity with the proteins used in training.
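Illustrative note: the all-against-all idea named in the abstract can be sketched as majority voting over pairwise contests. The sketch below is a generic illustration, not the authors' implementation; the function `all_against_all_vote`, the callable `pairwise_winner`, the fold names, and the toy scores are all hypothetical and stand in for trained binary classifiers.

```python
from collections import Counter
from itertools import combinations


def all_against_all_vote(classes, pairwise_winner):
    """Return the class that wins the most pairwise contests.

    `pairwise_winner(a, b)` is a hypothetical callable that returns
    either `a` or `b`; in practice it would wrap a binary classifier
    trained to separate class `a` from class `b`.
    """
    votes = Counter({c: 0 for c in classes})
    for a, b in combinations(classes, 2):
        votes[pairwise_winner(a, b)] += 1
    # Ties are broken by the order of `classes` (an arbitrary choice).
    return max(classes, key=lambda c: votes[c])


# Made-up scores standing in for classifier outputs on one protein.
toy_scores = {"fold_A": 0.2, "fold_B": 0.9, "fold_C": 0.5}


def toy_winner(a, b):
    # The class with the higher toy score wins the pairwise contest.
    return a if toy_scores[a] >= toy_scores[b] else b


prediction = all_against_all_vote(list(toy_scores), toy_winner)
```

With K classes this scheme needs K(K-1)/2 pairwise classifiers, which is 351 for the 27 folds mentioned in the abstract.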