How good is prediction of protein structural class by the component-coupled method?

Authors

Wang, ZX; Yuan, Z

Citation

Z.X. Wang and Z. Yuan, How good is prediction of protein structural class by the component-coupled method?, PROTEINS, 38(2), 2000, pp. 165-175

Subject categories

Biochemistry & Biophysics

Journal title

PROTEINS-STRUCTURE FUNCTION AND GENETICS

ISSN journal

0887-3585

Volume

38

Issue

2

Year of publication

2000

Pages

165 - 175

Database

ISI

SICI code

0887-3585(20000201)38:2<165:HGIPOP>2.0.ZU;2-S

Abstract

Proteins of known structure are usually classified into four structural classes: all-alpha, all-beta, alpha+beta, and alpha/beta proteins. A number of methods for predicting the structural class of a protein from its amino acid composition have been developed during the past few years. Recently, a component-coupled method was developed for predicting protein structural class according to amino acid composition. This method is based on the least Mahalanobis distance principle and yields much better predictions than the previous methods. However, the success rates reported for structural class prediction by different investigators are contradictory. The highest accuracies reported for this method are near 100%, but the lowest is only about 60%. The goal of this study is to resolve this paradox and to determine the possible upper limit of the prediction rate for structural classes. In this paper, based on the normality assumption and the Bayes decision rule for minimum error, a new method is proposed for predicting the structural class of a protein according to its amino acid composition. The detailed theoretical analysis indicates that if the four protein folding classes are governed by normal distributions, the present method will yield the optimum predictive result in a statistical sense. A non-redundant data set of 1,189 protein domains is used to evaluate the performance of the new method. Our results demonstrate that 60% correctness is the upper limit for a 4-type class prediction from amino acid composition alone for an unknown query protein. The apparently high accuracy (more than 90%) attained in previous studies was due to the preselection of test sets, which may not be adequately representative of all unrelated proteins. Proteins 2000;38:165-175. © 2000 Wiley-Liss, Inc.
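
The two decision rules named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy two-dimensional features, and the equal class priors are all assumptions made for illustration. Under the normality assumption, maximizing prior times Gaussian density is equivalent to minimizing the squared Mahalanobis distance plus the log-determinant of the class covariance minus twice the log prior, so the Bayes rule differs from the least Mahalanobis distance rule only by those two extra terms. With real amino acid compositions, whose components sum to 1, the covariance matrix would be singular unless one component is dropped; the sketch sidesteps this by using generic feature vectors.

```python
import numpy as np

def fit_class(X):
    """Estimate the mean vector and covariance matrix of one class.
    X has one sample per row (hypothetical training data)."""
    return X.mean(axis=0), np.cov(X, rowvar=False)

def mahalanobis_sq(x, mean, cov):
    """Squared Mahalanobis distance from x to a class centre."""
    d = x - mean
    return float(d @ np.linalg.solve(cov, d))

def bayes_score(x, mean, cov, prior):
    """-2 * log(prior * Gaussian density), up to an additive constant.
    Smaller is better."""
    _, logdet = np.linalg.slogdet(cov)
    return mahalanobis_sq(x, mean, cov) + logdet - 2.0 * np.log(prior)

def predict(x, params, priors=None, rule="bayes"):
    """Pick the class with the smallest score.
    params maps class label -> (mean, cov); priors default to equal (assumption)."""
    classes = list(params)
    if priors is None:
        priors = {c: 1.0 / len(classes) for c in classes}

    def score(c):
        mean, cov = params[c]
        if rule == "mahalanobis":
            return mahalanobis_sq(x, mean, cov)
        return bayes_score(x, mean, cov, priors[c])

    return min(classes, key=score)

# Toy example: class "b" has a much broader covariance than class "a".
params = {
    "a": (np.array([0.0, 0.0]), np.eye(2)),
    "b": (np.array([4.0, 0.0]), 100.0 * np.eye(2)),
}
x = np.array([2.5, 0.0])
print(predict(x, params, rule="mahalanobis"))  # least Mahalanobis distance rule
print(predict(x, params, rule="bayes"))        # Bayes rule under normality
```

In this toy case the two rules disagree: the broad class wins on Mahalanobis distance alone, while the log-determinant term in the Bayes score penalizes its spread. When all classes share the same covariance and prior, the two rules coincide.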