C. Pasquier et al., PRED-CLASS: Cascading neural networks for generalized protein classification and genome-wide applications, PROTEINS, 44(3), 2001, pp. 361-369
A cascading system of hierarchical artificial neural networks (named
PRED-CLASS) is presented for the generalized classification of proteins into
four distinct classes (transmembrane, fibrous, globular, and mixed) from
information encoded solely in their amino acid sequences. The architecture of the
individual component networks is kept very simple, reducing the number of
free parameters (network synaptic weights) for faster training, improved
generalization, and the avoidance of data overfitting. Capturing information
from as few as 50 protein sequences spread among the four target classes (6
transmembrane, 10 fibrous, 13 globular, and 17 mixed), PRED-CLASS was able
to obtain 371 correct predictions out of a set of 387 proteins (success
rate of approximately 96%) unambiguously assigned to one of the target classes.
The application of PRED-CLASS to several test sets and complete proteomes of
several organisms demonstrates that such a method could serve as a valuable
tool in the annotation of genomic open reading frames with no functional
assignment or as a preliminary step in fold recognition and ab initio
structure prediction methods. Detailed results obtained for various data sets and
completed genomes, along with a web server running the PRED-CLASS algorithm,
can be accessed over the World Wide Web at http://o2.biol.uoa.gr/PRED-CLASS.
(C) 2001 Wiley-Liss, Inc.
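
The cascading idea described in the abstract can be illustrated with a minimal sketch: a chain of simple binary decisions, each of which either assigns a class or passes the sequence on to the next stage. Everything specific here is an assumption made for illustration and is not taken from the paper: the order of the stages, the names `classify`, `STAGES`, and `success_rate`, and above all the toy composition-based detectors, which merely stand in for trained neural networks. Only the four class names and the 371/387 figure come from the abstract.

```python
from typing import Callable, List, Tuple

# A stage pairs a class label with a binary detector.
# In PRED-CLASS each detector would be a small trained network;
# here they are hypothetical composition rules used purely as placeholders.
Stage = Tuple[str, Callable[[str], bool]]


def fraction(sequence: str, residues: str) -> float:
    """Fraction of positions in `sequence` whose residue is in `residues`."""
    wanted = set(residues)
    return sum(1 for r in sequence if r in wanted) / len(sequence)


def looks_transmembrane(sequence: str) -> bool:
    # Toy rule (assumption): mostly hydrophobic residues.
    return fraction(sequence, "AILMFVW") > 0.6


def looks_fibrous(sequence: str) -> bool:
    # Toy rule (assumption): dominated by glycine and proline.
    return fraction(sequence, "GP") > 0.5


def looks_globular(sequence: str) -> bool:
    # Toy rule (assumption): a substantial share of charged/polar residues.
    return fraction(sequence, "DEKRNQ") > 0.3


# Assumed stage order; anything rejected by every stage falls through to "mixed".
STAGES: List[Stage] = [
    ("transmembrane", looks_transmembrane),
    ("fibrous", looks_fibrous),
    ("globular", looks_globular),
]


def classify(sequence: str, stages: List[Stage] = STAGES,
             fallback: str = "mixed") -> str:
    """Return the label of the first stage that accepts the sequence."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    for label, accepts in stages:
        if accepts(sequence):
            return label
    return fallback


def success_rate(correct: int, total: int) -> float:
    """Percentage of correct predictions."""
    return 100.0 * correct / total


if __name__ == "__main__":
    for seq in ("AILMFVWAIL", "GPGPGPGPGS", "DEKRNQSTDE", "SSSSTTTTCC"):
        print(seq, "->", classify(seq))
    # Figures reported in the abstract: 371 correct out of 387 proteins.
    print(f"{success_rate(371, 387):.1f}%")
```

Because each stage only has to solve one binary problem, every component can stay small, which is consistent with the abstract's point about keeping the number of free parameters low. The final line reproduces the reported success rate from the stated counts.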