PRED-CLASS: Cascading neural networks for generalized protein classification and genome-wide applications

Citation
C. Pasquier et al., PRED-CLASS: Cascading neural networks for generalized protein classification and genome-wide applications, PROTEINS, 44(3), 2001, pp. 361-369
Number of citations
35
Subject Categories
Biochemistry & Biophysics
Journal title
PROTEINS-STRUCTURE FUNCTION AND GENETICS
Journal ISSN
0887-3585
Volume
44
Issue
3
Year of publication
2001
Pages
361 - 369
Database
ISI
SICI code
0887-3585(20010815)44:3<361:PCNNFG>2.0.ZU;2-0
Abstract
A cascading system of hierarchical, artificial neural networks (named PRED-CLASS) is presented for the generalized classification of proteins into four distinct classes (transmembrane, fibrous, globular, and mixed) from information solely encoded in their amino acid sequences. The architecture of the individual component networks is kept very simple, reducing the number of free parameters (network synaptic weights) for faster training, improved generalization, and the avoidance of data overfitting. Capturing information from as few as 50 protein sequences spread among the four target classes (6 transmembrane, 10 fibrous, 13 globular, and 17 mixed), PRED-CLASS was able to obtain 371 correct predictions out of a set of 387 proteins (success rate approximately 96%) unambiguously assigned into one of the target classes. The application of PRED-CLASS to several test sets and complete proteomes of several organisms demonstrates that such a method could serve as a valuable tool in the annotation of genomic open reading frames with no functional assignment or as a preliminary step in fold recognition and ab initio structure prediction methods. Detailed results obtained for various data sets and completed genomes, along with a web server running the PRED-CLASS algorithm, can be accessed over the World Wide Web at http://o2.biol.uoa.gr/PRED-CLASS. (C) 2001 Wiley-Liss, Inc.
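
The cascading idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The order of the stages, the `cascade_classify` and `success_rate` helpers, and the toy decision rules are all assumptions made for illustration: in PRED-CLASS each stage would be a small trained neural network, which the abstract does not specify in detail. The sketch only shows how a sequence is passed from one binary decision to the next until a class is assigned, and how the reported success rate follows from 371 correct predictions out of 387.

```python
from typing import Callable, Sequence, Tuple

# A stage pairs a class label with a yes/no decision function.
# In PRED-CLASS each decision would come from a trained network;
# here the functions are hypothetical placeholders.
Stage = Tuple[str, Callable[[str], bool]]


def cascade_classify(sequence: str, stages: Sequence[Stage], default: str) -> str:
    """Return the label of the first stage that accepts the sequence.

    If no stage accepts it, fall through to the default class.
    """
    for label, accepts in stages:
        if accepts(sequence):
            return label
    return default


def success_rate(correct: int, total: int) -> float:
    """Fraction of correct predictions."""
    return correct / total


def fraction_of(sequence: str, residues: str) -> float:
    """Fraction of the sequence made up of the given residue letters."""
    return sum(sequence.count(r) for r in residues) / max(len(sequence), 1)


# Toy rules with arbitrary thresholds and no biological validity.
# They stand in for the trained component networks (assumption).
toy_stages = [
    ("transmembrane", lambda s: fraction_of(s, "AILMFVW") > 0.6),
    ("fibrous", lambda s: fraction_of(s, "G") > 0.3),
    ("globular", lambda s: len(s) < 300),
]

if __name__ == "__main__":
    print(cascade_classify("LLLLVVVVAA", toy_stages, default="mixed"))
    print(cascade_classify("GPGGPGGPGG", toy_stages, default="mixed"))
    # Success rate quoted in the abstract: 371 of 387.
    print(f"{success_rate(371, 387):.1%}")
```

Arranging the decisions as a cascade keeps each stage a simple two-way choice, which fits the abstract's emphasis on small component networks with few free parameters.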