NEURAL NETWORKS FOR FULL-SCALE PROTEIN-SEQUENCE CLASSIFICATION - SEQUENCE ENCODING WITH SINGULAR-VALUE DECOMPOSITION

Authors

WU C BERRY M SHIVAKUMAR S MCLARTY J

Citation

C. Wu et al., NEURAL NETWORKS FOR FULL-SCALE PROTEIN-SEQUENCE CLASSIFICATION - SEQUENCE ENCODING WITH SINGULAR-VALUE DECOMPOSITION, Machine learning, 21(1-2), 1995, pp. 177-193

Citations number

Subject Categories

Computer Sciences; Computer Science, Artificial Intelligence; Neurosciences

Journal title

Machine learning

ISSN journal

0885-6125

Volume

21

Issue

1-2

Year of publication

1995

Pages

177 - 193

Database

ISI

SICI code

0885-6125(1995)21:1-2<177:NNFFPC>2.0.ZU;2-K

Abstract

A neural network classification method has been developed as an alternative approach to the search/organization problem of protein sequence databases. The neural networks used are three-layered, feed-forward, back-propagation networks. The protein sequences are encoded into neural input vectors by a hashing method that counts occurrences of n-gram words. A new SVD (singular value decomposition) method, which compresses the long and sparse n-gram input vectors and captures semantics of n-gram words, has improved the generalization capability of the network. A full-scale protein classification system has been implemented on a Cray supercomputer to classify unknown sequences into 3311 PIR (Protein Identification Resource) superfamilies/families at a speed of less than 0.05 CPU second per sequence. The sensitivity is close to 90% overall, and approaches 100% for large superfamilies. The system could be used to reduce the database search time and is being used to help organize the PIR protein sequence database.
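
The encoding pipeline described in the abstract (n-gram counting followed by SVD compression) can be illustrated with a minimal sketch. The abstract does not give the details, so everything here is an assumption made for illustration: the 20-letter amino-acid alphabet, the choice of 2-grams, the toy sequences, the reduced dimension k, and the function names `ngram_index`, `encode`, and `svd_reduce`. It is not the authors' implementation, and it stops before the neural network stage.

```python
from itertools import product

import numpy as np

# Assumption: the 20 standard amino-acid letters; the paper may use other alphabets.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def ngram_index(n=2, alphabet=AMINO_ACIDS):
    """Map every possible n-gram over the alphabet to a fixed vector position."""
    return {"".join(p): i for i, p in enumerate(product(alphabet, repeat=n))}


def encode(seq, index, n=2):
    """Count n-gram occurrences in seq; n-grams with unknown letters are skipped."""
    vec = np.zeros(len(index))
    for i in range(len(seq) - n + 1):
        j = index.get(seq[i:i + n])
        if j is not None:
            vec[j] += 1
    return vec


def svd_reduce(X, k):
    """Project the rows of X (sequences x n-grams) onto the top-k right singular vectors."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    basis = Vt[:k].T  # shape: (number of n-grams, k)
    return X @ basis, basis


# Hypothetical toy sequences, not taken from the PIR database.
index = ngram_index(2)
train = ["MKVLAAGIVG", "MKVLSAGIAG", "GSHMPEPTIDE", "GSHMPEPSIDE"]
X = np.array([encode(s, index) for s in train])  # long, sparse count vectors
Z, basis = svd_reduce(X, k=2)                    # short, dense vectors

# An unseen sequence is encoded the same way, then projected with the stored basis.
query = encode("MKVLAAGIAG", index) @ basis
```

In this sketch the rows of `Z`, rather than the 400-dimensional count vectors in `X`, would serve as the network inputs. Reusing the stored `basis` for new sequences keeps training and query vectors in the same reduced space.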