A neural network classification method has been developed as an
alternative approach to the search/organization problem of large
molecular databases. Two artificial neural systems have been
implemented on a Cray supercomputer for rapid protein/nucleic acid
sequence classification. The neural networks used are three-layered,
feed-forward networks that employ the back-propagation learning
algorithm. The molecular sequences are encoded into neural input
vectors by applying an n-gram hashing
method or an SVD (singular value decomposition) method. Once trained
with known sequences in the molecular databases, the neural system
becomes an associative memory capable of classifying unknown sequences
based on the class information embedded in its neural interconnections.
The protein system, which classifies proteins into PIR (Protein
Identification Resource) superfamilies, showed a sensitivity ranging
from 82% to close to 100%, at a speed about an order of magnitude
faster than other
search methods. The pilot nucleic acid system, which classifies
ribosomal RNA sequences according to phylogenetic groups, has achieved
100% classification accuracy. The system could be used to reduce
database search time and to help organize molecular sequence databases.
The tool is generally applicable to any database that is organized
according to family relationships.
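
To illustrate the n-gram hashing step mentioned above, the following is a minimal sketch of how a sequence might be turned into a fixed-length input vector. It is not the authors' implementation: the function name `ngram_hash_vector`, the base-31 hash, the vector size of 16, and the normalization by n-gram count are all assumptions chosen for illustration.

```python
from typing import List


def ngram_hash_vector(sequence: str, n: int = 2, size: int = 16) -> List[float]:
    """Map a sequence to a fixed-length vector of normalized n-gram counts.

    Hypothetical helper: the hash function and vector size are assumptions,
    not details taken from the abstract.
    """
    counts = [0] * size
    total = len(sequence) - n + 1  # number of overlapping n-grams
    if total <= 0:
        # Sequence is shorter than n, so it contains no n-grams.
        return [0.0] * size
    for i in range(total):
        gram = sequence[i:i + n]
        # Deterministic polynomial hash (assumed). Python's built-in hash()
        # is salted per process for strings, so it is avoided here.
        h = 0
        for ch in gram:
            h = (h * 31 + ord(ch)) % size
        counts[h] += 1
    # Normalize so that sequences of different lengths are comparable.
    return [c / total for c in counts]


if __name__ == "__main__":
    vector = ngram_hash_vector("MKVLAAGIV", n=2, size=16)
    print(len(vector), vector)
```

Under these assumptions, every sequence yields a vector of the same length regardless of its own length, which is what a feed-forward network with a fixed input layer requires. Distinct n-grams can collide in the same position, so the vector size trades input dimensionality against loss of information.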