Ch. Wu et al., MOTIF IDENTIFICATION NEURAL DESIGN FOR RAPID AND SENSITIVE PROTEIN FAMILY SEARCH, Computer applications in the biosciences, 12(2), 1996, pp. 109-118
A new method, the motif identification neural design (MOTIFIND), has been developed for rapid and sensitive protein family identification. The method is an extension of our previous gene classification artificial neural system and employs new designs to enhance the detection of distant relationships. The new designs include an n-gram term weighting algorithm for extracting local motif patterns, an enhanced n-gram method for extracting residues of long-range correlation, and integrated neural networks for combining global and motif sequence information. The system has been tested and compared with several existing methods using three protein families: cytochrome c, cytochrome b and flavodoxin. Overall, it achieves 100% sensitivity and >99.6% specificity, an accuracy comparable to that of BLAST, while running about 20 times faster. The system is much more robust than the PROSITE search, which is based on simple signature patterns. MOTIFIND also compares favorably with BLIMPS, the Hidden Markov Model and PROFILESEARCH in detecting fragmentary sequences lacking complete motif regions and in detecting distant relationships, especially for members of under-represented subgroups within a family. MOTIFIND may be generally applicable to other proteins and has the potential to become a full-scale database search and sequence analysis tool.
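
The abstract names an n-gram term weighting algorithm but does not give its formula. The following is only a minimal sketch of the general idea, not the authors' method. It assumes a simple hypothetical weight: the fraction of an n-gram's occurrences in full-length training sequences that fall inside annotated motif regions. The function names, the choice of n = 2, and the toy sequences are all illustrative assumptions.

```python
from collections import Counter


def ngram_counts(sequence, n=2):
    """Count overlapping n-grams (runs of n adjacent residues) in a sequence."""
    return Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))


def term_weights(motif_regions, full_sequences, n=2):
    """Hypothetical weighting, assumed for illustration only.

    For each n-gram seen in a motif region, return its count inside the
    motif regions divided by its count in the full-length sequences.
    N-grams concentrated in motif regions get weights close to 1.
    """
    motif_total = Counter()
    for region in motif_regions:
        motif_total.update(ngram_counts(region, n))

    full_total = Counter()
    for seq in full_sequences:
        full_total.update(ngram_counts(seq, n))

    return {
        gram: motif_total[gram] / full_total[gram]
        for gram in motif_total
        if full_total[gram] > 0
    }


if __name__ == "__main__":
    # Toy data: one sequence carries a made-up motif region "CAACH".
    full_sequences = ["GDCAACHAG", "AAGAAG"]
    motif_regions = ["CAACH"]
    for gram, weight in sorted(term_weights(motif_regions, full_sequences).items()):
        print(gram, round(weight, 3))
```

In this toy case, n-grams that occur only inside the motif region receive a higher weight than n-grams that are also common elsewhere. Weighted n-gram counts of this kind could then serve as input features to a neural network classifier, in the spirit of the combination of global and motif information described above.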