MOTIF IDENTIFICATION NEURAL DESIGN FOR RAPID AND SENSITIVE PROTEIN FAMILY SEARCH

Citation
Ch. Wu et al., MOTIF IDENTIFICATION NEURAL DESIGN FOR RAPID AND SENSITIVE PROTEIN FAMILY SEARCH, Computer applications in the biosciences, 12(2), 1996, pp. 109-118
Number of citations
26
Subject categories
Mathematical Methods, Biology & Medicine; Computer Sciences, Special Topics; Computer Science Interdisciplinary Applications; Biology Miscellaneous
ISSN journal
0266-7061
Volume
12
Issue
2
Year of publication
1996
Pages
109 - 118
Database
ISI
SICI code
0266-7061(1996)12:2<109:MINDFR>2.0.ZU;2-M
Abstract
A new method, the motif identification neural design (MOTIFIND), has been developed for rapid and sensitive protein family identification. The method is an extension of our previous gene classification artificial neural system and employs new designs to enhance the detection of distant relationships. The new designs include an n-gram term weighting algorithm for extracting local motif patterns, an enhanced n-gram method for extracting residues of long-range correlation, and integrated neural networks for combining global and motif sequence information. The system has been tested and compared with several existing methods using three protein families: cytochrome c, cytochrome b and flavodoxin. Overall it achieves 100% sensitivity and > 99.6% specificity, an accuracy comparable to BLAST, but at a speed approximately 20 times faster. The system is much more robust than the PROSITE search, which is based on simple signature patterns. MOTIFIND also compares favorably with BLIMPS, the Hidden Markov Model and PROFILESEARCH in detecting fragmentary sequences lacking complete motif regions and in detecting distant relationships, especially for members of under-represented subgroups within a family. MOTIFIND may be generally applicable to other proteins and has the potential to become a full-scale database search and sequence analysis tool.
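Illustrative note
The abstract mentions n-gram term weighting for extracting local motif patterns but does not give the formula. The following is only a minimal sketch of the general idea, assuming a smoothed log-ratio weight (n-gram frequency in known motif regions versus frequency in background sequences). The function names, the pseudocount, and the weighting formula itself are hypothetical and are not taken from the paper.

```python
from collections import Counter
import math


def ngram_counts(seq, n=2):
    """Count overlapping n-grams (length-n substrings) in a sequence."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def term_weights(motif_seqs, background_seqs, n=2, pseudo=1.0):
    """Hypothetical weighting: log ratio of the smoothed relative
    frequency of each n-gram in motif regions vs. background sequences.
    This is an assumed stand-in, not the weighting used by MOTIFIND."""
    motif, background = Counter(), Counter()
    for s in motif_seqs:
        motif.update(ngram_counts(s, n))
    for s in background_seqs:
        background.update(ngram_counts(s, n))
    vocab = set(motif) | set(background)
    # Add a pseudocount per n-gram so unseen n-grams do not give log(0).
    m_total = sum(motif.values()) + pseudo * len(vocab)
    b_total = sum(background.values()) + pseudo * len(vocab)
    return {
        g: math.log(((motif[g] + pseudo) / m_total)
                    / ((background[g] + pseudo) / b_total))
        for g in vocab
    }


def encode(seq, weights, n=2):
    """Weighted n-gram vector for one sequence, as a sparse dict.
    N-grams without a known weight contribute 0."""
    return {g: c * weights.get(g, 0.0)
            for g, c in ngram_counts(seq, n).items()}


if __name__ == "__main__":
    # Toy data only; these strings are not real family alignments.
    w = term_weights(["CAACH", "CGGCH"], ["AAAAAA", "GGGGGG"])
    print(encode("MKCAACHG", w))
```

In a design like the one described, a weighted vector of this kind could serve as the motif-specific input to a neural network, alongside an unweighted n-gram vector carrying global sequence information.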