DISCOVERING ACTIVE MOTIFS IN SETS OF RELATED PROTEIN SEQUENCES AND USING THEM FOR CLASSIFICATION

Citation
J.T.L. Wang et al., DISCOVERING ACTIVE MOTIFS IN SETS OF RELATED PROTEIN SEQUENCES AND USING THEM FOR CLASSIFICATION, Nucleic Acids Research, 22(14), 1994, pp. 2769-2775
Number of citations
42
Subject categories
Biology
Journal title
Nucleic Acids Research
Journal ISSN
0305-1048
Volume
22
Issue
14
Year of publication
1994
Pages
2769-2775
Database
ISI
SICI code
0305-1048(1994)22:14<2769:DAMISO>2.0.ZU;2-K
Abstract
We describe a method for discovering active motifs in a set of related protein sequences. The method is an automatic two-step process: (1) find candidate motifs in a small sample of the sequences; (2) test whether these motifs are approximately present in all the sequences. To reduce the running time, we develop two optimization heuristics based on statistical estimation and pattern-matching techniques. Experimental results obtained by running these algorithms on generated data and on functionally related proteins demonstrate the good performance of the presented method compared with the visual method of O'Farrell and Leopold. By combining the discovered motifs with an existing fingerprint technique, we develop a protein classifier. When applied to the 698 groups of related proteins in the PROSITE catalog, the classifier gives information that is complementary to the BLOCKS protein classifier of Henikoff and Henikoff. Thus, using our classifier in conjunction with theirs, one can obtain high-confidence classifications (if BLOCKS and our classifier agree) or suggest a new hypothesis (if the two disagree).
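The two-step process and the agree/disagree rule described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm. It assumes that motifs are contiguous substrings and that candidates are any substrings shared exactly by at least two sampled sequences. "Approximately present" is taken to mean an edit distance of at most `max_dist` to the best-matching substring, computed with Sellers' dynamic programming. The sketch also omits the statistical-estimation and pattern-matching heuristics the paper uses to reduce running time. All names and parameters (`discover_motifs`, `max_dist`, `combine_classifications`, etc.) are hypothetical.

```python
import random


def edit_distance_to_substring(motif: str, seq: str) -> int:
    """Minimum edit distance between `motif` and any substring of `seq`
    (Sellers' approximate string matching)."""
    m = len(motif)
    prev = list(range(m + 1))  # column before seq[0]: deleting i motif chars
    best = prev[m]
    for ch in seq:
        cur = [0] * (m + 1)  # cur[0] = 0: a match may start at any position
        for i in range(1, m + 1):
            cost = 0 if motif[i - 1] == ch else 1
            cur[i] = min(prev[i - 1] + cost,  # match / substitution
                         prev[i] + 1,         # extra residue in seq
                         cur[i - 1] + 1)      # residue missing from seq
        best = min(best, cur[m])
        prev = cur
    return best


def candidate_motifs(sample, min_len, max_len):
    """Step 1 (simplified assumption): substrings of length min_len..max_len
    occurring exactly in at least two sampled sequences."""
    counts = {}
    for seq in sample:
        seen = {seq[i:i + k]
                for k in range(min_len, max_len + 1)
                for i in range(len(seq) - k + 1)}
        for s in seen:
            counts[s] = counts.get(s, 0) + 1
    return {s for s, c in counts.items() if c >= 2}


def discover_motifs(seqs, sample_size, min_len, max_len, max_dist, seed=0):
    """Step 2: keep candidates approximately present in every sequence."""
    rng = random.Random(seed)
    sample = rng.sample(seqs, min(sample_size, len(seqs)))
    cands = candidate_motifs(sample, min_len, max_len)
    return sorted(m for m in cands
                  if all(edit_distance_to_substring(m, s) <= max_dist
                         for s in seqs))


def combine_classifications(ours, blocks):
    """Agreement gives a high-confidence call; disagreement a new hypothesis."""
    if ours == blocks:
        return ("high-confidence", ours)
    return ("new hypothesis", (ours, blocks))


if __name__ == "__main__":
    family = ["MKVLAAGHWY", "PPKVLAAGQQ", "TTKVLSAGRR", "KVLAAGEEEE"]
    print(discover_motifs(family, sample_size=4, min_len=6, max_len=6,
                          max_dist=1))
```

On this toy family, `KVLAAG` is shared exactly by three sequences and is within one substitution of `KVLSAG` in the fourth, so it survives step 2 with `max_dist=1` but not with `max_dist=0`. Sampling only part of the family keeps step 1 cheap, while the full scan in step 2 filters out candidates that are not common to the whole family.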