J.T.L. Wang et al., DISCOVERING ACTIVE MOTIFS IN SETS OF RELATED PROTEIN SEQUENCES AND USING THEM FOR CLASSIFICATION, Nucleic Acids Research, 22(14), 1994, pp. 2769-2775
We describe a method for discovering active motifs in a set of related
protein sequences. The method is an automatic two-step process: (1)
find candidate motifs in a small sample of the sequences; (2) test
whether these motifs are approximately present in all the sequences.
To reduce the running time, we develop two optimization heuristics
based on statistical estimation and pattern matching techniques.
Experimental results obtained by running these algorithms on generated
data and functionally related proteins demonstrate the good
performance of the presented method compared with the visual method of
O'Farrell and Leopold. By combining the discovered motifs with an
existing fingerprint technique, we develop a protein classifier. When
we apply the classifier to the 698 groups of related proteins in the
PROSITE catalog, it gives information that is complementary to the
BLOCKS protein classifier of Henikoff and Henikoff. Thus, using our
classifier in conjunction with theirs, one can obtain high-confidence
classifications (if BLOCKS and our classifier agree) or suggest a new
hypothesis (if the two disagree).
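
The two-step process described in the abstract can be illustrated with a
minimal brute-force sketch. This is not the authors' implementation: the
abstract does not specify the motif form, the distance measure, or the two
optimization heuristics, so the sketch assumes contiguous-substring motifs,
edit distance as the notion of "approximately present", and no heuristics.
All function and parameter names (discover_motifs, sample_size, max_dist,
min_occurrence) are hypothetical.

```python
from typing import List, Optional, Set


def substrings(seq: str, min_len: int, max_len: int) -> Set[str]:
    """All contiguous substrings of seq with length in [min_len, max_len]."""
    return {
        seq[i:i + k]
        for k in range(min_len, max_len + 1)
        for i in range(len(seq) - k + 1)
    }


def min_substring_distance(motif: str, seq: str) -> int:
    """Smallest edit distance between motif and any substring of seq.

    Standard approximate string matching DP: the first row is all zeros,
    so a match may start at any position of seq at no cost.
    """
    prev = [0] * (len(seq) + 1)
    for i in range(1, len(motif) + 1):
        curr = [i] + [0] * len(seq)
        for j in range(1, len(seq) + 1):
            cost = 0 if motif[i - 1] == seq[j - 1] else 1
            curr[j] = min(prev[j - 1] + cost,  # match or mismatch
                          prev[j] + 1,         # motif character unmatched
                          curr[j - 1] + 1)     # extra character in seq
        prev = curr
    return min(prev)  # a match may end at any position of seq


def discover_motifs(sequences: List[str], sample_size: int,
                    min_len: int, max_len: int, max_dist: int,
                    min_occurrence: Optional[int] = None) -> List[str]:
    """Assumed reading of the two steps: sample, then test on all sequences."""
    if min_occurrence is None:
        min_occurrence = len(sequences)  # "present in all the sequences"

    # Step 1: collect candidate motifs from a small sample of the sequences.
    candidates: Set[str] = set()
    for s in sequences[:sample_size]:
        candidates |= substrings(s, min_len, max_len)

    # Step 2: keep candidates that occur within max_dist edits in enough
    # sequences of the full set.
    motifs = []
    for m in sorted(candidates):
        hits = sum(1 for s in sequences
                   if min_substring_distance(m, s) <= max_dist)
        if hits >= min_occurrence:
            motifs.append(m)
    return motifs


if __name__ == "__main__":
    seqs = ["AACDEFG", "TTCDEFK", "CDQFPP"]  # toy data, not real proteins
    print(discover_motifs(seqs, sample_size=1, min_len=4, max_len=4,
                          max_dist=1))
```

Step 2 of this sketch compares every candidate against every sequence, which
is the cost that the statistical-estimation and pattern-matching heuristics
mentioned in the abstract are meant to reduce.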