DISCOVERING ACTIVE MOTIFS IN SETS OF RELATED PROTEIN SEQUENCES AND USING THEM FOR CLASSIFICATION

Authors

WANG JTL MARR TG SHASHA D SHAPIRO BA CHIRN GW

Citation

J.T.L. Wang et al., DISCOVERING ACTIVE MOTIFS IN SETS OF RELATED PROTEIN SEQUENCES AND USING THEM FOR CLASSIFICATION, Nucleic acids research, 22(14), 1994, pp. 2769-2775

Citations number

Subject categories

Biology

Journal title

Nucleic acids research

ISSN journal

0305-1048

Volume

22

Issue

14

Year of publication

1994

Pages

2769 - 2775

Database

ISI

SICI code

0305-1048(1994)22:14<2769:DAMISO>2.0.ZU;2-K

Abstract

We describe a method for discovering active motifs in a set of related protein sequences. The method is an automatic two-step process: (1) find candidate motifs in a small sample of the sequences; (2) test whether these motifs are approximately present in all the sequences. To reduce the running time, we develop two optimization heuristics based on statistical estimation and pattern matching techniques. Experimental results obtained by running these algorithms on generated data and functionally related proteins demonstrate the good performance of the presented method compared with the visual method of O'Farrell and Leopold. By combining the discovered motifs with an existing fingerprint technique, we develop a protein classifier. When we apply the classifier to the 698 groups of related proteins in the PROSITE catalog, it gives information that is complementary to the BLOCKS protein classifier of Henikoff and Henikoff. Thus, using our classifier in conjunction with theirs, one can obtain high confidence classifications (if BLOCKS and our classifier agree) or suggest a new hypothesis (if the two disagree).
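
The two-step process in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm. It assumes fixed-length candidate motifs, takes the first few sequences as the sample, and counts position-wise mismatches as a simplified stand-in for approximate matching. The optimization heuristics mentioned in the abstract are not modeled. All function and parameter names (candidate_motifs, occurs_approximately, discover_motifs, max_mismatches, min_occurrence) are hypothetical.

```python
def candidate_motifs(sample, length):
    """Step 1 (assumed form): every substring of the given length in the sample."""
    return {
        seq[i:i + length]
        for seq in sample
        for i in range(len(seq) - length + 1)
    }


def occurs_approximately(motif, seq, max_mismatches):
    """True if some window of seq differs from motif in at most max_mismatches positions.

    Position-wise mismatch counting is an assumption made for brevity here.
    """
    k = len(motif)
    for i in range(len(seq) - k + 1):
        mismatches = sum(a != b for a, b in zip(motif, seq[i:i + k]))
        if mismatches <= max_mismatches:
            return True
    return False


def discover_motifs(sequences, sample_size, length, max_mismatches, min_occurrence):
    """Step 2 (assumed form): keep candidates approximately present in enough sequences."""
    sample = sequences[:sample_size]  # deterministic sample: the first sample_size sequences
    found = {}
    for motif in candidate_motifs(sample, length):
        count = sum(occurs_approximately(motif, s, max_mismatches) for s in sequences)
        if count >= min_occurrence:
            found[motif] = count  # motif -> number of sequences containing it approximately
    return found


if __name__ == "__main__":
    # Toy sequences invented for illustration; they are not data from the paper.
    toy = ["MKVLAAGIC", "TTVLAAGQQ", "PVLSAGWWW", "AAAVLAAGK"]
    print(discover_motifs(toy, sample_size=2, length=5, max_mismatches=1, min_occurrence=4))
```

Drawing candidates only from a small sample keeps the candidate set small, and the more expensive approximate test then runs once per candidate over the full set.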