AN ARTIFICIAL-INTELLIGENCE APPROACH TO MOTIF DISCOVERY IN PROTEIN SEQUENCES - APPLICATION TO STEROID DEHYDROGENASES

Citation
T.L. Bailey et al., AN ARTIFICIAL-INTELLIGENCE APPROACH TO MOTIF DISCOVERY IN PROTEIN SEQUENCES - APPLICATION TO STEROID DEHYDROGENASES, Journal of Steroid Biochemistry and Molecular Biology, 62(1), 1997, pp. 29-44
Number of citations
44
Subject Categories
Biology, Endocrinology & Metabolism
ISSN journal
0960-0760
Volume
62
Issue
1
Year of publication
1997
Pages
29 - 44
Database
ISI
SICI code
0960-0760(1997)62:1<29:AAATMD>2.0.ZU;2-V
Abstract
MEME (Multiple Expectation-maximization for Motif Elicitation) is a unique new software tool that uses artificial intelligence techniques to discover motifs shared by a set of protein sequences in a fully automated manner. This paper is the first detailed study of the use of MEME to analyse a large, biologically relevant set of sequences, and to evaluate the sensitivity and accuracy of MEME in identifying structurally important motifs. For this purpose, we chose the short-chain alcohol dehydrogenase superfamily because it is large and phylogenetically diverse, providing a test of how well MEME can work on sequences with low amino acid similarity. Moreover, this dataset contains enzymes of biological importance, and because several enzymes have known X-ray crystallographic structures, we can test the usefulness of MEME for structural analysis. The first six motifs from MEME map onto structurally important alpha-helices and beta-strands on Streptomyces hydrogenans 20 beta-hydroxysteroid dehydrogenase. We also describe MAST (Motif Alignment Search Tool), which conveniently uses output from MEME for searching databases such as SWISS-PROT and Genpept. MAST provides statistical measures that permit a rigorous evaluation of the significance of database searches with individual motifs or groups of motifs. A database search of Genpept90 by MAST with the log-odds matrix of the first six motifs obtained from MEME yields a bimodal output, demonstrating the selectivity of MAST. We show for the first time, using primary sequence analysis, that bacterial sugar epimerases are homologs of short-chain dehydrogenases. MEME and MAST will be increasingly useful as genome sequencing provides large datasets of phylogenetically divergent sequences of biomedical interest. (C) 1997 Elsevier Science Ltd.
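Illustrative sketch
The abstract refers to searching a database with a log-odds matrix derived from a motif. The Python fragment below is a minimal sketch of that general idea only, and is not the MEME or MAST implementation: it builds a position-specific log-odds matrix from a few aligned motif instances and slides it along a sequence to find the best-scoring window. The function names (log_odds_matrix, best_match), the uniform background, the pseudocount, and the example sequences are all assumptions made up for illustration; the statistical significance measures that MAST reports are not modelled here.

```python
import math

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def log_odds_matrix(instances, pseudocount=1.0):
    """Build a position-specific log-odds matrix from equal-length motif instances.

    Assumption: a uniform background frequency (1/20) and a flat pseudocount,
    both chosen for simplicity rather than taken from the paper.
    """
    width = len(instances[0])
    background = 1.0 / len(ALPHABET)
    matrix = []
    for pos in range(width):
        # Start every residue at the pseudocount so unseen residues get a finite score.
        counts = {aa: pseudocount for aa in ALPHABET}
        for inst in instances:
            counts[inst[pos]] += 1
        total = sum(counts.values())
        # Log2 ratio of the observed column frequency to the background frequency.
        matrix.append(
            {aa: math.log2((counts[aa] / total) / background) for aa in ALPHABET}
        )
    return matrix


def best_match(matrix, sequence):
    """Slide the matrix along the sequence and return (best_score, start_index).

    Returns (-inf, -1) if the sequence is shorter than the motif.
    """
    width = len(matrix)
    best = (float("-inf"), -1)
    for start in range(len(sequence) - width + 1):
        score = sum(matrix[i][sequence[start + i]] for i in range(width))
        best = max(best, (score, start))
    return best


# Hypothetical motif instances and target sequence, invented for this example.
motif = log_odds_matrix(["GASGIG", "GGSGIG", "GASSIG"])
score, start = best_match(motif, "MKLTGASGIGKAT")
print(f"best window starts at index {start} with score {score:.2f}")
```

Scoring every sequence in a database this way and examining the distribution of best scores is one simple way to see whether true family members separate from unrelated sequences, which is the kind of separation the abstract describes as a bimodal output.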