Representing and reasoning about protein families using generative and discriminative methods

Citation
I.S. Mian and I. Dubchak, Representing and reasoning about protein families using generative and discriminative methods, J COMPUT BI, 7(6), 2000, pp. 849-862
Number of citations
37
Subject categories
Biochemistry & Biophysics
Journal title
JOURNAL OF COMPUTATIONAL BIOLOGY
Journal ISSN
1066-5277
Volume
7
Issue
6
Year of publication
2000
Pages
849 - 862
Database
ISI
SICI code
1066-5277(2000)7:6<849:RARAPF>2.0.ZU;2-S
Abstract
This work addresses the issues of data representation and incorporation of domain knowledge into the design of learning systems for reasoning about protein families. Given the limited expressive capacity of a particular method, a mixture of protein annotation and fold recognition experts, each implementing a different underlying representation, should provide a robust method for assigning sequences to families. These ideas are illustrated using two data-driven learning methods that make use of different prior information and employ independent, yet complementary, projections of a family: hidden Markov models (HMMs) based on a multiple sequence alignment and neural networks (NNs) based on global sequence descriptors of proteins. Examination of seven protein families indicates that combining a generative (HMM) and a discriminative (NN) method is better than either method on its own. Biologically, human 4-hydroxyphenylpyruvic acid dioxygenase, involved in tyrosinemia type 3, is predicted to be structurally and functionally related to the glyoxalase I family.
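The abstract does not say how the outputs of the two experts are combined. The following is a minimal sketch under stated assumptions, not the authors' method. It assumes the HMM reports a natural-log log-odds score (family model versus null model), which is converted to a family-membership probability under equal priors. It also assumes the NN already outputs a probability, and that the two probabilities are merged by a fixed-weight average. The function names, the weight `w_hmm` and the decision `threshold` are all hypothetical.

```python
import math


def logistic(x: float) -> float:
    """Map a real-valued score to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def hmm_probability(log_odds: float) -> float:
    """Hypothetical: turn an HMM log-odds score ln[P(x|family)/P(x|null)]
    into P(family | x), assuming equal prior probabilities for both models."""
    return logistic(log_odds)


def combine_experts(p_hmm: float, p_nn: float, w_hmm: float = 0.5) -> float:
    """Fixed-weight average of the generative (HMM) and discriminative (NN)
    family-membership probabilities. The weight is an illustrative assumption."""
    if not 0.0 <= w_hmm <= 1.0:
        raise ValueError("w_hmm must lie in [0, 1]")
    return w_hmm * p_hmm + (1.0 - w_hmm) * p_nn


def assign_to_family(log_odds: float, p_nn: float, threshold: float = 0.5) -> bool:
    """Hypothetical decision rule: accept the sequence into the family when
    the combined probability reaches the threshold."""
    return combine_experts(hmm_probability(log_odds), p_nn) >= threshold


if __name__ == "__main__":
    # A strong HMM hit can outweigh a lukewarm NN output, and vice versa.
    print(assign_to_family(2.0, 0.4))
    print(assign_to_family(-3.0, 0.3))
```

A weighted average is used here only because it is the simplest way to let two experts with different underlying representations vote on the same decision. Learned weights, a product of experts, or a trained combiner would be equally plausible readings of "a mixture of experts".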