Representing and reasoning about protein families using generative and discriminative methods

Authors

Mian, IS; Dubchak, I

Citation

I.S. Mian and I. Dubchak, Representing and reasoning about protein families using generative and discriminative methods, J COMPUT BI, 7(6), 2000, pp. 849-862

Citations number

Subject Categories

Biochemistry & Biophysics

Journal title

JOURNAL OF COMPUTATIONAL BIOLOGY

ISSN journal

1066-5277

Volume

7

Issue

6

Year of publication

2000

Pages

849 - 862

Database

ISI

SICI code

1066-5277(2000)7:6<849:RARAPF>2.0.ZU;2-S

Abstract

This work addresses the issues of data representation and incorporation of domain knowledge into the design of learning systems for reasoning about protein families. Given the limited expressive capacity of a particular method, a mixture of protein annotation and fold recognition experts, each implementing a different underlying representation, should provide a robust method for assigning sequences to families. These ideas are illustrated using two data-driven learning methods that make use of different prior information and employ independent, yet complementary, projections of a family: hidden Markov models (HMMs) based on a multiple sequence alignment and neural networks (NNs) based on global sequence descriptors of proteins. Examination of seven protein families indicates that combining a generative (HMM) and a discriminative (NN) method is better than either method on its own. Biologically, human 4-hydroxyphenylpyruvic acid dioxygenase, involved in tyrosinemia type 3, is predicted to be structurally and functionally related to the glyoxalase I family.