As protein-protein interactions are intrinsic to most cellular processes,
the ability to predict which proteins in the cell interact can aid
significantly in identifying the function of newly discovered proteins and
in understanding the molecular networks they participate in. Here we
demonstrate that characteristic pairs of sequence-signatures can be learned
from a database of experimentally determined interacting proteins, where one
protein contains one sequence-signature and its interacting partner contains
the other. Sequence-signatures that recur in concert in various pairs of
interacting proteins are termed correlated sequence-signatures, and we
propose that they can be used to predict putative pairs of interacting
partners in the cell. We demonstrate the potential of this approach on a
comprehensive database of experimentally determined pairs of interacting
proteins in the yeast Saccharomyces cerevisiae. The proteins in this
database were characterized by their sequence-signatures, as defined by the
InterPro classification. A statistical analysis of all possible combinations
of sequence-signature pairs identified those pairs that are over-represented
in the database of yeast interacting proteins. We show how using the
correlated sequence-signatures as identifiers of interacting proteins can
significantly reduce the search space and enable directed experimental
interaction screens. (C) 2001 Academic Press.
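
The over-representation analysis described above can be illustrated with a
minimal sketch. The abstract does not specify the statistic used, so the
log-odds score below (observed pair count versus the count expected if
signatures paired independently) is an assumption, as are the function name
correlated_signatures, the min_count threshold, and the toy protein and
signature identifiers, all of which are hypothetical.

```python
from collections import Counter
from itertools import product
from math import log2


def correlated_signatures(interactions, signatures, min_count=2):
    """Rank signature pairs by observed vs. expected co-occurrence.

    interactions: iterable of (protein_a, protein_b) tuples
    signatures:   dict mapping protein id -> set of signature ids
    Returns rows of (pair, observed, expected, log2_odds), best first.
    NOTE: the log-odds score is an illustrative assumption, not the
    statistic used in the original study.
    """
    pair_counts = Counter()  # interactions containing each unordered signature pair
    sig_counts = Counter()   # interaction partners carrying each signature
    n = 0                    # interactions with both partners annotated
    for prot_a, prot_b in interactions:
        sigs_a = signatures.get(prot_a, set())
        sigs_b = signatures.get(prot_b, set())
        if not sigs_a or not sigs_b:
            continue  # skip interactions with an unannotated partner
        n += 1
        sig_counts.update(sigs_a)
        sig_counts.update(sigs_b)
        # Count each unordered signature pair at most once per interaction.
        pair_counts.update({tuple(sorted(p)) for p in product(sigs_a, sigs_b)})

    scored = []
    for (x, y), observed in pair_counts.items():
        if observed < min_count:
            continue
        # Frequency of each signature among the 2n interaction partners.
        fx = sig_counts[x] / (2 * n)
        fy = sig_counts[y] / (2 * n)
        # Distinct signatures can be assigned to the two partners in two ways.
        expected = n * fx * fy * (1 if x == y else 2)
        scored.append(((x, y), observed, expected, log2(observed / expected)))
    return sorted(scored, key=lambda row: row[3], reverse=True)


# Toy data; all identifiers are hypothetical.
signatures = {
    "P1": {"sigA"}, "P2": {"sigB"}, "P3": {"sigA"},
    "P4": {"sigB"}, "P5": {"sigC"}, "P6": {"sigD"},
}
interactions = [("P1", "P2"), ("P3", "P4"), ("P1", "P4"),
                ("P5", "P6"), ("P5", "P2")]

results = correlated_signatures(interactions, signatures)
for pair, observed, expected, score in results:
    print(pair, observed, round(expected, 2), round(score, 2))
```

Pairs with a positive score occur more often among interacting proteins than
independence would predict; under this sketch, such pairs would be the
candidates for narrowing an experimental interaction screen. A real analysis
would also need a significance test and correction for multiple comparisons.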