Practical limits of function prediction

Citation
D. Devos and A. Valencia, Practical limits of function prediction, PROTEINS, 41(1), 2000, pp. 98-107
Number of citations
47
Subject categories
Biochemistry & Biophysics
Journal title
PROTEINS-STRUCTURE FUNCTION AND GENETICS
Journal ISSN
0887-3585
Volume
41
Issue
1
Year of publication
2000
Pages
98 - 107
Database
ISI
SICI code
0887-3585(20001001)41:1<98:PLOFP>2.0.ZU;2-Q
Abstract
The widening gap between known protein sequences and their functions has led to the practice of assigning a potential function to a protein on the basis of sequence similarity to proteins whose function has been experimentally investigated. We present here a critical view of the theoretical and practical bases for this approach. The results obtained by analyzing a significant number of true sequence similarities, derived directly from structural alignments, point to the complexity of function prediction. Different aspects of protein function, including (i) enzymatic function classification, (ii) functional annotations in the form of key words, (iii) classes of cellular function, and (iv) conservation of binding sites can only be reliably transferred between similar sequences to a modest degree. The reason for this difficulty is a combination of the unavoidable database inaccuracies and the plasticity of protein function. In addition, analysis of the relationship between sequence and functional descriptions defines an empirical limit for pairwise-based functional annotations, namely, the three first digits of the six numbers used as descriptors of protein folds in the FSSP database can be predicted at an average level as low as 7.5% sequence identity, two of the four EC digits at 15% identity, half of the SWISS-PROT key words related to protein function would require 20% identity, and the prediction of half of the residues in the binding site can be made at the 30% sequence identity level. Proteins 2000;41:98-107. © 2000 Wiley-Liss, Inc.