The widening gap between known protein sequences and their functions has led to the practice of assigning a potential function to a protein on the basis of sequence similarity to proteins whose function has been experimentally investigated. We present here a critical view of the theoretical and practical bases for this approach. The results obtained by analyzing a significant number of true sequence similarities, derived directly from structural alignments, point to the complexity of function prediction. Different aspects of protein function, including (i) enzymatic function classification, (ii) functional annotations in the form of key words, (iii) classes of cellular function, and (iv) conservation of binding sites, can be reliably transferred between similar sequences only to a modest degree. The reason for this difficulty is a combination of unavoidable database inaccuracies and the plasticity of protein function. In addition, analysis of the relationship between sequence and functional descriptions defines an empirical limit for pairwise-based functional annotations: on average, the first three of the six numbers used as descriptors of protein folds in the FSSP database can be predicted at sequence identities as low as 7.5%, two of the four EC digits at 15% identity, half of the SWISS-PROT key words related to protein function would require 20% identity, and half of the residues in the binding site can be predicted at 30% sequence identity.
Proteins 2000;41:98-107. (C) 2000 Wiley-Liss, Inc.
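The abstract reports average identity levels but does not say how identity is computed or how the limits would be applied to a single pair. The sketch below is a minimal illustration under two assumptions: identity is measured over aligned columns where neither sequence has a gap (one common definition among several), and the reported averages are treated as simple cutoffs. The names `THRESHOLDS`, `percent_identity`, and `transferable_levels` are hypothetical, and because the thresholds are averages, passing one is no guarantee that a given annotation transfers correctly.

```python
# Average sequence-identity levels (percent) reported in the abstract.
# These are averages over many pairs, not per-pair guarantees.
THRESHOLDS = (
    (7.5, "first three of the six FSSP fold descriptor numbers"),
    (15.0, "first two of the four EC digits"),
    (20.0, "half of the function-related SWISS-PROT key words"),
    (30.0, "half of the binding-site residues"),
)


def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Assumed definition for illustration; other definitions divide by the
    length of the shorter sequence or of the whole alignment instead.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences have a residue.
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    identical = sum(1 for x, y in pairs if x.upper() == y.upper())
    return 100.0 * identical / len(pairs)


def transferable_levels(identity: float) -> list[str]:
    """Annotation levels whose reported average threshold is at or below `identity`."""
    return [label for cutoff, label in THRESHOLDS if identity >= cutoff]


if __name__ == "__main__":
    pid = percent_identity("ACDE-FG", "ACDQHFG")  # 5 identical of 6 gap-free columns
    print(f"{pid:.1f}% identity")
    for level in transferable_levels(18.0):
        print("transferable on average:", level)
```

At 18% identity, for example, the lookup returns only the fold-descriptor and EC-digit levels, mirroring the ordering of the limits stated above.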