Proteins might have considerable structural similarities even when no
evolutionary relationship of their sequences can be detected. This property
is often referred to as the proteins sharing only a "fold". Of course, there
are also sequences of common origin in each fold, called a "superfamily",
and within them groups of sequences with clear similarities, designated a
"family". Developing algorithms to reliably identify proteins related at any
level is one of the most important challenges in the fast-growing field of
bioinformatics today. However, it is not at all certain that a method
proficient at finding sequence similarities performs well at the other
levels, or vice versa.
Here, we have compared the performance of various search methods on these
different levels of similarity. As expected, we show that it becomes much
harder to detect related proteins as their sequences diverge. For
family-related sequences the best method gets 75 % of the top hits correct.
When the sequences differ but the proteins belong to the same superfamily
this drops to 29 %, and in the case of proteins with only fold similarity it
is as low as 15 %. We have made a more complete analysis of the performance
of different algorithms than earlier studies, also including threading
methods in the comparison. With this approach a more detailed picture
emerges, showing that multiple sequence information improves detection at
the two closer levels of relationship. We have also compared the different
methods of including this information in prediction algorithms.
For lower specificities, the best scheme to use is a linking method
connecting proteins through an intermediate hit. For higher specificities,
better performance is obtained by PSI-BLAST and some procedures using hidden
Markov models. We also show that a threading method, THREADER, performs
significantly better than any other method at fold recognition.
© 2000 Academic Press.