Identification of related proteins on family, superfamily and fold level

Citation
E. Lindahl and A. Elofsson, Identification of related proteins on family, superfamily and fold level, J MOL BIOL, 295(3), 2000, pp. 613-625
Number of citations
33
Subject categories
Molecular Biology & Genetics
Journal title
JOURNAL OF MOLECULAR BIOLOGY
ISSN journal
00222836
Volume
295
Issue
3
Year of publication
2000
Pages
613 - 625
Database
ISI
SICI code
0022-2836(20000121)295:3<613:IORPOF>2.0.ZU;2-L
Abstract
Proteins might have considerable structural similarities even when no evolutionary relationship of their sequences can be detected. This property is often referred to as the proteins sharing only a "fold". Of course, there are also sequences of common origin in each fold, called a "superfamily", and within them groups of sequences with clear similarities, designated a "family". Developing algorithms to reliably identify proteins related at any level is one of the most important challenges in the fast-growing field of bioinformatics today. However, it is not at all certain that a method proficient at finding sequence similarities performs well at the other levels, or vice versa. Here, we have compared the performance of various search methods on these different levels of similarity. As expected, we show that it becomes much harder to detect proteins as their sequences diverge. For family-related sequences the best method gets 75% of the top hits correct. When the sequences differ but the proteins belong to the same superfamily, this drops to 29%, and in the case of proteins with only fold similarity it is as low as 15%. We have made a more complete analysis of the performance of different algorithms than earlier studies, also including threading methods in the comparison. Using this approach, a more detailed picture emerges, showing that multiple sequence information improves detection on the two closer levels of relationship. We have also compared the different methods of including this information in prediction algorithms. For lower specificities, the best scheme to use is a linking method connecting proteins through an intermediate hit. For higher specificities, better performance is obtained by PSI-BLAST and some procedures using hidden Markov models. We also show that a threading method, THREADER, performs significantly better than any other method at fold recognition. (C) 2000 Academic Press.
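The abstract names two ideas without spelling them out: a linking scheme that connects a query to a target through an intermediate hit, and a "top hit correct" score per similarity level. The sketch below is a hypothetical illustration of both, not the authors' procedure. It assumes precomputed pairwise E-values stored as `hits[query][target]` (lower is better), scores a one-step path query -> intermediate -> target by its weaker link, `max(E(q, i), E(i, t))`, and keeps the best of the direct and linked scores. The top-hit metric is read here as the fraction of queries whose best-ranked target carries the same family, superfamily or fold label. The names `linked_evalue`, `rank_targets`, `top_hit_accuracy` and all toy data are invented for illustration.

```python
from math import inf

# Hypothetical pairwise search results: hits[query][target] = E-value.
# Lower is better; a missing entry means "no hit reported".
Hits = dict[str, dict[str, float]]


def linked_evalue(hits: Hits, query: str, target: str) -> float:
    """Best E-value for query -> target, allowing one intermediate protein.

    Assumption (not from the source): a path q -> i -> t is scored by its
    weaker link, max(E(q, i), E(i, t)); the result is the lowest of the
    direct E-value and all one-step linked E-values.
    """
    best = hits.get(query, {}).get(target, inf)
    for intermediate, e_qi in hits.get(query, {}).items():
        if intermediate in (query, target):
            continue  # skip trivial paths through the endpoints
        e_it = hits.get(intermediate, {}).get(target, inf)
        best = min(best, max(e_qi, e_it))
    return best


def rank_targets(hits: Hits, query: str, database: list[str]) -> list[tuple[str, float]]:
    """Rank every database entry except the query by linked E-value."""
    scored = [(t, linked_evalue(hits, query, t)) for t in database if t != query]
    return sorted(scored, key=lambda pair: pair[1])  # stable: ties keep database order


def top_hit_accuracy(hits: Hits, queries: list[str], database: list[str],
                     label: dict[str, str]) -> float:
    """Fraction of queries whose top-ranked hit shares the query's label
    (e.g. the same family, superfamily or fold)."""
    correct = 0
    for q in queries:
        ranking = rank_targets(hits, q, database)
        if ranking and ranking[0][1] < inf and label[ranking[0][0]] == label[q]:
            correct += 1
    return correct / len(queries)


# Toy example (made-up numbers): A and C have no direct hit, but both hit B.
hits = {
    "A": {"B": 1e-3},
    "B": {"A": 1e-3, "C": 1e-2},
    "C": {"B": 1e-2},
}
database = ["A", "B", "C"]
label = {"A": "fam1", "B": "fam1", "C": "fam2"}

print(linked_evalue(hits, "A", "C"))  # 0.01, found only via intermediate B
print(top_hit_accuracy(hits, database, database, label))
```

Scoring a path by its weaker link is one conservative choice; summing or multiplying the two E-values would be equally plausible alternatives, and the abstract does not say which combination the study used.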