Identification of related proteins on family, superfamily and fold level

Authors

Lindahl, E.; Elofsson, A.

Citation

E. Lindahl and A. Elofsson, Identification of related proteins on family, superfamily and fold level, J MOL BIOL, 295(3), 2000, pp. 613-625

Subject Categories

Molecular Biology & Genetics

Journal title

JOURNAL OF MOLECULAR BIOLOGY

ISSN journal

0022-2836

Volume

295

Issue

3

Year of publication

2000

Pages

613 - 625

Database

ISI

SICI code

0022-2836(20000121)295:3<613:IORPOF>2.0.ZU;2-L

Abstract

Proteins might have considerable structural similarities even when no evolutionary relationship of their sequences can be detected. This property is often referred to as the proteins sharing only a "fold". Of course, there are also sequences of common origin in each fold, called a "superfamily", and within these, groups of sequences with clear similarities, designated "families". Developing algorithms to reliably identify proteins related at any level is one of the most important challenges in the fast-growing field of bioinformatics today. However, it is not at all certain that a method proficient at finding sequence similarities performs well at the other levels, or vice versa. Here, we have compared the performance of various search methods on these different levels of similarity. As expected, we show that it becomes much harder to detect proteins as their sequences diverge. For family-related sequences the best method gets 75% of the top hits correct. When the sequences differ but the proteins belong to the same superfamily this drops to 29%, and in the case of proteins with only fold similarity it is as low as 15%. We have made a more complete analysis of the performance of different algorithms than earlier studies, also including threading methods in the comparison. Using this approach, a more detailed picture emerges, showing that multiple sequence information improves detection at the two closer levels of relationship. We have also compared the different methods of including this information in prediction algorithms. For lower specificities, the best scheme to use is a linking method connecting proteins through an intermediate hit. For higher specificities, better performance is obtained by PSI-BLAST and some procedures using hidden Markov models. We also show that a threading method, THREADER, performs significantly better than any other method at fold recognition. (C) 2000 Academic Press.
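The abstract reports accuracy as the share of "top hits" that are correct and mentions a linking method that connects proteins through an intermediate hit. The following is a minimal, hypothetical sketch of both ideas, not the authors' implementation. It assumes search results are given as E-values per query, that a "correct" top hit means sharing the query's classification label (e.g. the same family), and that a linked path query -> intermediate -> target is scored by the worse (larger) of its two E-values. The names `top_hit_accuracy`, `linked_evalues`, `hits` and `label`, and the toy data, are illustrative assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical data model: query -> list of (target, E-value) hits.
Hits = Dict[str, List[Tuple[str, float]]]


def top_hit_accuracy(hits: Hits, label: Dict[str, str]) -> float:
    """Fraction of queries whose best hit (lowest E-value, self excluded)
    carries the same classification label as the query."""
    correct = 0
    total = 0
    for query, results in hits.items():
        others = [(t, e) for t, e in results if t != query]
        if not others:
            continue  # queries without any non-self hit are not counted
        total += 1
        best_target, _ = min(others, key=lambda h: h[1])
        if label[best_target] == label[query]:
            correct += 1
    return correct / total if total else 0.0


def linked_evalues(hits: Hits, query: str) -> Dict[str, float]:
    """Extend the query's direct hits with targets reached via one intermediate.

    Assumption: a path query -> mid -> target is scored by max(E1, E2),
    i.e. its weakest link; for each target the best (smallest) score is kept.
    """
    direct = {t: e for t, e in hits.get(query, []) if t != query}
    best = dict(direct)
    for mid, e1 in direct.items():
        for target, e2 in hits.get(mid, []):
            if target == query:
                continue
            score = max(e1, e2)
            if score < best.get(target, float("inf")):
                best[target] = score
    return best


# Toy example (invented data): "a", "b" and "d" share family f1, "c" is in f2.
hits: Hits = {
    "a": [("b", 1e-10), ("c", 5.0)],
    "b": [("a", 1e-10), ("d", 1e-6)],
    "c": [("a", 5.0)],
    "d": [("b", 1e-6)],
}
label = {"a": "f1", "b": "f1", "c": "f2", "d": "f1"}

print(top_hit_accuracy(hits, label))  # 0.75
print(linked_evalues(hits, "a"))      # {'b': 1e-10, 'c': 5.0, 'd': 1e-06}
```

In the toy data, "a" has no direct hit to "d", but reaches it through "b"; the linked score of 1e-06 is what lets such a scheme trade some specificity for sensitivity, in line with the abstract's observation that linking helps most at lower specificities.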