Automated discovery of structural signatures of protein fold and function

Citation
M. Turcotte et al., Automated discovery of structural signatures of protein fold and function, J MOL BIOL, 306(3), 2001, pp. 591-605
Number of citations
56
Subject categories
Molecular Biology & Genetics
Journal title
JOURNAL OF MOLECULAR BIOLOGY
ISSN journal
00222836
Volume
306
Issue
3
Year of publication
2001
Pages
591 - 605
Database
ISI
SICI code
0022-2836(20010223)306:3<591:ADOSSO>2.0.ZU;2-P
Abstract
There are constraints on a protein sequence/structure for it to adopt a particular fold. These constraints could be either a local signature involving particular sequences or arrangements of secondary structure or a global signature involving features along the entire chain. To search systematically for protein fold signatures, we have explored the use of Inductive Logic Programming (ILP). ILP is a machine learning technique which derives rules from observation and encoded principles. The derived rules are readily interpreted in terms of concepts used by experts. For 20 populated folds in SCOP, 59 rules were found automatically. The accuracy of these rules, which is defined as the number of true positives plus true negatives over the total number of examples, is 74% (cross-validated value). Further analysis was carried out for 23 signatures covering 30% or more positive examples of a particular fold. The work showed that signatures of protein folds exist; about half of the rules discovered automatically coincide with the level of fold in the SCOP classification. Other signatures correspond to homologous family and may be the consequence of a functional requirement. Examination of the rules shows that many correspond to established principles published in specific literature. However, in general, the list of signatures is not part of standard biological databases of protein patterns. We find that the length of the loops makes an important contribution to the signatures, suggesting that this is an important determinant of the identity of protein folds. With the expansion in the number of determined protein structures, stimulated by structural genomics initiatives, there will be an increased need for automated methods to extract principles of protein folding from coordinates. (C) 2001 Academic Press.
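
The two measures named in the abstract can be written out as a minimal sketch. Accuracy follows the definition given there (true positives plus true negatives over all examples); coverage is assumed here to mean the fraction of a fold's positive examples that a rule matches. The function names and the counts are hypothetical, chosen for illustration only, and are not taken from the paper.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of examples classified correctly: (TP + TN) / total."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("no examples")
    return (tp + tn) / total


def coverage(tp, fn):
    """Fraction of a fold's positive examples matched by a rule (assumed definition)."""
    positives = tp + fn
    if positives == 0:
        raise ValueError("no positive examples")
    return tp / positives


# Hypothetical counts for one rule; not data from the paper.
print(accuracy(tp=8, tn=7, fp=3, fn=2))  # 0.75
print(coverage(tp=8, fn=2))              # 0.8
```

Under this reading, a rule with a coverage of 0.3 or higher would fall within the set of signatures selected for further analysis.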