From fold predictions to function predictions: Automation of functional site conservation analysis for functional genome predictions

Citation
Bh. Zhang et al., From fold predictions to function predictions: Automation of functional site conservation analysis for functional genome predictions, PROTEIN SCI, 8(5), 1999, pp. 1104-1115
Number of citations
31
Subject categories
Biochemistry & Biophysics
Journal title
PROTEIN SCIENCE
ISSN journal
09618368
Volume
8
Issue
5
Year of publication
1999
Pages
1104 - 1115
Database
ISI
SICI code
0961-8368(199905)8:5<1104:FFPTFP>2.0.ZU;2-S
Abstract
A database of functional sites for proteins with known structures, SITE, is constructed and used in conjunction with a simple pattern-matching program, SiteMatch, to evaluate possible function conservation in a recently constructed database of fold predictions for Escherichia coli proteins (Rychlewski L et al., 1999, Protein Sci 8:614-624). In this and other prediction databases, fold predictions are based on algorithms that can recognize weak sequence similarities and putatively assign new proteins to already characterized protein families. It is not clear whether such sequence similarities arise from distant homology or from a general similarity of physicochemical features along the sequence. Leaving aside the important question of the nature of relations within fold superfamilies, it is possible to assess function conservation by examining the pattern of conservation of crucial functional residues. SITE consists of a multilevel function description based on structure annotations and structure analyses. In particular, active site residues, ligand binding residues, and patterns of hydrophobic residues on the protein surface are used to describe different functional features. SiteMatch, a simple pattern-matching program, is designed to check the conservation of residues involved in protein activity in alignments generated by any alignment method. Here, this procedure is used to study the conservation of functional features in alignments between protein sequences from the E. coli genome and their optimal structural templates. The optimal templates and the corresponding alignments were taken from the database of genomic structural predictions described in a previous publication (Rychlewski L et al., 1999, Protein Sci 8:614-624). An automated assessment of function conservation is used to analyze the relation between fold and function similarity for a large number of fold predictions.
For instance, it is shown that identifying low-significance predictions with a high level of functional residue conservation can be used to extend the sensitivity of fold prediction methods. Over 100 new fold/function predictions in this class were obtained for the E. coli genome. At the same time, about 30% of our previous fold predictions are not confirmed as function predictions, further highlighting the problem of function divergence in fold superfamilies.
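The residue-level check that SiteMatch is described as performing (testing whether functional residues of a template are conserved in an alignment produced by any method) can be illustrated with a minimal Python sketch. This is not the authors' SiteMatch code. The function names `map_template_to_query` and `site_conservation`, the 0-based ungapped position convention, and the encoding of a functional site as a dictionary from template positions to allowed residues are hypothetical choices made for illustration. The sketch covers only single-residue sites such as active site or ligand binding residues, not the surface hydrophobic patterns that SITE also describes.

```python
from typing import Dict, Optional


def map_template_to_query(template_aln: str, query_aln: str) -> Dict[int, Optional[int]]:
    """Map 0-based ungapped template positions to 0-based ungapped query positions.

    A template position aligned to a gap in the query maps to None.
    Gaps are assumed to be written as '-'.
    """
    mapping: Dict[int, Optional[int]] = {}
    t_idx = 0
    q_idx = 0
    for t_char, q_char in zip(template_aln, query_aln):
        if t_char != "-":
            mapping[t_idx] = q_idx if q_char != "-" else None
            t_idx += 1
        if q_char != "-":
            q_idx += 1
    return mapping


def site_conservation(template_aln: str, query_aln: str, site: Dict[int, str]) -> float:
    """Fraction of functional site positions conserved in the query (hypothetical helper).

    `site` maps a template position to a string of residues accepted at that
    position, e.g. {2: "H", 3: "DE"}. A position counts as conserved only if it
    is aligned to a query residue contained in the allowed set.
    """
    if len(template_aln) != len(query_aln):
        raise ValueError("aligned sequences must have equal length")
    if not site:
        return 0.0
    t_to_q = map_template_to_query(template_aln, query_aln)
    query_seq = query_aln.replace("-", "")
    conserved = 0
    for t_pos, allowed in site.items():
        q_pos = t_to_q.get(t_pos)
        if q_pos is not None and query_seq[q_pos] in allowed:
            conserved += 1
    return conserved / len(site)


if __name__ == "__main__":
    # Toy alignment: template positions 2, 3, 4 (H, D, S) form a made-up site.
    score = site_conservation("MKH-DSG", "MRHADTG", {2: "H", 3: "DE", 4: "S"})
    print(f"conserved fraction: {score:.2f}")
```

In the toy alignment, H and D are conserved while S is replaced by T, so two of the three site positions match. How such a fraction would be thresholded to rescue low-significance fold predictions, as the abstract describes, is not specified here; any cutoff would be an additional assumption.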