Searching the protein structure databank with weak sequence patterns and structural constraints

Citation
I. Jonassen et al., Searching the protein structure databank with weak sequence patterns and structural constraints, J MOL BIOL, 304(4), 2000, pp. 599-619
Number of citations
28
Subject categories
Molecular Biology & Genetics
Journal title
JOURNAL OF MOLECULAR BIOLOGY
ISSN journal
0022-2836
Volume
304
Issue
4
Year of publication
2000
Pages
599 - 619
Database
ISI
SICI code
0022-2836(200012)304:4<599:STPSDW>2.0.ZU;2-N
Abstract
A method is described in which proteins that match PROSITE patterns are filtered by the root-mean-square deviation of the local 3D structures of the probe and target over the pattern components. This was found to increase the discrimination between true and false members of the protein family but was dependent on how unique the structural features in the pattern were compared to equivalent fragments extracted from the structure databank (for example, if the pattern fell in an α-helix, then discrimination was poor). We then generalised the sequence patterns (by widening the range of amino acid residues allowed at each position) and monitored how well the structural information helped retain specificity. While the discrimination of the pure sequence pattern had generally disappeared at information content values less than ten bits, the discrimination of the combined sequence structure probe remained high at this point before following a similar decay. The displacement between these curves indicates that the structural component is, on average, equivalent to about ten bits. The sequence patterns were also filtered using the structure comparison program SAP, giving a global, rather than local "view" of the proteins. This allowed the information content of the sequence patterns to become even less specific but raised problems of whether some proteins encountered with the same fold but no PROSITE pattern should constitute family members. (C) 2000 Academic Press.
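The abstract measures pattern specificity in bits and describes widening the residue set allowed at each position. The sketch below illustrates one simple way such a measure could behave. It is not the paper's definition, which the abstract does not give: it assumes a uniform background over the 20 standard amino acids, so a position allowing k residues contributes log2(20/k) bits. The function names and the two example patterns are hypothetical and are not real PROSITE entries.

```python
import math

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def position_bits(allowed: str) -> float:
    """Bits contributed by one pattern position.

    Assumption (not from the source): a uniform background, so a
    position that allows k of the 20 residues carries log2(20 / k) bits.
    """
    residues = set(allowed)
    if not residues or not residues <= set(AMINO_ACIDS):
        raise ValueError(f"invalid residue set: {allowed!r}")
    return math.log2(len(AMINO_ACIDS) / len(residues))


def pattern_bits(positions: list[str]) -> float:
    """Total information content: the sum over all pattern positions."""
    return sum(position_bits(p) for p in positions)


# Hypothetical pattern, written as one allowed-residue string per position.
# A wildcard position is represented by all 20 residues (0 bits).
strict = ["C", "AG", "H", AMINO_ACIDS]

# The same pattern after widening the allowed residues at each position.
widened = ["CS", "AGST", "HKR", AMINO_ACIDS]

if __name__ == "__main__":
    print(f"strict:  {pattern_bits(strict):.2f} bits")
    print(f"widened: {pattern_bits(widened):.2f} bits")
```

Under this assumption, widening a pattern can only lower its bit count, which corresponds to the loss of sequence specificity that the structural filter is reported to offset.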