I. Jonassen et al., Searching the protein structure databank with weak sequence patterns and structural constraints, J MOL BIOL, 304(4), 2000, pp. 599-619
A method is described in which proteins that match PROSITE patterns are filtered by the root-mean-square deviation (RMSD) of the local 3D structures of the probe and target over the pattern components. This was found to increase the discrimination between true and false members of the protein family, but the improvement depended on how unique the structural features in the pattern were compared to equivalent fragments extracted from the structure databank (for example, if the pattern fell in an α-helix, discrimination was poor). We then generalised the sequence patterns (by widening the range of amino acid residues allowed at each position) and monitored how well the structural information helped retain specificity. While the discrimination of the pure sequence pattern had generally disappeared at information content values of less than ten bits, the discrimination of the combined sequence-structure probe remained high at this point before following a similar decay. The displacement between these curves indicates that the structural component is, on average, equivalent to about ten bits. The sequence patterns were also filtered using the structure comparison program SAP, giving a global, rather than local, "view" of the proteins. This allowed the sequence patterns to be made even less specific (lower information content) but raised the problem of whether some proteins encountered with the same fold but no PROSITE pattern should constitute family members. © 2000 Academic Press.
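The combined sequence-structure filter described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The pattern representation (a fixed-length list of allowed-residue strings with no variable-length gaps), the uniform-background information measure, the use of C-alpha coordinates, and the treatment of non-wildcard positions as the "pattern components" are all assumptions. The function names `information_content`, `find_matches`, `kabsch_rmsd` and `filter_by_structure` are hypothetical. Local RMSD is computed after optimal superposition with the Kabsch algorithm.

```python
import math

import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def information_content(pattern):
    """Information (bits) of a fixed-length pattern given as allowed-residue strings.

    Simplifying assumption: uniform background over the 20 amino acids, so a
    position allowing k residues contributes log2(20 / k) bits and a wildcard
    (all 20 residues) contributes 0.
    """
    return sum(math.log2(len(AMINO_ACIDS) / len(set(allowed))) for allowed in pattern)


def find_matches(sequence, pattern):
    """Start indices where every pattern position accepts the residue found there."""
    n = len(pattern)
    return [i for i in range(len(sequence) - n + 1)
            if all(sequence[i + j] in pattern[j] for j in range(n))]


def kabsch_rmsd(p, q):
    """RMSD between two (N, 3) coordinate sets after optimal rigid superposition."""
    p = p - p.mean(axis=0)  # remove translation
    q = q - q.mean(axis=0)
    u, _, vt = np.linalg.svd(p.T @ q)  # SVD of the 3x3 covariance matrix
    d = np.sign(np.linalg.det(vt.T @ u.T))  # -1 would indicate a reflection
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T  # proper rotation mapping p onto q
    diff = p @ rot.T - q
    return math.sqrt(float((diff ** 2).sum()) / len(p))


def filter_by_structure(probe_ca, probe_start, target_ca, target_starts, pattern, max_rmsd):
    """Keep target matches whose local C-alpha geometry resembles the probe's.

    RMSD is computed over the non-wildcard pattern positions, which are assumed
    here to stand in for the "pattern components".
    """
    cols = [j for j, allowed in enumerate(pattern) if len(set(allowed)) < len(AMINO_ACIDS)]
    ref = probe_ca[[probe_start + j for j in cols]]
    kept = []
    for start in target_starts:
        rmsd = kabsch_rmsd(ref, target_ca[[start + j for j in cols]])
        if rmsd <= max_rmsd:
            kept.append((start, rmsd))
    return kept


if __name__ == "__main__":
    # Hypothetical pattern C-x(2)-[DE]-H written as allowed-residue strings.
    pattern = ["C", AMINO_ACIDS, AMINO_ACIDS, "DE", "H"]
    print(f"{information_content(pattern):.2f} bits")
    print(find_matches("ACAADHC", pattern))
```

In this sketch, widening the allowed-residue strings lowers `information_content` and admits more `find_matches` hits, while `max_rmsd` is the structural threshold that prunes them back. The global SAP-based comparison is not reproduced.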