Sr. Sunyaev et al., ARE KNOWLEDGE-BASED POTENTIALS DERIVED FROM PROTEIN-STRUCTURE SETS DISCRIMINATIVE WITH RESPECT TO AMINO-ACID TYPES, Proteins, 31(3), 1998, pp. 225-246
The parametric description of residue environments through solvent
accessibility, backbone conformation, or pairwise residue-residue
distances is the key to comparing amino acid types at protein sequence
positions with residue locations in structural templates (the condition
of protein sequence-structure match). For the first time, the results
presented in this study clarify, and make it possible to quantify, on
a rigorous statistical basis, to what extent the amino acid
type-specific distributions of commonly used environment parameters are
discriminative with respect to the 20 amino acid types. Relying on the
Bahadur theory, we estimate the probability of error in a
single-sequence-structure alignment based on weak or absent
discriminative power in a learning database of protein structures. We
present the results for many
residue environment variables and demonstrate that each fold
description parameter is sensitive to only a few amino acid types while
indifferent to most of the others. Even complex
structural characteristics combining solvent-accessible surface area,
backbone conformation, and pairwise distances distinguish only some
amino acid types, whereas the others remain nondiscriminated. We find
that the knowledge-based potentials currently in use treat Ala, Asp,
Gln, His, Ser, Thr, and Tyr in particular as essentially "average"
amino acids. Thus, highly discriminative amino acid types define the
alignment register in gapless sequence-structure alignments. The
introduction
of gaps leads to alignment ambiguities at sequence positions occupied
by nondiscriminated amino acid types. Therefore, local
sequence-structure alignments produced by techniques with gaps cannot
be reliable. Conceptually new and more sensitive environment parameters
must be invented. (C) 1998 Wiley-Liss, Inc.
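
The idea of an amino acid type being "average" with respect to an environment parameter can be illustrated with a small sketch. The abstract does not state the exact statistic used in the study, so the example below swaps in the Kullback-Leibler divergence between a type-specific distribution and the pooled background distribution as a generic measure of discriminative power. The counts in `TOY_COUNTS`, the three accessibility classes, and the function names are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical toy counts (NOT from the paper): how often each amino acid
# type is observed in three solvent-accessibility classes
# (buried, intermediate, exposed).
TOY_COUNTS = {
    "Leu": [70, 20, 10],
    "Lys": [5, 25, 70],
    "Ala": [35, 33, 32],
    "Ser": [30, 34, 36],
}


def normalize(counts):
    """Turn raw counts into a probability distribution."""
    total = sum(counts)
    return [c / total for c in counts]


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in nats.

    Terms with p_i == 0 contribute nothing; q_i must be positive
    wherever p_i is positive.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def discriminability(counts_by_type):
    """Return {type: D(type-specific distribution || pooled background)}.

    This is an assumed, illustrative measure, not the statistic
    used in the study.
    """
    n_classes = len(next(iter(counts_by_type.values())))
    pooled = [
        sum(counts[i] for counts in counts_by_type.values())
        for i in range(n_classes)
    ]
    background = normalize(pooled)
    return {
        aa: kl_divergence(normalize(counts), background)
        for aa, counts in counts_by_type.items()
    }


if __name__ == "__main__":
    scores = discriminability(TOY_COUNTS)
    # Types with a divergence close to zero look like the background,
    # i.e. the parameter barely distinguishes them.
    for aa, d in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"{aa}: D = {d:.3f} nats")
```

With these invented counts, the types whose distribution is nearly flat across classes receive a divergence close to zero, which is the sense in which a parameter is "indifferent" to them. A divergence is used here because error probabilities of statistical tests between two distributions are commonly bounded through such information measures; whether this matches the study's derivation is an assumption.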