Sr. Syunyaev et al., STATISTICAL TESTS OF COMPATIBILITY BETWEEN THE PRIMARY AND TERTIARY PROTEIN STRUCTURES, Molecular biology, 30(5), 1996, pp. 666-671
Analysis of a bank of tertiary (and primary) protein structures
allowed tests for 3D-1D compatibility to be worked out. Protein
tertiary structure can be described by a profile, that is, a set of
environmental variables of the residues. This method does not rely on
primary structure similarity, and the amino acid sequence of the
tested protein is not used explicitly. Some authors postulate a
Boltzmann distribution for a residue environmental variable and
formally introduce the notion of "potential energy" of a sequence
assigned to a given structure. They believe that the tertiary and
primary structures are compatible in a real 3D molecule with minimal
potential, and that this criterion can be used to correlate the
tertiary and primary structures. At the same time, virtually no tools
are available to assess the reliability of the results obtained;
statistical tests of this kind are presented here. The Neyman-Pearson
likelihood ratio test arises naturally when the range of an index
value is divided into nonoverlapping intervals. The resulting
expression, obtained under the most general assumptions (the logarithm
of the ratio between the probability that a residue of a certain type
has the value of property x within a certain graduation interval and
the probability that a residue of random type has this value), was
earlier believed to derive from Boltzmann statistics. The a posteriori
Bayes test accounts for the probability that the highest value of the
test provided by an ideal structure is quite high, which is
particularly important for a growing data bank. We selected two
indices to assess the tests: the distance between the geometric
centers of the side chain and of the whole globule, and the solvent
accessibility of the residue. Calculation of the proposed statistical
tests indicates that the sequences recognize their own structures
using either index (with a single exception of a short sequence in
both cases).
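
The log-ratio expression described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy data, the interval edges, and the pseudocount smoothing are all assumptions made for illustration. The sketch bins an environmental index x into nonoverlapping intervals, estimates the log of P(x in interval | residue type) over P(x in interval | any residue) from a structure bank, and sums these terms for a sequence threaded onto a structure profile.

```python
import bisect
import math
from collections import Counter


def bin_index(x, edges):
    """Index of the nonoverlapping interval [edges[k], edges[k+1]) holding x.

    Values outside the covered range are clamped to the first or last interval.
    """
    k = bisect.bisect_right(edges, x) - 1
    return min(max(k, 0), len(edges) - 2)


def log_odds_table(observations, edges, pseudocount=1.0):
    """Build log-ratio scores from (residue_type, x) pairs of a structure bank.

    The pseudocount is an assumption of this sketch; it only keeps the
    logarithm finite when an interval is empty for some residue type.
    """
    n_bins = len(edges) - 1
    by_type = Counter()      # counts per (residue type, interval)
    overall = Counter()      # counts per interval, any residue type
    type_totals = Counter()  # counts per residue type
    for res, x in observations:
        k = bin_index(x, edges)
        by_type[(res, k)] += 1
        overall[k] += 1
        type_totals[res] += 1
    total = sum(overall.values())
    table = {}
    for res in type_totals:
        for k in range(n_bins):
            p_res = (by_type[(res, k)] + pseudocount) / (
                type_totals[res] + pseudocount * n_bins)
            p_any = (overall[k] + pseudocount) / (total + pseudocount * n_bins)
            table[(res, k)] = math.log(p_res / p_any)
    return table


def compatibility_score(sequence, profile, table, edges):
    """Sum of log-ratios for a sequence placed on a structure profile.

    profile[i] is the index value x at position i of the structure.
    """
    return sum(table[(res, bin_index(x, edges))]
               for res, x in zip(sequence, profile))


if __name__ == "__main__":
    # Hypothetical toy bank: x is a distance from the globule center;
    # "L" is mostly buried (small x), "K" mostly exposed (large x).
    bank = ([("L", 2.0)] * 8 + [("L", 12.0)] * 2 +
            [("K", 2.0)] * 2 + [("K", 12.0)] * 8)
    edges = [0.0, 7.0, 20.0]
    table = log_odds_table(bank, edges)
    profile = [2.0, 12.0]  # one buried and one exposed position
    print(compatibility_score("LK", profile, table, edges))
    print(compatibility_score("KL", profile, table, edges))
```

In this toy case the sequence whose residues match the environments of the profile receives the higher score, which is the sense in which a sequence "recognizes" a structure; assessing how reliable such a ranking is requires the statistical tests discussed in the abstract.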