STATISTICAL TESTS OF COMPATIBILITY BETWEEN THE PRIMARY AND TERTIARY PROTEIN STRUCTURES

Citation
Sr. Syunyaev et al., STATISTICAL TESTS OF COMPATIBILITY BETWEEN THE PRIMARY AND TERTIARY PROTEIN STRUCTURES, Molecular biology, 30(5), 1996, pp. 666-671
Citations number
13
Subject categories
Biology
Journal title
Molecular biology
ISSN journal
00268933
Volume
30
Issue
5
Year of publication
1996
Part
2
Pages
666 - 671
Database
ISI
SICI code
0026-8933(1996)30:5<666:STOCBT>2.0.ZU;2-Z
Abstract
Analysis of a bank of tertiary (and primary) protein structures allowed tests for 3D-1D compatibility to be worked out. Protein tertiary structure can be described by a profile, or set of environmental variables of the residues. This method does not rely on primary structure similarity, and the amino acid sequence of the tested protein is not used explicitly. Some authors postulate a Boltzmann distribution for a residue environmental variable, and formally introduce the notion of "potential energy" of the sequence assigned to a given structure. They believe that there is compatibility between the tertiary and primary structures in a real 3D molecule with minimal potential, and this test can be used to correlate the tertiary and primary structures. At the same time, virtually no tools are available to assess the reliability of the results obtained; statistical tests of this kind are presented here. The Neyman-Pearson likelihood ratio test is naturally used with division of an index value into nonoverlapping intervals. The expression obtained under the most general assumptions (logarithm of the ratio between the probability that a residue of a certain type has the value of property x within a certain graduation interval and the probability that a random-type residue has this value) was earlier believed to be a derivative of the Boltzmann statistics. The a posteriori Bayes test accounts for the probability that the highest value of the test provided by the ideal structure is quite high, which is particularly important for a growing data bank. We selected two indices to assess the tests: the distance between the geometric centers of side chains and of the whole globule, and solvent accessibility of the residue. Calculation of the proposed statistical tests indicates that the sequences recognize their own structures using either index (with a single exception of a short sequence in both cases).
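
The log-ratio expression described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names (interval_index, log_ratio_table, sequence_score), the pseudocount smoothing, and the toy interval edges are assumptions introduced only for illustration. The sketch bins a residue property x (for example, solvent accessibility) into nonoverlapping intervals, estimates the log of the ratio between the probability that a given residue type falls in an interval and the probability that a residue of any type does, and sums these terms along a sequence placed on a structural profile.

```python
import bisect
import math
from collections import Counter, defaultdict


def interval_index(x, edges):
    """Index of the nonoverlapping interval [edges[k], edges[k+1]) containing x.

    Values outside the covered range are clamped to the first or last interval.
    """
    k = bisect.bisect_right(edges, x) - 1
    return min(max(k, 0), len(edges) - 2)


def log_ratio_table(observations, edges, pseudocount=1.0):
    """Build log(P(interval | residue type) / P(interval | any residue)).

    observations: iterable of (residue_type, x) pairs taken from a structure bank.
    pseudocount: smoothing term (an assumption of this sketch) that keeps
    empty intervals from producing log(0).
    """
    n_bins = len(edges) - 1
    by_type = defaultdict(Counter)
    overall = Counter()
    for res, x in observations:
        k = interval_index(x, edges)
        by_type[res][k] += 1
        overall[k] += 1
    total = sum(overall.values())

    table = {}
    for res, counts in by_type.items():
        n_res = sum(counts.values())
        for k in range(n_bins):
            p_type = (counts[k] + pseudocount) / (n_res + pseudocount * n_bins)
            p_any = (overall[k] + pseudocount) / (total + pseudocount * n_bins)
            table[(res, k)] = math.log(p_type / p_any)
    return table


def sequence_score(sequence, profile, table, edges):
    """Sum the per-residue log ratios for a sequence placed on a profile.

    profile holds one property value per structural position. Residue types
    absent from the table raise KeyError in this sketch.
    """
    return sum(
        table[(res, interval_index(x, edges))]
        for res, x in zip(sequence, profile)
    )
```

Under these assumptions, a sequence scored against the profile of its own structure would be expected to receive a higher sum than the same sequence scored against unrelated profiles, which is the sense in which a sequence "recognizes" its structure.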