E. Furuichi and P. Koehl, INFLUENCE OF PROTEIN-STRUCTURE DATABASES ON THE PREDICTIVE POWER OF STATISTICAL PAIR POTENTIALS, Proteins, 31(2), 1998, pp. 139-149
A long-standing goal in protein structure studies is the development of reliable energy functions that can be used both to verify protein models derived from experimental constraints and for theoretical protein folding and inverse folding computer experiments. In that respect, knowledge-based statistical pair potentials have attracted considerable interest recently, mainly because they include the essential features of protein structures as well as solvent effects at a low computing cost. However, the basis on which statistical potentials are derived has been questioned. In this paper, we investigate statistical pair potentials derived from protein three-dimensional structures, addressing in particular questions related to the form of these potentials, as well as to the content of the database from which they are derived. We have shown that statistical pair potentials depend on the size of the proteins included in the database, and that this dependence can be reduced by considering only pairs of residues close in space (i.e., with a cutoff of 8 Angstrom). We have also shown that statistical potentials carry a memory of the quality of the database in terms of the amount and diversity of secondary structure it contains. We find, for example, that potentials derived from a database containing alpha-proteins will perform best only on alpha-proteins in fold recognition computer experiments. We believe that this is an overall weakness of these potentials, which must be kept in mind when constructing a database. (C) 1998 Wiley-Liss, Inc.
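
As an illustration of the kind of potential discussed above, the following is a minimal sketch of a contact-based statistical pair potential restricted to residue pairs within 8 Angstrom. The abstract does not give the functional form used in the paper, so the inverse-Boltzmann log-odds expression, the composition-based reference state, the pseudocount, and the name `contact_potential` are all assumptions made for this example. Energies are in units of kT.

```python
import math
from collections import Counter
from itertools import combinations

CUTOFF = 8.0  # Angstrom; the cutoff quoted in the abstract


def contact_potential(structures, cutoff=CUTOFF, pseudocount=1.0):
    """Hypothetical helper: derive a contact pair potential from a database.

    structures: list of proteins, each a list of (residue_type, (x, y, z)).
    Returns {(type_a, type_b): energy} with type_a <= type_b.
    Assumed form (not taken from the paper): E = -ln(observed / expected),
    where 'expected' assumes residue types pair independently of each other.
    """
    pair_counts = Counter()  # observed contacts per unordered type pair
    type_counts = Counter()  # contact "ends" contributed by each type
    total = 0                # total number of contacts in the database

    for residues in structures:
        for (a, pos_a), (b, pos_b) in combinations(residues, 2):
            if math.dist(pos_a, pos_b) <= cutoff:
                pair_counts[tuple(sorted((a, b)))] += 1
                type_counts[a] += 1
                type_counts[b] += 1
                total += 1

    potential = {}
    types = sorted(type_counts)  # only types seen in at least one contact
    for i, a in enumerate(types):
        for b in types[i:]:
            # Fraction of all contact ends that belong to each type.
            frac_a = type_counts[a] / (2 * total)
            frac_b = type_counts[b] / (2 * total)
            # Unordered pairs of distinct types can form in two ways.
            expected = total * frac_a * frac_b * (1 if a == b else 2)
            observed = pair_counts[(a, b)]
            # The pseudocount (an assumption) keeps the logarithm finite.
            potential[(a, b)] = -math.log(
                (observed + pseudocount) / (expected + pseudocount)
            )
    return potential
```

Because the counts come entirely from the input structures, rebuilding the table from a different database (for example, one restricted to a single structural class) changes the resulting energies, which is the database dependence the abstract describes.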