Jwm. Nissink et al., Simple knowledge-based descriptors to predict protein-ligand interactions. Methodology and validation, J COMPUT A, 14(8), 2000, pp. 787-803
A new type of shape descriptor is proposed to describe the spatial
orientation of non-covalent interactions. It is built from simple,
anisotropic Gaussian contributions that are parameterised by 10 adjustable
values. The descriptors have been used to fit propensity distributions
derived from scatter data stored in the IsoStar database. This database
holds composite pictures of possible interaction geometries between a
common central group and various interacting moieties, as extracted from
small-molecule crystal structures. These distributions can be related to
probabilities for the occurrence of certain interaction geometries among
different functional groups. A fitting procedure is described that
generates the descriptors in a fully automated way. For this purpose, we
apply a similarity index that is tailored to the problem, the Split Hodgkin
Index. It scores the similarity in regions of high and of low propensity
separately. Although dependent on the division into these two subregions,
the index is robust and performs better than the regular Hodgkin index.
The reliability and coverage of
the fitted descriptors were assessed using SuperStar. SuperStar usually
operates on the raw IsoStar data to calculate propensity distributions,
e.g., for a binding site in a protein. For our purpose we modified the code
to have it operate on our descriptors instead. This resulted in a
substantial reduction in calculation time (a factor of five to eight)
compared to the original implementation. A validation procedure was
performed on a set of 130 protein-ligand complexes, using four
representative interacting probes to map
the properties of the various binding sites: ammonium nitrogen, alcohol
oxygen, carbonyl oxygen, and methyl carbon. The predicted `hot spots' for the
binding of these probes were compared to the actual arrangement of ligand
atoms in experimentally determined protein-ligand complexes. Results
indicate that the version of SuperStar that uses our descriptors is able
to predict the above-mentioned atom types in ligands correctly with success
rates of 59% and 74%, respectively, for all ligand atoms (regardless of
their solvent accessibility) and for the subset of solvent-inaccessible
ones. If not only exact atom-type matches are counted, but also those that
identify ligand atoms of similar physicochemical properties, the prediction
rates rise to 75% and 89%. These rates are close to those obtained by the
original SuperStar method (67% and 82%, respectively, for exactly matching
atom types, and 81% and 91% for similar atom types).
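
The fitting idea summarised above (anisotropic Gaussians scored against a
propensity map by a similarity index) can be illustrated with a minimal
sketch. Everything in it is an assumption made for illustration and is not
taken from the paper: the split of the 10 values into height (1), centre (3),
principal widths (3) and a rotation vector (3); the function names; and the
form of the split index, which here simply averages the regular Hodgkin index
over the high- and low-propensity regions of the reference map. Only the
regular Hodgkin index, 2*sum(a*b) / (sum(a*a) + sum(b*b)), is the standard
definition.

```python
import numpy as np

def rotation_matrix(rotvec):
    """Rodrigues formula: rotation vector (axis * angle) -> 3x3 matrix."""
    angle = np.linalg.norm(rotvec)
    if angle < 1e-12:
        return np.eye(3)
    x, y, z = rotvec / angle
    K = np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def anisotropic_gaussian(points, height, centre, widths, rotvec):
    """Evaluate one anisotropic 3D Gaussian at an (N, 3) array of points.

    Assumed 10-value parameterisation: height (1), centre (3),
    principal widths (3), rotation vector (3).
    """
    R = rotation_matrix(rotvec)
    local = (points - centre) @ R  # coordinates in the principal-axis frame
    return height * np.exp(-0.5 * np.sum((local / widths) ** 2, axis=1))

def hodgkin_index(a, b):
    """Regular Hodgkin similarity index between two sampled distributions."""
    denom = np.sum(a * a) + np.sum(b * b)
    return 2.0 * np.sum(a * b) / denom if denom > 0 else 1.0

def split_hodgkin_index(reference, model, threshold):
    """Hypothetical split variant: score high- and low-propensity regions
    of the reference separately, then average the two scores."""
    high = reference >= threshold
    h_high = hodgkin_index(reference[high], model[high])
    h_low = hodgkin_index(reference[~high], model[~high])
    return 0.5 * (h_high + h_low)

if __name__ == "__main__":
    # Synthetic "propensity map" on a small grid (made-up numbers).
    axis = np.linspace(-3.0, 3.0, 13)
    grid = np.array(np.meshgrid(axis, axis, axis, indexing="ij")).reshape(3, -1).T
    widths = np.array([0.8, 1.2, 1.6])
    target = anisotropic_gaussian(grid, 4.0, np.zeros(3), widths, np.zeros(3))
    shifted = anisotropic_gaussian(grid, 4.0, np.array([0.5, 0.0, 0.0]),
                                   widths, np.zeros(3))
    print("regular:", hodgkin_index(target, shifted))
    print("split:  ", split_hodgkin_index(target, shifted, threshold=1.0))
```

In an automated fit, an optimiser would adjust the Gaussian parameters to
maximise such an index; the threshold that separates the two regions is a
free choice in this sketch, which mirrors the dependence on the subregion
division noted in the abstract.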