Am. Lesk, EXTRACTION OF GEOMETRICALLY SIMILAR SUBSTRUCTURES - LEAST-SQUARES ANDCHEBYSHEV FITTING AND THE DIFFERENCE DISTANCE MATRIX, Proteins, 33(3), 1998, pp. 320-328
In analysis, comparison and classification of conformations of protein
s, a common computational task involves extractions of similar substru
ctures. Structural comparisons are usually based on either of two meas
ures of similarity: the root-mean-square (r.m.s.) deviation upon optim
al superposition, or the maximal element of the difference distance ma
trix. The analysis presented here clarifies the relationships between
different measures of structural similarity, and can provide a basis f
or developing algorithms and software to extract all maximal common we
ll-fitting substructures from proteins. Given atomic coordinates of tw
o proteins, many methods have been described for extracting some subst
antial (if not provably maximal) common substructure with low r.m.s. d
eviation. This is a relatively easy task compared with the problem add
ressed here, i,e., that of finding all common substructures with r.m.s
. deviation less than a prespecified threshold. The combinatorial prob
lems associated with similar subset extraction are more tractable if e
xpressed in terms of the maximal element of the difference distance ma
trix than in terms of the r.m.s. deviation. However, it has been diffi
cult to correlate these alternative measures of structural similarity.
The purpose of this article is to make this connection. We first intr
oduce a third measure of structural similarity: the maximum distance b
etween corresponding pairs of points after superposition to minimize t
his value. This corresponds to fitting in the Chebyshev norm. Properti
es of Chebyshev superposition are derived. We describe relationships b
etween the r.m.s. and minimax (Chebyshev) deviations upon optimal supe
rposition, and between the Chebyshev deviation and the maximal element
of the difference distance matrix. Combining these produces a relatio
nship between the r.m.s. deviation upon optimal superposition and the
maximal element of the difference distance matrix. Based on these resu
lts, we can apply algorithms and software for finding subsets of the d
ifference distance matrix for which all elements are less than a speci
fied bound, either to select only subsets for which the r.m.s. deviati
on is less than or equal to a specified threshold, or to select subset
s that include all subsets for which the r.m.s. deviation is less than
or equal to a threshold, Proteins 33:320-328, 1998. (C) 1998 Wiley-Li
ss,Inc.