EXTRACTION OF GEOMETRICALLY SIMILAR SUBSTRUCTURES - LEAST-SQUARES ANDCHEBYSHEV FITTING AND THE DIFFERENCE DISTANCE MATRIX

Authors
Citation
Am. Lesk, EXTRACTION OF GEOMETRICALLY SIMILAR SUBSTRUCTURES - LEAST-SQUARES ANDCHEBYSHEV FITTING AND THE DIFFERENCE DISTANCE MATRIX, Proteins, 33(3), 1998, pp. 320-328
Citations number
25
Categorie Soggetti
Biology,"Genetics & Heredity
Journal title
ISSN journal
08873585
Volume
33
Issue
3
Year of publication
1998
Pages
320 - 328
Database
ISI
SICI code
0887-3585(1998)33:3<320:EOGSS->2.0.ZU;2-6
Abstract
In analysis, comparison and classification of conformations of protein s, a common computational task involves extractions of similar substru ctures. Structural comparisons are usually based on either of two meas ures of similarity: the root-mean-square (r.m.s.) deviation upon optim al superposition, or the maximal element of the difference distance ma trix. The analysis presented here clarifies the relationships between different measures of structural similarity, and can provide a basis f or developing algorithms and software to extract all maximal common we ll-fitting substructures from proteins. Given atomic coordinates of tw o proteins, many methods have been described for extracting some subst antial (if not provably maximal) common substructure with low r.m.s. d eviation. This is a relatively easy task compared with the problem add ressed here, i,e., that of finding all common substructures with r.m.s . deviation less than a prespecified threshold. The combinatorial prob lems associated with similar subset extraction are more tractable if e xpressed in terms of the maximal element of the difference distance ma trix than in terms of the r.m.s. deviation. However, it has been diffi cult to correlate these alternative measures of structural similarity. The purpose of this article is to make this connection. We first intr oduce a third measure of structural similarity: the maximum distance b etween corresponding pairs of points after superposition to minimize t his value. This corresponds to fitting in the Chebyshev norm. Properti es of Chebyshev superposition are derived. We describe relationships b etween the r.m.s. and minimax (Chebyshev) deviations upon optimal supe rposition, and between the Chebyshev deviation and the maximal element of the difference distance matrix. Combining these produces a relatio nship between the r.m.s. deviation upon optimal superposition and the maximal element of the difference distance matrix. Based on these resu lts, we can apply algorithms and software for finding subsets of the d ifference distance matrix for which all elements are less than a speci fied bound, either to select only subsets for which the r.m.s. deviati on is less than or equal to a specified threshold, or to select subset s that include all subsets for which the r.m.s. deviation is less than or equal to a threshold, Proteins 33:320-328, 1998. (C) 1998 Wiley-Li ss,Inc.