Fast detection of common geometric substructure in proteins

Citation
L.P. Chew et al., Fast detection of common geometric substructure in proteins, J. Comput. Biol., 6(3-4), 1999, pp. 313-325
Number of citations
25
Subject categories
Biochemistry & Biophysics
Journal title
JOURNAL OF COMPUTATIONAL BIOLOGY
Journal ISSN
1066-5277
Volume
6
Issue
3-4
Year of publication
1999
Pages
313 - 325
Database
ISI
SICI code
1066-5277(199923)6:3-4<313:FDOCGS>2.0.ZU;2-7
Abstract
We consider the problem of identifying common three-dimensional substructures between proteins. Our method is based on comparing the shape of the α-carbon backbone structures of the proteins in order to find three-dimensional (3D) rigid motions that bring portions of the geometric structures into correspondence. We propose a geometric representation of protein backbone chains that is compact yet allows for similarity measures that are robust against noise and outliers. This representation encodes the structure of the backbone as a sequence of unit vectors, defined by each adjacent pair of α-carbons. We then define a measure of the similarity of two protein structures based on the root mean squared (RMS) distance between corresponding orientation vectors of the two proteins. Our measure has several advantages over measures that are commonly used for comparing protein shapes, such as the minimum RMS distance between the 3D positions of corresponding atoms in two proteins. A key advantage is that this new measure behaves well for identifying common substructures, in contrast with position-based measures where the nonmatching portions of the structure dominate the measure. At the same time, it avoids the quadratic space and computational difficulties associated with methods based on distance matrices and contact maps. We show applications of our approach to detecting common contiguous substructures in pairs of proteins, as well as the more difficult problem of identifying common protein domains (i.e., larger substructures that are not necessarily contiguous along the protein chain).
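
As a rough illustration of the representation described in the abstract, the sketch below converts a chain of α-carbon coordinates into unit vectors between adjacent residues and computes the RMS distance between two such vector sequences. It is a minimal sketch, not the authors' implementation: the function names `unit_vectors` and `orientation_rms` and the toy coordinates are hypothetical, and the RMS is evaluated for one fixed relative orientation only, without the search over rigid motions that the abstract mentions.

```python
import math


def unit_vectors(ca_coords):
    """Return unit direction vectors between consecutive alpha-carbon positions.

    ca_coords is a list of (x, y, z) tuples; the result has one fewer entry.
    """
    vectors = []
    for (x0, y0, z0), (x1, y1, z1) in zip(ca_coords, ca_coords[1:]):
        dx, dy, dz = x1 - x0, y1 - y0, z1 - z0
        norm = math.sqrt(dx * dx + dy * dy + dz * dz)
        if norm == 0.0:
            raise ValueError("consecutive alpha-carbons coincide")
        vectors.append((dx / norm, dy / norm, dz / norm))
    return vectors


def orientation_rms(vecs_a, vecs_b):
    """RMS distance between corresponding unit vectors.

    Assumption: the two chains are compared in their given orientation;
    no minimization over rotations is performed here.
    """
    if len(vecs_a) != len(vecs_b) or not vecs_a:
        raise ValueError("need two non-empty vector lists of equal length")
    total = 0.0
    for (ax, ay, az), (bx, by, bz) in zip(vecs_a, vecs_b):
        total += (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
    return math.sqrt(total / len(vecs_a))


# Hypothetical toy backbone (not real protein data) and a translated copy.
chain = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (3.8, 3.8, 3.8)]
shifted = [(x + 10.0, y - 5.0, z + 2.0) for x, y, z in chain]

rms_after_translation = orientation_rms(unit_vectors(chain), unit_vectors(shifted))
```

Because the vectors depend only on differences between adjacent positions, translating a chain leaves them unchanged, so `rms_after_translation` is zero up to floating-point error. Only the rotational part of a rigid motion affects this kind of measure, and any two unit vectors are at most distance 2 apart, which bounds the contribution of each residue pair.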