We consider the problem of identifying common three-dimensional (3D)
substructures between proteins. Our method is based on comparing the shapes of
the α-carbon backbone structures of the proteins in order to find 3D rigid
motions that bring portions of the geometric structures into correspondence.
We propose a geometric representation of protein backbone chains that is
compact yet allows for similarity measures that are robust against noise and
outliers. This representation encodes the structure of the backbone as a
sequence of unit vectors, each defined by a pair of adjacent α-carbons. We then
define a measure of the similarity of two protein structures based on the root
mean squared (RMS) distance between corresponding orientation vectors of the
two proteins. Our measure has several advantages over measures that are
commonly used for comparing protein shapes, such as the minimum RMS distance
between the 3D positions of corresponding atoms in two proteins. A key
advantage is that the new measure behaves well for identifying common
substructures, in contrast with position-based measures, in which the
nonmatching portions of the structures dominate the measure. At the same time,
it avoids the quadratic space requirements and computational difficulties
associated with methods based on distance matrices and contact maps. We show
applications of our approach to detecting common contiguous substructures in
pairs of proteins, as well as to the more difficult problem of identifying
common protein domains (i.e., larger substructures that are not necessarily
contiguous along the protein chain).
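The unit-vector representation and the RMS measure on orientation vectors can be illustrated with a short Python sketch. It is a minimal, standard-library-only sketch under stated assumptions, not the authors' implementation. It assumes α-carbon coordinates arrive as (x, y, z) tuples. It also assumes the RMS distance is minimized over rotations only, since translations do not change direction vectors. It computes that minimum with Horn's closed-form quaternion method, for which a small Jacobi eigenvalue routine is used. The names `orientation_vectors`, `orientation_rms`, and `similar_windows` are hypothetical. `similar_windows` is a naive brute-force scan over windows of `k` consecutive vectors that only shows how the measure could be applied to contiguous substructures. It is not the paper's search procedure.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def orientation_vectors(ca: Sequence[Vec3]) -> List[Vec3]:
    """Unit vector from each alpha-carbon to the next one (n atoms -> n - 1 vectors)."""
    out: List[Vec3] = []
    for p, q in zip(ca, ca[1:]):
        d = [q[k] - p[k] for k in range(3)]
        norm = math.sqrt(sum(x * x for x in d))
        if norm == 0.0:
            raise ValueError("adjacent alpha-carbons coincide")
        out.append((d[0] / norm, d[1] / norm, d[2] / norm))
    return out


def _largest_eigenvalue(a: List[List[float]], max_sweeps: int = 50) -> float:
    """Largest eigenvalue of a small symmetric matrix, via cyclic Jacobi rotations."""
    a = [row[:] for row in a]
    n = len(a)
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < 1e-24:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                apq = a[p][q]
                if apq == 0.0:
                    continue
                # Rotation in the (p, q) plane chosen so that entry (p, q) becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * apq)
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for r in range(n):
                    if r != p and r != q:
                        arp, arq = a[r][p], a[r][q]
                        a[r][p] = a[p][r] = c * arp - s * arq
                        a[r][q] = a[q][r] = c * arq + s * arp
                a[p][p] -= t * apq
                a[q][q] += t * apq
                a[p][q] = a[q][p] = 0.0
    return max(a[i][i] for i in range(n))


def orientation_rms(u: Sequence[Vec3], w: Sequence[Vec3]) -> float:
    """Minimum over rotations R of sqrt(mean_i |u_i - R w_i|^2) for equal-length unit-vector lists."""
    if len(u) != len(w) or not u:
        raise ValueError("need two non-empty lists of equal length")
    # Correlation matrix S[j][k] = sum_i w_i[j] * u_i[k].
    S = [[sum(wi[j] * ui[k] for ui, wi in zip(u, w)) for k in range(3)] for j in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = S
    # Horn's symmetric 4x4 matrix; its largest eigenvalue is max_R sum_i u_i . (R w_i).
    N = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    lam = _largest_eigenvalue(N)
    n = len(u)
    # For unit vectors |u_i - R w_i|^2 = 2 - 2 u_i . (R w_i); clamp tiny negative round-off.
    return math.sqrt(max(0.0, 2.0 * n - 2.0 * lam) / n)


def similar_windows(u: Sequence[Vec3], w: Sequence[Vec3], k: int, threshold: float):
    """Hypothetical brute-force scan: (i, j, d) for every pair of length-k windows with d <= threshold."""
    hits = []
    for i in range(len(u) - k + 1):
        for j in range(len(w) - k + 1):
            d = orientation_rms(u[i:i + k], w[j:j + k])
            if d <= threshold:
                hits.append((i, j, d))
    return hits


if __name__ == "__main__":
    # Made-up toy coordinates (3.8-unit steps), not real protein data.
    chain = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (7.6, 3.8, 0.0), (7.6, 3.8, 3.8)]
    # Rigidly moved copy: 90-degree rotation about z followed by a translation.
    moved = [(-y + 10.0, x - 2.0, z + 5.0) for x, y, z in chain]
    print(orientation_rms(orientation_vectors(chain), orientation_vectors(moved)))
```

Running the file prints a value close to zero, because the second chain is a rotated and translated copy of the first. Working with unit direction vectors makes the comparison independent of translation. The window scan performs one RMS evaluation per pair of window offsets, so it is meant only as an illustration of applying the measure locally.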