A general framework is presented for analyzing multiple protein structures using statistical regression methods. The regression approach can superimpose protein structures rigidly or with shear. The approach can also superimpose multiple structures explicitly, without resorting to pairwise superpositions. The algorithm alternates between matching corresponding landmarks among the protein structures and superimposing these landmarks. Matching is performed using a robust dynamic programming technique with gap penalties that adapt to the given data.
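
The matching step can be sketched as a global dynamic-programming alignment of two landmark sequences that have already been superimposed. This is a minimal sketch under stated assumptions: pairing two landmarks costs their squared distance, every unmatched landmark costs a fixed gap penalty, and the default penalty (a quantile of all pairwise squared distances) is a hypothetical stand-in for the data-adaptive gap penalty described above, not the published rule. The names `match_landmarks` and `adaptive_gap_penalty` are illustrative.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]


def sq_dist(p: Point, q: Point) -> float:
    """Squared Euclidean distance between two 3-D landmarks."""
    return sum((x - y) ** 2 for x, y in zip(p, q))


def adaptive_gap_penalty(a: Sequence[Point], b: Sequence[Point], q: float = 0.5) -> float:
    """Hypothetical data-adaptive gap penalty (an assumption, not the paper's rule):
    the q-quantile of all pairwise squared distances between the two structures."""
    d = sorted(sq_dist(p, r) for p in a for r in b)
    return d[int(q * (len(d) - 1))]


def match_landmarks(a: Sequence[Point], b: Sequence[Point],
                    gap: Optional[float] = None) -> List[Tuple[int, int]]:
    """Return index pairs (i, j) of the alignment that minimizes the summed squared
    distance of matched landmarks plus `gap` for every unmatched landmark."""
    n, m = len(a), len(b)
    if gap is None:
        gap = adaptive_gap_penalty(a, b)
    # cost[i][j]: best cost of aligning a[:i] with b[:j]; move[i][j]: last step taken
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    move = [[""] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0], move[i][0] = i * gap, "up"
    for j in range(1, m + 1):
        cost[0][j], move[0][j] = j * gap, "left"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j], move[i][j] = min(
                (cost[i - 1][j - 1] + sq_dist(a[i - 1], b[j - 1]), "diag"),  # match
                (cost[i - 1][j] + gap, "up"),      # a[i-1] left unmatched
                (cost[i][j - 1] + gap, "left"),    # b[j-1] left unmatched
            )
    # Trace back from the bottom-right corner, collecting matched pairs
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        step = move[i][j]
        if step == "diag":
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif step == "up":
            i -= 1
        else:
            j -= 1
    return pairs[::-1]
```

In an alternating scheme, one would call `match_landmarks` after each superposition, re-fit the superposition on the matched pairs only, and stop once the matching no longer changes. Tying the gap penalty to the observed distances lets its scale follow the data: closely aligned structures get a small penalty, more divergent ones a larger one.
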
Superposition is performed using either orthogonal transformations, which impose the rigid-body assumption, or affine transformations, which allow shear. The resulting regression model of a protein family measures the amount of structural variability at each landmark. A variation of our algorithm permits a separate weight for each landmark, thereby allowing one to emphasize particular segments of a protein structure or to compensate for variances that differ from one position to another in a structure. In addition, a method is introduced for finding an initial correspondence by measuring the discrete curvature along each protein backbone. Discrete curvature also characterizes the secondary structure of a protein backbone, distinguishing among helical, strand, and loop regions. An example is presented involving a set of seven globin structures. Regression analysis, using both affine and orthogonal transformations, reveals that globins are most strongly conserved structurally in helical regions, particularly in the mid-regions of the E, F, and G helices.
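
The superposition step described above can be sketched with NumPy, assuming the landmarks have already been matched so that every structure is an n x 3 array whose rows correspond. The orthogonal fit is the standard weighted Kabsch (SVD) solution and the affine fit is weighted least squares in homogeneous coordinates; both accept one weight per landmark. Multiple structures are handled by repeatedly fitting each one to a common mean and re-averaging, a generalized-Procrustes-style loop swapped in here as a simple stand-in for the regression model. The function names, the convergence rule, and the affine re-anchoring step are illustrative assumptions, not the published algorithm.

```python
import numpy as np


def fit_orthogonal(X, Y, w):
    """Weighted rigid-body fit (Kabsch): return a proper rotation R and translation t
    minimizing sum_i w_i * ||X_i @ R + t - Y_i||^2."""
    w = w / w.sum()
    xc, yc = w @ X, w @ Y                      # weighted centroids
    H = (X - xc).T @ (w[:, None] * (Y - yc))   # 3x3 weighted cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(U @ Vt))         # d = -1 would mean a reflection
    R = U @ np.diag([1.0, 1.0, d]) @ Vt
    return R, yc - xc @ R


def fit_affine(X, Y, w):
    """Weighted least-squares affine fit: return a general 3x3 matrix A (which may
    include shear) and t minimizing sum_i w_i * ||X_i @ A + t - Y_i||^2."""
    Xh = np.hstack([X, np.ones((len(X), 1))])  # homogeneous coordinates
    sw = np.sqrt(w)[:, None]
    B, *_ = np.linalg.lstsq(sw * Xh, sw * Y, rcond=None)
    return B[:3], B[3]


def superimpose_all(structs, w=None, affine=False, iters=100, tol=1e-12):
    """Superimpose k matched structures (each n x 3) onto a common mean shape.
    Returns the mean, the fitted structures (k x n x 3), and for each landmark the
    squared deviation from the mean, summed over x, y, z and averaged over structures."""
    structs = [np.asarray(S, dtype=float) for S in structs]
    n = structs[0].shape[0]
    w = np.ones(n) if w is None else np.asarray(w, dtype=float)
    fit = fit_affine if affine else fit_orthogonal
    mean, prev = structs[0].copy(), np.inf
    for _ in range(iters):
        fitted = []
        for S in structs:
            M, t = fit(S, mean, w)
            fitted.append(S @ M + t)
        fitted = np.stack(fitted)
        mean = fitted.mean(axis=0)
        if affine:
            # Hypothetical safeguard (an assumption, not from the source): with shear
            # allowed the mean can shrink toward a degenerate shape, so map it back
            # into the first structure's affine frame on every iteration.
            M, t = fit_affine(mean, structs[0], w)
            mean = mean @ M + t
        rss = float(np.sum(w * ((fitted - mean) ** 2).sum(axis=2)))
        if prev - rss < tol:                   # stop once the weighted RSS stalls
            break
        prev = rss
    var = ((fitted - mean) ** 2).sum(axis=2).mean(axis=0)
    return mean, fitted, var
```

Large entries of `var` flag structurally variable landmarks. As a hypothetical extension, a regularized inverse of `var` could be fed back as `w` to down-weight those positions, which is one way to realize the per-landmark weighting mentioned above.
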
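
A discrete curvature along a backbone can be sketched as follows, assuming the input is a C-alpha trace (an n x 3 array) and that the curvature at residue i is the Menger curvature, the inverse radius of the circle through residues i - step, i and i + step. This is one common discrete definition and may differ from the one used in the paper; the function name and the `step` parameter are illustrative. Because the value depends only on inter-point distances and angles, the profile is unchanged by rotation and translation, which is what makes it usable for an initial correspondence before any superposition has been done.

```python
import numpy as np


def discrete_curvature(ca, step=1):
    """Menger curvature 1/R of the circle through ca[i - step], ca[i], ca[i + step]
    for each interior index i; the first and last `step` entries are NaN."""
    ca = np.asarray(ca, dtype=float)
    kappa = np.full(len(ca), np.nan)
    for i in range(step, len(ca) - step):
        a, b, c = ca[i - step], ca[i], ca[i + step]
        ab, bc, ac = b - a, c - b, c - a
        # |ab x bc| is twice the triangle's area, and kappa = 4 * area / (|ab| |bc| |ac|)
        cross = np.linalg.norm(np.cross(ab, bc))
        denom = np.linalg.norm(ab) * np.linalg.norm(bc) * np.linalg.norm(ac)
        kappa[i] = 2.0 * cross / denom if denom > 0 else 0.0
    return kappa
```

On an ideal helix every interior residue receives the same value and a straight segment gives zero, so on real backbones one would look for roughly flat runs of the profile. How those runs map to helix, strand, and loop labels (thresholds, choice of `step`) is left as an assumption to calibrate on real data.
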