REGRESSION-ANALYSIS OF MULTIPLE PROTEIN STRUCTURES

Citation
T.D. Wu et al., REGRESSION-ANALYSIS OF MULTIPLE PROTEIN STRUCTURES, Journal of Computational Biology, 5(3), 1998, pp. 585-595
Number of citations
18
Subject categories
Mathematics,Biology,"Biochemical Research Methods",Mathematics,"Biotechnology & Applied Microbiology"
ISSN journal
1066-5277
Volume
5
Issue
3
Year of publication
1998
Pages
585 - 595
Database
ISI
SICI code
1066-5277(1998)5:3<585:ROMPS>2.0.ZU;2-U
Abstract
A general framework is presented for analyzing multiple protein structures using statistical regression methods. The regression approach can superimpose protein structures rigidly or with shear. Also, this approach can superimpose multiple structures explicitly, without resorting to pairwise superpositions. The algorithm alternates between matching corresponding landmarks among the protein structures and superimposing these landmarks. Matching is performed using a robust dynamic programming technique that uses gap penalties that adapt to the given data. Superposition is performed using either orthogonal transformations, which impose the rigid-body assumption, or affine transformations, which allow shear. The resulting regression model of a protein family measures the amount of structural variability at each landmark. A variation of our algorithm permits a separate weight for each landmark, thereby allowing one to emphasize particular segments of a protein structure or to compensate for variances that differ at various positions in a structure. In addition, a method is introduced for finding an initial correspondence, by measuring the discrete curvature along each protein backbone. Discrete curvature also characterizes the secondary structure of a protein backbone, distinguishing among helical, strand, and loop regions. An example is presented involving a set of seven globin structures. Regression analysis, using both affine and orthogonal transformations, reveals that globins are most strongly conserved structurally in helical regions, particularly in the mid-regions of the E, F, and G helices.
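Illustrative sketch
The abstract does not say how discrete curvature is defined, so the sketch below is only one plausible stand-in and should not be read as the authors' method. It assumes that curvature at an interior landmark can be approximated by the turning angle between the two consecutive backbone segments that meet there. The function name `turning_angles` and the toy coordinates are hypothetical.

```python
import math


def turning_angles(points):
    """Return the turning angle (radians) at each interior point of a 3D polyline.

    Assumption: the turning angle between consecutive segments is used here as a
    simple proxy for discrete curvature; the paper's own definition may differ.
    """
    angles = []
    for prev, cur, nxt in zip(points, points[1:], points[2:]):
        # Segment vectors entering and leaving the current landmark.
        u = [c - p for p, c in zip(prev, cur)]
        v = [n - c for c, n in zip(cur, nxt)]
        norm_u = math.sqrt(sum(x * x for x in u))
        norm_v = math.sqrt(sum(x * x for x in v))
        if norm_u == 0.0 or norm_v == 0.0:
            raise ValueError("consecutive points must be distinct")
        cos_t = sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)
        # Clamp to [-1, 1] to guard against floating-point round-off.
        cos_t = max(-1.0, min(1.0, cos_t))
        angles.append(math.acos(cos_t))
    return angles


# Hypothetical toy traces, not real protein coordinates.
straight = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
bent = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]

straight_angles = turning_angles(straight)
bent_angles = turning_angles(bent)
```

Under this assumption, a fully extended trace gives turning angles near zero and a trace that keeps changing direction gives larger ones. A per-landmark profile of this kind could then be compared between structures to suggest an initial correspondence.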