Use of a database of structural alignments and phylogenetic trees in investigating the relationship between sequence and structural variability among homologous proteins
S. Balaji and N. Srinivasan, Use of a database of structural alignments and phylogenetic trees in investigating the relationship between sequence and structural variability among homologous proteins, PROTEIN ENG, 14(4), 2001, pp. 219-226
The database PALI (Phylogeny and ALIgnment of homologous protein structures) consists of families of protein domains of known three-dimensional (3D) structure. In a PALI family, every member has been structurally aligned with every other member (pairwise), and a simultaneous superposition (multiple) of all the members has also been performed. The database also contains 3D structure-based and structure-dependent sequence similarity-based phylogenetic dendrograms for all the families. The PALI release used in the present analysis comprises 225 families derived largely from the HOMSTRAD and SCOP databases. The quality of the multiple rigid-body structural alignments in PALI was compared with that obtained from COMPARER, which encodes a procedure based on properties and relationships. The alignments from the two procedures agreed very well, and variations were seen only in cases of low sequence similarity, often in the loop regions. A validation of Direct Pairwise Alignment (DPA) between two proteins is provided by comparing it with the Pairwise alignment extracted from the Multiple Alignment of all the members in the family (PMA). In general, DPA and PMA are found to differ only rarely. The ready availability of pairwise alignments allows the analysis of variations in structural distances as a function of sequence similarities and the number of topologically equivalent Cα atoms. The structural distance metric used in the analysis combines root mean square deviation (r.m.s.d.) and number of equivalences, and is shown to vary similarly to r.m.s.d. The correlation between sequence similarity and structural similarity is poor in pairs with low sequence similarities. A comparison of sequence-based and 3D structure-based phylogenies for all the families suggests that only a few families show a radical difference between the two kinds of dendrograms. The difference could occur when the sequence similarity among the homologues is low or when the structures are subjected to evolutionary pressure for the retention of function. The PALI database is expected to be useful in furthering our understanding of the relationship between sequences and structures of homologous proteins and their evolution.
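
The DPA versus PMA comparison described in the abstract can be illustrated with a minimal sketch. It assumes alignments are stored as gapped strings with "-" as the gap character. The function names (`aligned_pairs`, `extract_pairwise`, `agreement`) and the agreement score are hypothetical choices made for illustration; they are not taken from PALI, and the paper may measure agreement differently.

```python
def aligned_pairs(row_a, row_b, gap="-"):
    """Return the set of (i, j) residue-index pairs placed in the same column."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    pairs = set()
    i = j = 0  # running residue indices in each ungapped sequence
    for a, b in zip(row_a, row_b):
        if a != gap and b != gap:
            pairs.add((i, j))
        if a != gap:
            i += 1
        if b != gap:
            j += 1
    return pairs


def extract_pairwise(msa, name_a, name_b, gap="-"):
    """Project two rows out of a multiple alignment (the PMA of the abstract).

    Columns in which both rows carry a gap are dropped, since they only
    exist because of other family members.
    """
    cols = [(a, b) for a, b in zip(msa[name_a], msa[name_b])
            if not (a == gap and b == gap)]
    return "".join(a for a, _ in cols), "".join(b for _, b in cols)


def agreement(dpa, pma):
    """Fraction of equivalenced residue pairs shared by two alignments.

    Illustrative score only (assumption): shared pairs divided by the
    larger of the two pair counts.
    """
    pairs_dpa = aligned_pairs(*dpa)
    pairs_pma = aligned_pairs(*pma)
    if not pairs_dpa and not pairs_pma:
        return 1.0
    return len(pairs_dpa & pairs_pma) / max(len(pairs_dpa), len(pairs_pma))


# Toy three-member family (made-up sequences, not PALI data).
msa = {"a": "AC-DE-", "b": "A-GDE-", "c": "ACGDEF"}
pma = extract_pairwise(msa, "a", "b")
dpa = ("ACDE", "AGDE")  # hypothetical direct pairwise alignment
score = agreement(dpa, pma)
```

In this toy case the multiple alignment leaves one residue of each protein unpaired, while the direct alignment equivalences them, so the two alignments share only part of their residue pairs.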