Use of a database of structural alignments and phylogenetic trees in investigating the relationship between sequence and structural variability among homologous proteins

Citation
S. Balaji and N. Srinivasan, Use of a database of structural alignments and phylogenetic trees in investigating the relationship between sequence and structural variability among homologous proteins, PROTEIN ENG, 14(4), 2001, pp. 219-226
Number of citations
45
Subject categories
Biochemistry & Biophysics
Journal title
PROTEIN ENGINEERING
Journal ISSN
0269-2139
Volume
14
Issue
4
Year of publication
2001
Pages
219 - 226
Database
ISI
SICI code
0269-2139(200104)14:4<219:UOADOS>2.0.ZU;2-X
Abstract
The database PALI (Phylogeny and ALIgnment of homologous protein structures) consists of families of protein domains of known three-dimensional (3D) structure. In a PALI family, every member has been structurally aligned with every other member (pairwise) and also simultaneous superposition (multiple) of all the members has been performed. The database also contains 3D structure-based and structure-dependent sequence similarity-based phylogenetic dendrograms for all the families. The PALI release used in the present analysis comprises 225 families derived largely from the HOMSTRAD and SCOP databases. The quality of the multiple rigid-body structural alignments in PALI was compared with that obtained from COMPARER, which encodes a procedure based on properties and relationships. The alignments from the two procedures agreed very well, and variations are seen only in the low sequence similarity cases, often in the loop regions. A validation of Direct Pairwise Alignment (DPA) between two proteins is provided by comparing it with Pairwise alignment extracted from Multiple Alignment of all the members in the family (PMA). In general, DPA and PMA are found to vary rarely. The ready availability of pairwise alignments allows the analysis of variations in structural distances as a function of sequence similarities and number of topologically equivalent Cα atoms. The structural distance metric used in the analysis combines root mean square deviation (r.m.s.d.) and number of equivalences, and is shown to vary similarly to r.m.s.d. The correlation between sequence similarity and structural similarity is poor in pairs with low sequence similarities. A comparison of sequence and 3D structure-based phylogenies for all the families suggests that only a few families have a radical difference in the two kinds of dendrograms. The difference could occur when the sequence similarity among the homologues is low or when the structures are subjected to evolutionary pressure for the retention of function. The PALI database is expected to be useful in furthering our understanding of the relationship between sequences and structures of homologous proteins and their evolution.
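
The DPA versus PMA comparison described in the abstract can be illustrated with a minimal Python sketch. It assumes alignments are stored as gapped strings of equal length with "-" as the gap character, and it scores agreement as the fraction of DPA residue pairs that also appear in the PMA. The function names `aligned_pairs` and `dpa_pma_agreement`, the toy sequences, and the agreement measure itself are illustrative assumptions, not the procedure used in the paper.

```python
def aligned_pairs(row_a, row_b, gap="-"):
    """Return the set of (i, j) residue index pairs that share a column.

    row_a and row_b are two rows of the same alignment (equal length).
    Indices count residues only, so gaps do not advance them.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    pairs = set()
    i = j = 0
    for a, b in zip(row_a, row_b):
        if a != gap and b != gap:
            pairs.add((i, j))
        if a != gap:
            i += 1
        if b != gap:
            j += 1
    return pairs


def dpa_pma_agreement(dpa_rows, pma_rows):
    """Fraction of DPA equivalences also present in the PMA (assumed measure).

    dpa_rows: the two rows of the direct pairwise alignment.
    pma_rows: the same two proteins' rows taken from the multiple alignment;
              columns where both rows are gaps are ignored automatically.
    """
    dpa = aligned_pairs(*dpa_rows)
    pma = aligned_pairs(*pma_rows)
    if not dpa:
        return 0.0
    return len(dpa & pma) / len(dpa)


# Toy family of three members (hypothetical sequences).
multiple = {
    "A": "ACD-EFG",
    "B": "AC-DEFG",
    "C": "ACDDEFG",
}
direct_ab = ("ACDEFG", "ACDEFG")           # direct pairwise alignment of A and B
from_multiple_ab = (multiple["A"], multiple["B"])

score = dpa_pma_agreement(direct_ab, from_multiple_ab)
print(f"DPA/PMA agreement for A-B: {score:.3f}")
```

In this toy case the third member forces a shifted gap in the multiple alignment, so one of the six directly aligned residue pairs is lost in the PMA. A real analysis would read the structure-based alignments from the database files instead of hard-coded strings.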