IDENTIFICATION AND CLASSIFICATION OF PROTEIN FOLD FAMILIES

Citation
C.A. Orengo et al., IDENTIFICATION AND CLASSIFICATION OF PROTEIN FOLD FAMILIES, Protein Engineering, 6(5), 1993, pp. 485-500
Number of citations
30
Subject Categories
Biology
Journal title
Protein Engineering
ISSN journal
02692139
Volume
6
Issue
5
Year of publication
1993
Pages
485 - 500
Database
ISI
SICI code
0269-2139(1993)6:5<485:IACOPF>2.0.ZU;2-B
Abstract
We have developed a method for identifying fold families in the protein structure data bank. Pairwise sequence alignments are first performed to extract families of homologous proteins having 35% or more sequence identity. Representatives are selected with the best resolution and R-factor to give a nonhomologous data set. Subsequent structure comparisons between all members of this set detect homologous folds with low sequence identity but highly conserved structures. By softening the requirement on structural similarity, families of analogous proteins are obtained that have related folds but more diverse structures. Representatives are selected to give a non-analogous data set. Starting with 1410 chains from the Brookhaven Data Bank, we generate a set of 150 nonhomologous folds and a set of 112 non-analogous folds. Analysis of sequence and structure conservation within the larger families shows the globins to be the most highly conserved family and the TIM barrels the most weakly conserved.
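The first stage described in the abstract (grouping chains at 35% or more sequence identity, then keeping one representative per family) can be illustrated with a minimal Python sketch. Only the 35% threshold and the use of resolution and R-factor come from the abstract. Everything else is an assumption made for illustration: the `Chain` class, the function names, the single-linkage grouping rule, the ordering "lowest resolution first, then lowest R-factor", and the toy identity values, which are invented and are not results from the paper. Pairwise identities are assumed to have been computed already by some alignment program.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Chain:
    name: str          # hypothetical chain identifier
    resolution: float  # crystallographic resolution in angstroms (lower is better)
    r_factor: float    # crystallographic R-factor (lower is better)


def cluster_by_identity(chains, identity, threshold=35.0):
    """Group chains into families by single linkage (an assumption).

    `identity` maps frozenset({name_a, name_b}) to percent sequence identity;
    pairs missing from the mapping are treated as 0% identical.
    """
    parent = {c.name: c.name for c in chains}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(chains, 2):
        if identity.get(frozenset((a.name, b.name)), 0.0) >= threshold:
            parent[find(a.name)] = find(b.name)

    families = {}
    for c in chains:
        families.setdefault(find(c.name), []).append(c)
    return list(families.values())


def pick_representative(family):
    # Assumed ranking: best (lowest) resolution, ties broken by lowest R-factor.
    return min(family, key=lambda c: (c.resolution, c.r_factor))


# Toy data, invented for illustration only.
chains = [
    Chain("A", 2.0, 0.20),
    Chain("B", 1.5, 0.18),
    Chain("C", 1.8, 0.19),
    Chain("D", 2.5, 0.22),
]
identity = {
    frozenset(("A", "B")): 60.0,
    frozenset(("B", "C")): 40.0,
    frozenset(("A", "C")): 30.0,
}

families = cluster_by_identity(chains, identity)
representatives = sorted(pick_representative(f).name for f in families)
print(representatives)
```

In this toy input, A and C fall below the threshold when compared directly but still land in one family because each is linked to B. That is a consequence of the single-linkage assumption, and a stricter linkage rule would split them. The later structure-comparison stages in the abstract are not sketched here, since the abstract gives no similarity measure or cut-off for them.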