A.G. de Brevern et al., Bayesian probabilistic approach for predicting backbone structures in terms of protein blocks, PROTEINS, 41(3), 2000, pp. 271-287
By using an unsupervised cluster analyzer, we have identified a local
structural alphabet composed of 16 folding patterns of five consecutive
C-alpha ("protein blocks"). The dependence that exists between successive
blocks is explicitly taken into account. A Bayesian approach based on the
relation protein block-amino acid propensity is used for prediction and
leads to a success rate close to 35%. Sharing sequence windows associated
with certain blocks into "sequence families" improves the prediction
accuracy by 6%. This prediction accuracy exceeds 75% when keeping the first
four predicted protein blocks at each site of the protein. In addition, two
different strategies are proposed: the first one defines the number of
protein blocks in each site needed for respecting a user-fixed prediction
accuracy, and alternatively, the second one defines the different protein
sites to be predicted with a user-fixed number of blocks and a chosen
accuracy. This last strategy applied to the ubiquitin conjugating enzyme
(alpha/beta protein) shows that 91% of the sites may be predicted with a
prediction accuracy larger than 77% considering only three blocks per site.
The prediction strategies proposed improve our knowledge about
sequence-structure dependence and should be very useful in ab initio
protein modelling. (C) 2000 Wiley-Liss, Inc.
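
The ranking idea in the abstract (score each protein block for a sequence window with Bayes' rule, then keep the first few candidates per site) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes that the positions of a window contribute independently to the likelihood, and the function names (rank_blocks, top_k), the block labels "a" and "b", the two-residue window, and all probabilities are hypothetical toy values chosen only for illustration.

```python
import math


def rank_blocks(window, likelihood, prior):
    """Rank blocks by posterior P(block | window), best first.

    Assumption (not taken from the abstract): positions in the window are
    treated as independent, so the likelihood is a product over positions.
    likelihood[block][pos][aa] holds P(amino acid aa at position pos | block).
    """
    log_post = {}
    for block, p_block in prior.items():
        score = math.log(p_block)
        for pos, aa in enumerate(window):
            score += math.log(likelihood[block][pos][aa])
        log_post[block] = score

    # Normalise in log space (log-sum-exp) so long windows do not underflow.
    top = max(log_post.values())
    norm = sum(math.exp(s - top) for s in log_post.values())
    posterior = {b: math.exp(s - top) / norm for b, s in log_post.items()}
    return sorted(posterior.items(), key=lambda kv: kv[1], reverse=True)


def top_k(ranked, k):
    """Keep the labels of the k best-ranked blocks for one site."""
    return [block for block, _ in ranked[:k]]


# Hypothetical toy data: two blocks and a two-residue window.
prior = {"a": 0.5, "b": 0.5}
likelihood = {
    "a": [{"G": 0.8, "A": 0.2}, {"G": 0.5, "A": 0.5}],
    "b": [{"G": 0.2, "A": 0.8}, {"G": 0.5, "A": 0.5}],
}

ranked = rank_blocks("GA", likelihood, prior)
print(ranked)
print(top_k(ranked, 1))
```

In this scheme a prediction for a site counts as correct when the true block is among the k retained candidates, which is why accuracy rises as more candidates are kept per site.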