M. R. Betancourt and J. Skolnick, Finding the needle in a haystack: Educing native folds from ambiguous ab initio protein structure predictions, J. Comput. Chem., 22(3), 2001, pp. 339-353
Current ab initio structure-prediction methods are sometimes able to generate families of folds, one of which is native, but are unable to single out the native one because of imperfections in the folding potentials and an inability to explore the conformational space thoroughly. To address this issue, we describe a method for detecting statistically significant folds in a pool of predicted structures. Our approach clusters the structures and averages them into representative fold families. Using a metric derived from the root-mean-square distance (RMSD) that is less sensitive to protein size, we determine whether the simulated structures are clustered relative to a set of random structures. The clustering method searches for cluster centers and iteratively calculates the clusters and their respective centroids. The inter-residue distances of each centroid are adjusted by minimizing a potential constructed from the corresponding average distances of the structures in its cluster. Application of this method to selected proteins shows that it can detect the fold family closest to native, along with several misfolded families. We also describe a method for obtaining substructures, which is useful when the folding simulation fails to predict the complete topology but produces common subelements among the structures. A web server that clusters user-submitted structures is available at http://bioinformatics.danforthcenter.org/services/scar.
© 2001 John Wiley & Sons, Inc.
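The abstract does not give the formula for its size-insensitive, RMSD-derived metric, so the following is only an illustrative sketch of the general idea. Two substitutions are assumptions made here: it uses the distance-matrix RMSD (dRMSD), which needs no superposition, in place of superposition RMSD, and it normalizes by the mean dRMSD between hypothetical random reference chains (freely jointed chains with a 3.8 Å bond, which are not compact like real proteins) of the same length. The names `drmsd`, `random_chain`, `random_baseline` and `relative_drmsd` are hypothetical.

```python
import math
import random


def distance_matrix(coords):
    """All inter-residue distances (e.g. between C-alpha atoms) of one structure."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]


def drmsd(a, b):
    """Distance-matrix RMSD: invariant to rotation and translation, so no superposition."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("structures must have the same length (>= 2 residues)")
    da, db = distance_matrix(a), distance_matrix(b)
    n = len(a)
    sq = [(da[i][j] - db[i][j]) ** 2 for i in range(n) for j in range(i + 1, n)]
    return math.sqrt(sum(sq) / len(sq))


def random_chain(n, rng, bond=3.8):
    """Hypothetical random reference: a freely jointed chain with a fixed bond length."""
    pts = [(0.0, 0.0, 0.0)]
    for _ in range(n - 1):
        # Uniform random direction on the unit sphere.
        z = rng.uniform(-1.0, 1.0)
        phi = rng.uniform(0.0, 2.0 * math.pi)
        r = math.sqrt(1.0 - z * z)
        x, y, zz = pts[-1]
        pts.append((x + bond * r * math.cos(phi),
                    y + bond * r * math.sin(phi),
                    zz + bond * z))
    return pts


def random_baseline(n, samples=20, seed=0):
    """Mean dRMSD between pairs of random chains of length n (seeded, so reproducible)."""
    rng = random.Random(seed)
    total = sum(drmsd(random_chain(n, rng), random_chain(n, rng)) for _ in range(samples))
    return total / samples


def relative_drmsd(a, b, baseline=None):
    """dRMSD expressed as a fraction of the random-structure baseline for this length."""
    if baseline is None:
        baseline = random_baseline(len(a))
    return drmsd(a, b) / baseline
```

Under this assumed normalization, a value near 1 means two structures are about as similar as two random chains of the same length, so a pool of predictions whose mean pairwise value is clearly below 1 is more clustered than random. Dividing by a length-matched baseline is one simple way to reduce the size dependence of raw distances; it is not necessarily the normalization used in the paper.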
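The clustering step (search for cluster centers, then iteratively recompute clusters and centers) can be sketched on a precomputed matrix of pairwise structure distances. This is a hedged illustration, not the paper's algorithm. The greedy center search (repeatedly pick the structure with the most neighbours within a hypothetical `cutoff`) and the k-medoids-style refinement (assign each structure to its nearest center, then move each center to the member with the smallest summed distance) are assumptions chosen for simplicity. The function name `find_clusters` is also hypothetical.

```python
def find_clusters(dist, cutoff, max_iter=50):
    """Greedy center search followed by iterative k-medoids-style refinement.

    dist: symmetric matrix of pairwise structure distances.
    cutoff: hypothetical neighbour radius used only to choose the initial centers.
    Returns (centers, labels), where labels[i] indexes into centers.
    """
    n = len(dist)

    # 1. Center search: take the structure with the most unassigned neighbours
    #    within the cutoff (lowest index wins ties), then remove it and them.
    unassigned = set(range(n))
    centers = []
    while unassigned:
        best = max(sorted(unassigned),
                   key=lambda i: sum(1 for j in unassigned if dist[i][j] <= cutoff))
        centers.append(best)
        unassigned -= {j for j in unassigned if dist[best][j] <= cutoff}

    def assign(cs):
        # Label each structure with the index of its nearest center.
        return [min(range(len(cs)), key=lambda c: dist[i][cs[c]]) for i in range(n)]

    # 2. Refinement: reassign, then move each center to its cluster's medoid,
    #    until the centers stop changing.
    for _ in range(max_iter):
        labels = assign(centers)
        new_centers = []
        for c in range(len(centers)):
            # Keep the old center if its cluster happens to be empty.
            members = [i for i in range(n) if labels[i] == c] or [centers[c]]
            new_centers.append(min(members, key=lambda m: sum(dist[m][k] for k in members)))
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(centers)


# Usage on five "structures" placed on a line, with |x_i - x_j| as the distance.
xs = [0.0, 0.1, 0.2, 5.0, 5.1]
dist = [[abs(p - q) for q in xs] for p in xs]
centers, labels = find_clusters(dist, cutoff=0.5)
```

Medoids are used as centers here because a medoid is always an actual member structure and needs nothing more than the distance matrix. Averaging coordinates into a true centroid requires an extra reconstruction step.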
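The abstract says the centroid's inter-residue distances are adjusted by minimizing a potential built from the cluster's average distances. The sketch below assumes that potential is a simple harmonic restraint, $E = \sum_{i<j} (\lVert x_i - x_j \rVert - \bar d_{ij})^2$. It minimizes $E$ by steepest descent, starting from a chosen member such as the cluster medoid, using the gradient $\partial E / \partial x_i = 2(d_{ij} - \bar d_{ij})(x_i - x_j)/d_{ij}$ for each pair. The functional form, the step size, the iteration count and the function names are all assumptions for illustration.

```python
import math


def mean_distance_matrix(structures):
    """Average inter-residue distance matrix over the members of one cluster."""
    n = len(structures[0])
    mats = [[[math.dist(s[i], s[j]) for j in range(n)] for i in range(n)] for s in structures]
    return [[sum(m[i][j] for m in mats) / len(mats) for j in range(n)] for i in range(n)]


def potential(coords, target):
    """Assumed harmonic restraint energy E = sum_{i<j} (|x_i - x_j| - dbar_ij)^2."""
    n = len(coords)
    return sum((math.dist(coords[i], coords[j]) - target[i][j]) ** 2
               for i in range(n) for j in range(i + 1, n))


def fit_centroid(start, target, step=0.01, iters=500):
    """Steepest descent on the restraint energy, starting from e.g. the cluster medoid."""
    x = [list(p) for p in start]
    n = len(x)
    for _ in range(iters):
        grad = [[0.0, 0.0, 0.0] for _ in range(n)]
        for i in range(n):
            for j in range(i + 1, n):
                d = math.dist(x[i], x[j])
                if d == 0.0:
                    continue  # gradient undefined for coincident points; skip the pair
                # dE/dx_i = 2 (d - dbar) (x_i - x_j) / d, with the opposite sign for x_j.
                f = 2.0 * (d - target[i][j]) / d
                for k in range(3):
                    g = f * (x[i][k] - x[j][k])
                    grad[i][k] += g
                    grad[j][k] -= g
        for i in range(n):
            for k in range(3):
                x[i][k] -= step * grad[i][k]
    return [tuple(p) for p in x]


# Usage: two hypothetical 4-residue members, fitted from the first one.
s1 = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
s2 = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
target = mean_distance_matrix([s1, s2])
centroid = fit_centroid(s1, target)
```

Averaging distances rather than Cartesian coordinates avoids having to superimpose the members first. The price is that an averaged distance matrix is usually not exactly realizable in three dimensions, which is why a minimization step is needed to find coordinates that fit it as closely as possible.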