The inverse folding approach is a powerful tool in protein structure
prediction when the native state of a sequence adopts one of the known
protein folds. This is because some proteins show strong
sequence-structure specificity in inverse folding experiments that
allow gaps and insertions in the sequence-structure alignment. When
structures similar to their native folds are included in the structure
database, the z-scores of these folds (which measure the
sequence-structure specificity) are well separated from those of
alternative structures. In this paper, we seek to understand the
origin of this sequence-structure specificity and to identify how the
specificity arises on passing from a short peptide chain to the entire
protein sequence.
To accomplish this objective, a simplified version of inverse folding,
gapless inverse folding, is performed using sequence fragments of
different sizes from 53 proteins. The results indicate that a
significant portion of the entire protein sequence is usually
necessary to show sequence-structure specificity, but that some
regions of the sequence begin to show this specificity at relatively
short fragment sizes (15-20 residues). As the fragment size increases,
an island picture emerges, in which the regions of the sequence that
recognize their own native structure grow from seed fragments. In
these high-specificity regions, the top-scoring structural fragments
usually include more structures similar to the native state.
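
The gapless procedure described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the method used in this work: the energy function is a toy contact potential (each contact between two hydrophobic residues contributes -1), structures are represented as lists of residue-index contacts, and the z-score is taken as the native energy minus the mean energy of the alternative structural fragments, divided by their standard deviation, so that more negative values indicate higher specificity. The names `contact_energy`, `gapless_windows`, and `fragment_z_score` are hypothetical.

```python
from statistics import mean, pstdev

# Toy hydrophobic alphabet; an assumption, not a potential from the paper.
HYDROPHOBIC = set("AVLIMFWC")


def contact_energy(sequence, contacts):
    """Toy energy: -1 for every contact between two hydrophobic residues."""
    return -float(sum(
        1
        for i, j in contacts
        if sequence[i] in HYDROPHOBIC and sequence[j] in HYDROPHOBIC
    ))


def gapless_windows(contacts, structure_length, fragment_length):
    """Yield the contact list of every contiguous window of a structure.

    Contacts are renumbered so that each window starts at index 0. No gaps
    or insertions are allowed: the fragment is mounted residue by residue.
    """
    for start in range(structure_length - fragment_length + 1):
        end = start + fragment_length
        yield [
            (i - start, j - start)
            for i, j in contacts
            if start <= i < end and start <= j < end
        ]


def fragment_z_score(fragment, native_contacts, decoy_structures):
    """z-score of a sequence fragment on its native structural fragment.

    decoy_structures is a list of (contacts, structure_length) pairs that
    supply the alternative structural fragments.
    """
    native_energy = contact_energy(fragment, native_contacts)
    decoy_energies = [
        contact_energy(fragment, window)
        for contacts, length in decoy_structures
        for window in gapless_windows(contacts, length, len(fragment))
    ]
    sigma = pstdev(decoy_energies)
    if sigma == 0:
        return 0.0  # no spread among decoys, so no specificity signal
    return (native_energy - mean(decoy_energies)) / sigma


# Made-up example: a 5-residue fragment, its native contacts, and one
# 7-residue alternative structure that yields three gapless windows.
fragment = "LAGGL"
native_contacts = [(0, 4), (1, 4)]
decoys = [([(0, 2), (2, 5), (3, 6)], 7)]
z = fragment_z_score(fragment, native_contacts, decoys)
```

Repeating this calculation for every fragment position and for increasing fragment lengths would give a map of which regions recognize their own native structure at which size, which is the kind of data the island picture summarizes.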