With the advent of genome sequencing projects, the amino acid
sequences of thousands of proteins are determined every year. Each of
these protein sequences must be identified with its function and its
3-dimensional structure for us to gain a full understanding of the
molecular biology of organisms. To meet this challenge, new methods
are being developed for fold recognition, the computational assignment
of newly determined amino acid sequences to 3-dimensional protein
structures. These
methods start with a library of known 3-dimensional target protein
structures. The new probe sequence is then aligned to each target
protein structure in the library and the compatibility of the sequence
with that structure is scored. If a target structure is found to have
a significantly high compatibility score, it is assumed that the probe
sequence folds in much the same way as the target structure. The
fundamental assumptions of this approach are that many different
sequences fold in similar ways and that there is a relatively high
probability that a new
sequence possesses a previously observed fold. We review various
approaches to fold recognition and break down the process into its
main steps: creation of a library of target folds; representation of
the folds; alignment of the probe sequence to a target fold using a
sequence-to-structure compatibility scoring function; and assessment
of significance of compatibility. We emphasize that even though this
new field of
fold recognition has made rapid progress, technical problems remain to
be solved in most of the steps. Standard benchmarks may help identify
the problem steps and find solutions to the problems.
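
The four steps listed above can be illustrated with a minimal Python
sketch. Everything specific in it is an assumption made for
illustration and is not taken from any particular published method:
each target fold is represented as a string of burial classes ('B'
buried, 'E' exposed), the compatibility function is a toy hydrophobic
match score, alignment is a plain global dynamic program with a linear
gap penalty, and significance is estimated as a Z-score against
shuffled copies of the probe. The names pair_score, align_score,
z_score and recognize_fold are hypothetical.

```python
import random
import statistics

# Assumption: a crude hydrophobic residue set, used only for this toy score.
HYDROPHOBIC = set("AVLIMFWC")


def pair_score(residue, env):
    """Toy compatibility: hydrophobic residues match buried ('B') positions,
    all other residues match exposed ('E') positions."""
    return 1.0 if (residue in HYDROPHOBIC) == (env == "B") else -1.0


def align_score(probe, fold, gap=-2.0):
    """Best global alignment score of a probe sequence against a fold string,
    computed by dynamic programming with a linear gap penalty."""
    m = len(fold)
    prev = [j * gap for j in range(m + 1)]
    for i in range(1, len(probe) + 1):
        curr = [i * gap] + [0.0] * m
        for j in range(1, m + 1):
            curr[j] = max(
                prev[j - 1] + pair_score(probe[i - 1], fold[j - 1]),  # align pair
                prev[j] + gap,       # probe residue left unaligned
                curr[j - 1] + gap,   # fold position left unaligned
            )
        prev = curr
    return prev[m]


def z_score(probe, fold, n_shuffles=200, seed=0):
    """Significance of the raw score relative to shuffled probe sequences,
    which keep the composition but destroy the residue order."""
    rng = random.Random(seed)
    raw = align_score(probe, fold)
    residues = list(probe)
    background = []
    for _ in range(n_shuffles):
        rng.shuffle(residues)
        background.append(align_score("".join(residues), fold))
    sd = statistics.pstdev(background)
    if sd == 0:
        return 0.0
    return (raw - statistics.mean(background)) / sd


def recognize_fold(probe, library, n_shuffles=200, seed=0):
    """Score the probe against every fold in the library and rank the folds
    by Z-score, highest first. Returns (name, raw_score, z_score) tuples."""
    results = [
        (name, align_score(probe, fold), z_score(probe, fold, n_shuffles, seed))
        for name, fold in library.items()
    ]
    return sorted(results, key=lambda r: r[2], reverse=True)


if __name__ == "__main__":
    # Hypothetical two-fold library and probe, for demonstration only.
    library = {"alternating": "BEBEBEBE", "core": "BBBBEEEE"}
    for name, raw, z in recognize_fold("LLLLKKKK", library):
        print(f"{name}: raw={raw:.1f} z={z:.2f}")
```

Ranking by Z-score rather than raw score is one common design choice:
it corrects for the fact that some folds score well against almost any
sequence of a given composition. Real methods use far richer fold
representations and scoring functions than this sketch.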