A mathematical formalism is introduced that has general applicability
to many protein structure models used in the various approaches to the
"inverse protein folding problem." The inverse nature of the problem
arises from the fact that one begins with a set of assumed tertiary
structures and searches for those most compatible with a new sequence,
rather than attempting to predict the structure directly from the new
sequence. The formalism is based on the well-known theory of Markov
random fields (MRFs). Our MRF formulation provides explicit
representations for the relevant amino acid position environments and
the physical topologies of the structural contacts. In particular, MRF
models can
readily be constructed for the secondary structure packing topologies
found in protein domain cores, or other structural motifs, that are
anticipated to be common among large sets of both homologous and
nonhomologous proteins. MRF models are probabilistic and can exploit
the statistical data from the limited number of proteins having known
domain
structures. The MRF approach leads to a new scoring function for
comparing different threadings (placements) of a sequence through
different
structure models. The scoring function is very important, because
comparing alternative structure models with each other is a key step in
the inverse folding problem. Unlike previously published scoring
functions, the one derived in this paper is based on a comprehensive
probabilistic formulation of the threading problem.