The biological role, biochemical function, and structure of uncharacterized
protein sequences are often inferred from their similarity to known
proteins. A constant goal is to increase the reliability, sensitivity, and
accuracy of alignment techniques to enable the detection of increasingly
distant relationships. Development, tuning, and testing of these methods
benefit from appropriate benchmarks for the assessment of alignment
accuracy.
Here, we describe a benchmark protocol to estimate sequence-to-sequence and
sequence-to-structure alignment accuracy. The protocol consists of
structurally related pairs of proteins and procedures to evaluate alignment
accuracy over the whole set. The set of protein pairs covers all the
currently known fold types. The benchmark is challenging in the sense that
it consists of proteins lacking clear sequence similarity.
Correct target alignments are derived from the three-dimensional structures
of these pairs by rigid-body superposition. An evaluation engine computes
the accuracy of alignments obtained from a particular algorithm in terms of
alignment shifts with respect to the structure-derived alignments. Using
this benchmark, we estimate that the best results can be obtained from a
combination of amino acid residue substitution matrices and knowledge-based
potentials. (C) 2000 Academic Press.
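
The idea of scoring an alignment by its shifts relative to a
structure-derived reference can be illustrated with a minimal sketch. The
abstract does not give the exact definition used by the evaluation engine,
so everything below is an assumption made for illustration: alignments are
represented as lists of aligned residue index pairs, the shift of a residue
is the offset between its partner in the test alignment and its partner in
the reference, and the names `alignment_shifts` and `fraction_within` are
hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Assumed representation: (residue index in protein A, residue index in protein B).
Alignment = List[Tuple[int, int]]


def alignment_shifts(reference: Alignment, test: Alignment) -> Dict[int, Optional[int]]:
    """Per-residue shift of a test alignment relative to a reference alignment.

    For every residue of A aligned in the reference, report how far its
    partner in B is displaced in the test alignment. None marks residues
    that the test alignment leaves unaligned.
    """
    test_map = dict(test)
    return {
        i: (test_map[i] - j_ref if i in test_map else None)
        for i, j_ref in reference
    }


def fraction_within(shifts: Dict[int, Optional[int]], tolerance: int = 0) -> float:
    """Fraction of reference-aligned residues whose absolute shift is <= tolerance."""
    if not shifts:
        return 0.0
    ok = sum(1 for s in shifts.values() if s is not None and abs(s) <= tolerance)
    return ok / len(shifts)


# Toy data, invented for illustration only.
reference = [(0, 0), (1, 1), (2, 2), (3, 4), (4, 5)]
test = [(0, 0), (1, 1), (2, 3), (3, 4)]

shifts = alignment_shifts(reference, test)
print(shifts)
print(fraction_within(shifts, tolerance=0))
print(fraction_within(shifts, tolerance=1))
```

In this toy case, residue 2 is shifted by one position and residue 4 is
left unaligned, so three of the five reference pairs are reproduced exactly
and four of five fall within a tolerance of one position. Averaging such
fractions over all protein pairs would give one possible whole-set score.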