C. Bailey-Kellogg et al., The NOESY JIGSAW: Automated protein secondary structure and main-chain assignment from sparse, unassigned NMR data, J. Comput. Biol., 7(3-4), 2000, pp. 537-558
High-throughput, data-directed computational protocols for Structural Genomics
(or Proteomics) are required in order to evaluate the protein products
of genes for structure and function at rates comparable to current gene-sequencing
technology. This paper presents the JIGSAW algorithm, a novel high-throughput,
automated approach to protein structure characterization with nuclear magnetic
resonance (NMR). JIGSAW applies graph algorithms and probabilistic reasoning
techniques, enforcing first-principles consistency rules
in order to overcome a 5-10% signal-to-noise ratio. It consists of two main
components: (1) graph-based secondary structure pattern identification in
unassigned heteronuclear NMR data, and (2) assignment of spectral peaks by
probabilistic alignment of identified secondary structure elements against the
primary sequence. Deferring assignment eliminates the bottleneck faced by
traditional approaches, which begin by correlating peaks among dozens of
experiments. JIGSAW utilizes only four experiments, none of which requires
13C-labeled protein, thus dramatically reducing both the amount and expense
of wet lab molecular biology and the total spectrometer time. Results for
three test proteins demonstrate that JIGSAW correctly identifies 79-100% of
alpha-helical and 46-65% of beta-sheet NOE connectivities and correctly
aligns 33-100% of secondary structure elements. JIGSAW is very fast, running
in minutes on a Pentium-class Linux workstation. This approach yields quick
and reasonably accurate (as opposed to the traditional slow and extremely
accurate) structure calculations. It could be useful for quick structural
assays to speed data to the biologist early in an investigation and could
in principle be applied in an automation-like fashion to a large fraction
of the proteome.
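
The second component, probabilistic alignment of an identified secondary structure element against the primary sequence, can be illustrated with a minimal sketch. This is not the scoring model of the paper, which the abstract does not specify. It assumes, hypothetically, that each spin system in a fragment carries a probability estimate over residue types, and it simply slides the fragment along the sequence and keeps the offset with the highest log-likelihood. The function name align_fragment, the floor parameter, and the example data are all illustrative assumptions.

```python
import math


def align_fragment(fragment, sequence, floor=1e-6):
    """Return (best_offset, log_score) for placing a fragment on a sequence.

    fragment: list of dicts mapping one-letter residue type -> probability
              (hypothetical per-spin-system residue-type estimates).
    sequence: primary sequence as a string of one-letter codes.
    floor:    probability assigned to residue types missing from a dict,
              so that a single mismatch does not yield log(0).
    Returns None if the fragment is longer than the sequence.
    """
    best = None
    for offset in range(len(sequence) - len(fragment) + 1):
        score = 0.0
        for k, type_probs in enumerate(fragment):
            residue = sequence[offset + k]
            score += math.log(max(type_probs.get(residue, 0.0), floor))
        # Keep the first offset that attains the highest score.
        if best is None or score > best[1]:
            best = (offset, score)
    return best


# Made-up example: a three-residue fragment and a short sequence.
sequence = "MKALGAVE"
fragment = [
    {"A": 0.7, "G": 0.2},
    {"L": 0.6, "V": 0.3},
    {"G": 0.8, "A": 0.1},
]
best_offset, best_score = align_fragment(fragment, sequence)
print(best_offset, best_score)
```

In this toy case the fragment is placed on the residues A, L, G of the sequence. A fuller treatment would also compare the best placement against the runner-up before accepting an assignment, since short fragments can match several positions almost equally well.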