The NOESY JIGSAW: Automated protein secondary structure and main-chain assignment from sparse, unassigned NMR data

Citation
C. Bailey-Kellogg et al., The NOESY JIGSAW: Automated protein secondary structure and main-chain assignment from sparse, unassigned NMR data, J. Comput. Biol., 7(3-4), 2000, pp. 537-558
Number of citations
46
Subject Categories
Biochemistry & Biophysics
Journal title
JOURNAL OF COMPUTATIONAL BIOLOGY
ISSN journal
1066-5277
Volume
7
Issue
3-4
Year of publication
2000
Pages
537 - 558
Database
ISI
SICI code
1066-5277(2000)7:3-4<537:TNJAPS>2.0.ZU;2-3
Abstract
High-throughput, data-directed computational protocols for Structural Genomics (or Proteomics) are required in order to evaluate the protein products of genes for structure and function at rates comparable to current gene-sequencing technology. This paper presents the JIGSAW algorithm, a novel high-throughput, automated approach to protein structure characterization with nuclear magnetic resonance (NMR). JIGSAW applies graph algorithms and probabilistic reasoning techniques, enforcing first-principles consistency rules in order to overcome a 5-10% signal-to-noise ratio. It consists of two main components: (1) graph-based secondary structure pattern identification in unassigned heteronuclear NMR data, and (2) assignment of spectral peaks by probabilistic alignment of identified secondary structure elements against the primary sequence. Deferring assignment eliminates the bottleneck faced by traditional approaches, which begin by correlating peaks among dozens of experiments. JIGSAW utilizes only four experiments, none of which requires 13C-labeled protein, thus dramatically reducing both the amount and expense of wet lab molecular biology and the total spectrometer time. Results for three test proteins demonstrate that JIGSAW correctly identifies 79-100% of α-helical and 46-65% of β-sheet NOE connectivities and correctly aligns 33-100% of secondary structure elements. JIGSAW is very fast, running in minutes on a Pentium-class Linux workstation. This approach yields quick and reasonably accurate (as opposed to the traditional slow and extremely accurate) structure calculations. It could be useful for quick structural assays to speed data to the biologist early in an investigation and could in principle be applied in an automation-like fashion to a large fraction of the proteome.
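
Illustrative sketch
The abstract does not give the details of component (1), so the following is only a toy illustration of what graph-based pattern identification over unassigned data might look like, not the authors' algorithm. It assumes that each node is an unassigned spin system, that each edge is a putative NOE interaction carrying a hypothetical type label ("dNN" for a sequential-type contact, "daN" for an i to i+3 contact), and that a helix-like pattern is a chain of sequential contacts in which every pair of nodes three steps apart is also connected. The function name, labels, and example data are all invented for this sketch.

```python
from collections import defaultdict


def find_helix_like_paths(edges, min_len=4):
    """Toy pattern search (assumed rules, not the published method).

    edges: iterable of (node_a, node_b, kind) with kind in {"dNN", "daN"}.
    Returns chains of nodes linked by consecutive "dNN" edges in which every
    pair of nodes three steps apart is also linked by a "daN" edge.
    """
    seq = defaultdict(set)  # node -> neighbours via sequential-type edges
    skip = set()            # unordered pairs joined by an i,i+3-type edge
    for a, b, kind in edges:
        if kind == "dNN":
            seq[a].add(b)
            seq[b].add(a)
        elif kind == "daN":
            skip.add(frozenset((a, b)))

    results = []

    def extend(path):
        extended = False
        for nxt in sorted(seq[path[-1]]):
            if nxt in path:
                continue
            # From the fourth node on, require the supporting i,i+3 edge.
            if len(path) >= 3 and frozenset((path[-3], nxt)) not in skip:
                continue
            extended = True
            extend(path + [nxt])
        if not extended and len(path) >= min_len:
            results.append(path)

    for start in sorted(seq):
        extend([start])

    # Each chain is found in both directions; keep one orientation of each.
    unique = {tuple(p) if p[0] <= p[-1] else tuple(reversed(p)) for p in results}

    def inside(short, long):
        n = len(short)
        runs = {long[i:i + n] for i in range(len(long) - n + 1)}
        return short in runs or short[::-1] in runs

    # Drop chains that are contiguous pieces of a longer accepted chain.
    return sorted(p for p in unique
                  if not any(len(q) > len(p) and inside(p, q) for q in unique))


# Invented example: five spin systems forming one chain, plus a spurious edge.
example_edges = [
    ("s1", "s2", "dNN"), ("s2", "s3", "dNN"), ("s3", "s4", "dNN"),
    ("s4", "s5", "dNN"), ("s1", "s4", "daN"), ("s2", "s5", "daN"),
    ("s5", "s9", "dNN"),  # noise: no i,i+3 support, so it is not absorbed
]
print(find_helix_like_paths(example_edges))
```

The consistency requirement (sequential edges must be backed by i to i+3 edges) is what keeps the spurious s5-s9 edge from extending the chain; this is one simple way a rule of that kind can filter noisy edges, under the assumptions stated above.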