A novel automated approach for the sequence specific NMR assignments of
H-1(N), C-13(alpha), C-13(beta), C-13'/H-1(alpha) and N-15 spins in proteins,
using triple resonance experimental data, is presented. The algorithm,
TATAPRO (Tracked AuTomated Assignments in Proteins), uses the protein
primary sequence and peak lists from a set of triple resonance spectra that
correlate H-1(N) and N-15 chemical shifts with those of C-13(alpha), C-13(beta)
and C-13'/H-1(alpha). The information derived from such correlations is
used to create a 'master_list' consisting of all possible sets of H-1(i)N,
N-15(i), C-13(i)alpha, C-13(i)beta, C-13'(i)/H-1(i)alpha, C-13(i-1)alpha,
C-13(i-1)beta and C-13'(i-1)/H-1(i-1)alpha chemical shifts. On the basis of an
extensive statistical analysis of C-13(alpha) and C-13(beta) chemical
shift data of proteins derived from the BioMagResBank (BMRB), it is shown that
the 20 amino acid residues can be grouped into eight distinct categories,
each of which is assigned a unique two-digit code. Such a code is used to
tag individual sets of chemical shifts in the master_list and also to
translate the protein primary sequence into an array called pps_array. The program
then uses the master_list to search for neighbouring partners of a given
amino acid residue along the polypeptide chain and sequentially assigns the
longest possible stretch of residues on either side. While doing so, each
assigned residue is tracked in an array called assig_array, with the two-digit
code assigned earlier. The assig_array is then mapped onto the pps_array
for sequence specific resonance assignment. The program has been tested
using experimental data on a calcium binding protein from Entamoeba histolytica
(Eh-CaBP, 15 kDa) having substantial internal sequence homology and using
published data on four other proteins in the molecular weight range of
18-42 kDa. In all cases, nearly complete sequence specific resonance
assignments (>95%) are obtained. Furthermore, the reliability of the program has
been tested by deleting sets of chemical shifts randomly from the
master_list created for the test proteins.
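
The step of translating the primary sequence into pps_array and then mapping an assigned stretch (assig_array) onto it can be illustrated with a minimal Python sketch. This is not the TATAPRO implementation. The abstract does not list the eight residue categories or their two-digit codes, so the grouping and code values below are invented placeholders, and the names PLACEHOLDER_GROUPS, to_pps_array and map_stretch are hypothetical.

```python
from typing import Dict, List

# Placeholder grouping of the 20 residues into eight categories.
# ASSUMPTION: both the groups and the two-digit codes are invented for
# illustration; the abstract does not specify them.
PLACEHOLDER_GROUPS: Dict[str, str] = {
    "10": "G",
    "20": "A",
    "30": "S",
    "40": "T",
    "50": "P",
    "60": "VI",
    "70": "DFLNY",
    "80": "CEHKMQRW",
}

# Invert the grouping into a residue -> code lookup table.
CODE_OF: Dict[str, str] = {
    residue: code
    for code, residues in PLACEHOLDER_GROUPS.items()
    for residue in residues
}


def to_pps_array(sequence: str, code_of: Dict[str, str]) -> List[str]:
    """Translate a one-letter primary sequence into an array of category codes."""
    return [code_of[residue] for residue in sequence]


def map_stretch(assig_array: List[str], pps_array: List[str]) -> List[int]:
    """Return every start index at which the assigned stretch of codes
    matches the sequence-derived code array exactly."""
    n = len(assig_array)
    return [
        start
        for start in range(len(pps_array) - n + 1)
        if pps_array[start:start + n] == assig_array
    ]


# Toy sequence, not taken from any of the test proteins.
pps_array = to_pps_array("MGASTVLK", CODE_OF)
unique_hit = map_stretch(["20", "30", "40"], pps_array)   # codes for A-S-T
ambiguous_hit = map_stretch(["80"], pps_array)            # a single code
```

In this toy case the three-residue stretch maps to a single position, whereas a one-residue stretch matches two positions. This suggests why assigning the longest possible stretch before mapping reduces ambiguity.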