A 3D-1D SUBSTITUTION MATRIX FOR PROTEIN FOLD RECOGNITION THAT INCLUDES PREDICTED SECONDARY STRUCTURE OF THE SEQUENCE

Citation
D.W. Rice and D. Eisenberg, A 3D-1D SUBSTITUTION MATRIX FOR PROTEIN FOLD RECOGNITION THAT INCLUDES PREDICTED SECONDARY STRUCTURE OF THE SEQUENCE, Journal of Molecular Biology, 267(4), 1997, pp. 1026-1038
Number of citations
49
Subject Categories
Biology
ISSN journal
00222836
Volume
267
Issue
4
Year of publication
1997
Pages
1026 - 1038
Database
ISI
SICI code
0022-2836(1997)267:4<1026:A3SMFP>2.0.ZU;2-#
Abstract
In protein fold recognition, a probe amino acid sequence is compared to a library of representative folds of known structure to identify a structural homolog. In cases where the probe and its homolog have clear sequence similarity, traditional residue substitution matrices have been used to predict the structural similarity. In cases where the probe is sequentially distant from its homolog, we have developed a (7 x 3 x 2 x 7 x 3) 3D-1D substitution matrix (called H3P2), calculated from a database of 119 structural pairs. Members of each pair share a similar fold, but have sequence identity less than 30%. Each probe sequence position is defined by one of seven residue classes and three secondary structure classes. Each homologous fold position is defined by one of seven residue classes, three secondary structure classes, and two burial classes. Thus the matrix is five-dimensional and contains 7 x 3 x 2 x 7 x 3 = 882 elements or 3D-1D scores. The first step in assigning a probe sequence to its homologous fold is the prediction of the three-state (helix, strand, coil) secondary structure of the probe; here we use the profile based neural network prediction of secondary structure (PHD) program. Then a dynamic programming algorithm uses the H3P2 matrix to align the probe sequence with structures in a representative fold library. To test the effectiveness of the H3P2 matrix a challenging, fold class diverse, and cross-validated benchmark assessment is used to compare the H3P2 matrix to the GONNET, PAM250, BLOSUM62 and a secondary structure only substitution matrix. For distantly related sequences the H3P2 matrix detects more homologous structures at higher reliabilities than do these other substitution matrices, based on sensitivity versus specificity plots (or SENS-SPEC plots).
The added efficacy of the H3P2 matrix arises from its information on the statistical preferences for various sequence-structure environment combinations from very distantly related proteins. It introduces the predicted secondary structure information from a sequence into fold recognition in a statistical way that normalizes the inherent correlations between residue type, secondary structure and solvent accessibility. (C) 1997 Academic Press Limited.
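
The matrix layout and alignment step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract gives only the dimensions (7 x 3 x 2 x 7 x 3) and says that a dynamic programming algorithm is used; everything else here is an assumption made for illustration: the function names `h3p2_index`, `toy_matrix` and `align_score`, the axis ordering, the global (Needleman-Wunsch style) alignment with a linear gap penalty, and the placeholder scores, which are not the published H3P2 values.

```python
# Class counts taken from the abstract.
N_RES = 7      # residue classes
N_SS = 3       # secondary structure classes (helix, strand, coil)
N_BURIAL = 2   # burial classes (fold positions only)

# Total number of 3D-1D scores: 7 * 3 * 2 * 7 * 3 = 882.
MATRIX_SIZE = N_RES * N_SS * N_BURIAL * N_RES * N_SS


def h3p2_index(fold_res, fold_ss, fold_burial, probe_res, probe_ss):
    """Map a (fold position, probe position) class combination to a flat index.

    The axis order is an assumption; the abstract does not specify one.
    """
    idx = fold_res
    idx = idx * N_SS + fold_ss
    idx = idx * N_BURIAL + fold_burial
    idx = idx * N_RES + probe_res
    idx = idx * N_SS + probe_ss
    return idx


def toy_matrix():
    """Build placeholder scores (NOT the published H3P2 values)."""
    matrix = [0.0] * MATRIX_SIZE
    for fr in range(N_RES):
        for fs in range(N_SS):
            for fb in range(N_BURIAL):
                for pr in range(N_RES):
                    for ps in range(N_SS):
                        score = (2.0 if fr == pr else -1.0) + (1.0 if fs == ps else -1.0)
                        matrix[h3p2_index(fr, fs, fb, pr, ps)] = score
    return matrix


def align_score(probe, fold, matrix, gap=-2.0):
    """Global alignment score of a probe against a fold (linear gap penalty).

    probe: list of (residue_class, predicted_ss_class)
    fold:  list of (residue_class, ss_class, burial_class)
    """
    n, m = len(probe), len(fold)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        pr, ps = probe[i - 1]
        for j in range(1, m + 1):
            fr, fs, fb = fold[j - 1]
            s = matrix[h3p2_index(fr, fs, fb, pr, ps)]
            dp[i][j] = max(dp[i - 1][j - 1] + s,   # align probe i with fold j
                           dp[i - 1][j] + gap,     # gap in the fold
                           dp[i][j - 1] + gap)     # gap in the probe
    return dp[n][m]


if __name__ == "__main__":
    matrix = toy_matrix()
    probe = [(0, 0), (1, 1), (2, 2)]
    fold = [(0, 0, 0), (1, 1, 1), (2, 2, 0)]
    print(MATRIX_SIZE, align_score(probe, fold, matrix))
```

In a fold recognition setting, a score of this kind would be computed for the probe against every structure in the fold library and the results ranked. The scores and gap penalty above serve only to make the sketch runnable.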