M. Sjostrom et al., POLYPEPTIDE SEQUENCE PROPERTY RELATIONSHIPS IN ESCHERICHIA-COLI BASED ON AUTO CROSS COVARIANCES, Chemometrics and intelligent laboratory systems, 29(2), 1995, pp. 295-305
For multivariate classification and quantitative structure-activity
studies of proteins, which involve amino acid sequences of different
lengths, preprocessing methods are needed that translate each sequence
into a quantitative description with the same number of variables.
Here three different preprocessing methods are investigated. Two of
the methods are variants of auto cross covariances calculated from a
multipositional description of the protein sequence. For the
multipositional description, three orthogonal scales that describe the
amino acids physico-chemically were used. The third method quantifies
each sequence by a diamino acid frequency histogram. The methods are
evaluated by classifying 106 proteins from Escherichia coli and
Gram-negative bacteria. The proteins were divided into four classes
according to their location in the cell: cytoplasm, inner membrane,
periplasm and outer membrane. For the subsequent classification, PLS
discriminant analysis was used. The results showed that one of the
variants of auto cross covariances and the diamino acid frequency
histogram representation contained much information related to the
given classification problem. Hence the amino acid sequences of
proteins with different final locations in Escherichia coli have
significant features related to protein structure and location.
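
The abstract gives no formulas, so the following is only a minimal sketch of how sequences of different lengths could be mapped to fixed-length vectors in the two ways described. Several things here are assumptions rather than the authors' method: the auto cross covariance is taken as the lagged product of descriptor j at position i and descriptor k at position i + lag, averaged over the n - lag available position pairs; the "diamino acid frequency histogram" is read as the relative frequency of adjacent residue pairs; and `placeholder_scales`, `auto_cross_covariance`, `dipeptide_histogram` and `max_lag` are hypothetical names. The descriptor values are random placeholders, not the three orthogonal scales used in the paper.

```python
import numpy as np
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def placeholder_scales(seed=0):
    """Hypothetical 3-column descriptor table.

    The values are random placeholders, NOT published physico-chemical
    scales; substitute real scale values before any actual use.
    """
    rng = np.random.default_rng(seed)
    table = rng.standard_normal((len(AMINO_ACIDS), 3))
    return {aa: table[i] for i, aa in enumerate(AMINO_ACIDS)}


def auto_cross_covariance(seq, scales, max_lag):
    """Return a vector of length max_lag * d * d, independent of len(seq).

    Assumed definition: acc[lag, j, k] = sum_i z[i, j] * z[i + lag, k] / (n - lag).
    No per-sequence centering is applied in this sketch.
    """
    z = np.array([scales[aa] for aa in seq])  # shape (n, d)
    n, d = z.shape
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the sequence length")
    feats = []
    for lag in range(1, max_lag + 1):
        # (d, n - lag) @ (n - lag, d) gives all d * d auto and cross terms
        cov = z[:-lag].T @ z[lag:] / (n - lag)
        feats.append(cov.ravel())
    return np.concatenate(feats)


def dipeptide_histogram(seq):
    """Relative frequencies of the 400 possible adjacent residue pairs."""
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    index = {p: i for i, p in enumerate(pairs)}
    counts = np.zeros(len(pairs))
    for a, b in zip(seq, seq[1:]):
        counts[index[a + b]] += 1
    total = counts.sum()
    return counts / total if total else counts


if __name__ == "__main__":
    scales = placeholder_scales()
    short_seq = "MKTAYIAKQR"              # made-up example sequences
    long_seq = "MKTAYIAKQRQISFVKSHFSRQ"
    for s in (short_seq, long_seq):
        acc = auto_cross_covariance(s, scales, max_lag=3)
        hist = dipeptide_histogram(s)
        print(len(s), acc.shape, hist.shape)
```

Both functions return vectors whose length depends only on the number of descriptors and the chosen maximum lag (or on the alphabet size), not on the sequence length, which is the property a classifier such as PLS discriminant analysis needs from the preprocessing step.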