DNA AND PEPTIDE SEQUENCES AND CHEMICAL PROCESSES MULTIVARIATELY MODELED BY PRINCIPAL COMPONENT ANALYSIS AND PARTIAL LEAST-SQUARES PROJECTIONS TO LATENT STRUCTURES
S. Wold et al., DNA AND PEPTIDE SEQUENCES AND CHEMICAL PROCESSES MULTIVARIATELY MODELED BY PRINCIPAL COMPONENT ANALYSIS AND PARTIAL LEAST-SQUARES PROJECTIONS TO LATENT STRUCTURES, Analytica Chimica Acta, 277(2), 1993, pp. 239-253
Biopolymer sequences (e.g., DNA, RNA, proteins and polysaccharides) and chemical processes (e.g., a batch or continuous polymer synthesis run in a chemical plant) have close similarities from the modelling point of view. When a set of sequences or processes is characterized by multivariate data, a three-way data matrix is obtained. For sequences, one direction of this matrix is position; for processes, it is time. The multivariate modelling of this matrix by principal component analysis (PCA) or partial least-squares (PLS) methods is discussed for the following purposes: classification of sequences; quantitative relationships between sequence and biological activity or chemical properties; optimizing a sequence with respect to selected properties; process diagnostics; and quantitative relationships between process variables and product quality variables. To obtain good models, a number of problems have to be adequately dealt with: appropriate characterization of the sequence or process; experimental design (selecting sequences or process settings); transforming the three-way into a two-way matrix; and appropriate modelling and validation (modelling interactions, periodicities, "time series" structures and "neighbour effects"). A multivariate approach to sequence and process modelling using PCA and PLS projections to latent structures is discussed and illustrated with several sets of peptide and DNA promoter data.
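The abstract names the key steps (turning the three-way matrix into a two-way one, then fitting PCA or PLS) without spelling them out. The sketch below illustrates one common way to do this and is not necessarily the authors' exact procedure. It assumes the simplest unfolding, in which each object's descriptor vectors are concatenated position by position (or time point by time point for a batch run). It then autoscales the columns, computes PCA through an SVD, and fits a single-response PLS model with the NIPALS algorithm. The function names (`unfold`, `autoscale`, `pca`, `pls1`) and the random data are hypothetical placeholders. Interactions, periodicities, neighbour effects and cross-validation are not handled here.

```python
import numpy as np


def unfold(X3):
    """Unfold a three-way array (objects x positions x descriptors) into a
    two-way matrix (objects x positions*descriptors). Each row lists the
    descriptors of position 1, then those of position 2, and so on."""
    return X3.reshape(X3.shape[0], -1)


def autoscale(X):
    """Center each column and scale it to unit variance.
    Constant columns are only centered (their scale factor is set to 1)."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0
    return (X - mean) / std, mean, std


def pca(X, n_components):
    """PCA of a scaled matrix via SVD, so that X is approximated by T @ P.T."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    T = U[:, :n_components] * s[:n_components]  # scores (one row per object)
    P = Vt[:n_components].T                      # loadings (one row per variable)
    return T, P


def pls1(X, y, n_components):
    """NIPALS PLS regression for one response. X and y must be centered.
    Returns coefficients b such that y is approximated by X @ b."""
    X = np.array(X, dtype=float)  # working copies that get deflated
    y = np.array(y, dtype=float)
    m = X.shape[1]
    W = np.zeros((m, n_components))  # X weights
    P = np.zeros((m, n_components))  # X loadings
    q = np.zeros(n_components)       # y loadings
    for a in range(n_components):
        w = X.T @ y
        w /= np.linalg.norm(w)
        t = X @ w                    # X scores for component a
        tt = t @ t
        P[:, a] = X.T @ t / tt
        q[a] = y @ t / tt
        W[:, a] = w
        X -= np.outer(t, P[:, a])    # remove what component a explains in X
        y -= q[a] * t                # and what it explains in y
    # Express the model in terms of the original (undeflated) X
    return W @ np.linalg.solve(P.T @ W, q)


# Hypothetical data: 20 sequences, 8 positions, 3 descriptors per position
rng = np.random.default_rng(0)
X3 = rng.normal(size=(20, 8, 3))
X = unfold(X3)
Xs, mean, std = autoscale(X)
T, P = pca(Xs, n_components=2)
print(T.shape, P.shape)  # (20, 2) (24, 2)

# Hypothetical response, e.g. a measured activity for each sequence
y = Xs[:, 0] - 0.5 * Xs[:, 5] + rng.normal(scale=0.1, size=20)
b = pls1(Xs, y - y.mean(), n_components=3)
y_hat = Xs @ b + y.mean()
```

The same `unfold` call covers a set of batch runs if the middle axis is read as time instead of sequence position. Keeping positions (or time points) as separate columns means the PCA loadings and PLS coefficients can be read back position by position. The number of components is fixed by hand here; in practice it would be chosen by a validation procedure such as cross-validation.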