DNA AND PEPTIDE SEQUENCES AND CHEMICAL PROCESSES MULTIVARIATELY MODELED BY PRINCIPAL COMPONENT ANALYSIS AND PARTIAL LEAST-SQUARES PROJECTIONS TO LATENT STRUCTURES

Citation
S. Wold et al., DNA AND PEPTIDE SEQUENCES AND CHEMICAL PROCESSES MULTIVARIATELY MODELED BY PRINCIPAL COMPONENT ANALYSIS AND PARTIAL LEAST-SQUARES PROJECTIONS TO LATENT STRUCTURES, Analytica chimica acta, 277(2), 1993, pp. 239-253
Number of citations
31
Subject categories
Chemistry Analytical
Journal title
Analytica chimica acta
ISSN journal
00032670
Volume
277
Issue
2
Year of publication
1993
Pages
239 - 253
Database
ISI
SICI code
0003-2670(1993)277:2<239:DAPSAC>2.0.ZU;2-4
Abstract
Biopolymer sequences (e.g., DNA, RNA, proteins and polysaccharides) and chemical processes (e.g., a batch or continuous polymer synthesis run in a chemical plant) have close similarities from the modelling point of view. When a set of sequences or processes is characterized by multivariate data, a three-way data matrix is obtained. With sequences the position and with processes the time is one direction in this matrix. The multivariate modelling of this matrix by principal component analysis (PCA) or partial least-squares (PLS) methods for the following purposes is discussed: classification of sequences; quantitative relationships between sequence and biological activity or chemical properties; optimizing a sequence with respect to selected properties; process diagnostics; and quantitative relationships between process variables and product quality variables. To obtain good models, a number of problems have to be adequately dealt with: appropriate characterization of the sequence or process; experimental design (selecting sequences or process settings); transforming the three-way into a two-way matrix; and appropriate modelling and validation (modelling interactions, periodicities, "time series" structures and "neighbour effects"). A multivariate approach to sequence and process modelling using PCA and PLS projections to latent structures is discussed and illustrated with several sets of peptide and DNA promoter data.
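Illustrative note (not part of the original record): the abstract mentions transforming the three-way matrix into a two-way matrix before PCA. The sketch below shows one common way this could be done, by unfolding an objects x positions x descriptors array row-wise and computing PCA through a singular value decomposition. The array shape, the synthetic random data, and the function names `unfold_three_way` and `pca_scores` are assumptions made for illustration; the paper's actual descriptors, scaling, and unfolding scheme are not given in this record.

```python
import numpy as np


def unfold_three_way(X):
    """Unfold an (objects, positions, descriptors) array into a two-way
    (objects, positions * descriptors) matrix, one row per object.
    This is an assumed unfolding; the paper may use a different one."""
    n_objects, n_positions, n_descriptors = X.shape
    return X.reshape(n_objects, n_positions * n_descriptors)


def pca_scores(X2, n_components=2):
    """PCA via SVD of the column-centred matrix.
    Returns (scores, loadings) for the first n_components."""
    Xc = X2 - X2.mean(axis=0)            # centre each column
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components].T       # columns are loading vectors
    return scores, loadings


# Synthetic stand-in data: 10 sequences, 5 positions, 3 descriptors each.
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 5, 3))

X2 = unfold_three_way(X)
scores, loadings = pca_scores(X2, n_components=2)
```

Each row of `scores` places one sequence (or process run) in the plane of the first two components, which is the kind of projection used for classification. A PLS model would additionally use a response matrix, which is omitted here.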