The general principle of parsimonious data modeling states that if two models adequately describe a given set of data, the one with fewer parameters will have better predictive ability on new data. This concept is of interest in multivariate calibration because several non-linear modeling techniques have recently become available, among them neural networks, projection pursuit regression (PPR) and multivariate adaptive regression splines (MARS). These methods, while capable of modeling non-linearities, typically have a very large number of parameters to be estimated during model building. The biased calibration methods, principal components regression (PCR) and partial least squares (PLS), are linear and so may describe some types of non-linearity less efficiently, but they require comparatively few parameters to be estimated. It is therefore of interest to study the parsimony principle formally in order to understand under what circumstances each method is appropriate. In this paper, the mathematical theory of parsimonious data modeling is presented, and the assumptions made in the theory are shown to hold for multivariate calibration methods. The theory is used to provide a procedure for selecting the most parsimonious model structure for a given calibration application.
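
As a minimal illustration of the parsimony trade-off described above (not the procedure developed in the paper), the following Python sketch compares a parsimonious linear calibration model (PCR with a few latent variables) against a higher-parameter non-linear model (a small neural network) on held-out data. The synthetic spectra, noise level, number of components and network size are all assumptions chosen for demonstration only.

# Illustrative sketch: parsimonious PCR vs. a many-parameter neural network.
# All data are synthetic; settings are assumptions, not the paper's choices.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
n_samples, n_wavelengths = 60, 100
concentration = rng.uniform(0, 1, n_samples)

# Synthetic "spectra": a concentration-dependent peak plus measurement noise.
wl = np.linspace(0, 1, n_wavelengths)
X = (concentration[:, None] * np.exp(-((wl - 0.5) ** 2) / 0.01)
     + 0.02 * rng.standard_normal((n_samples, n_wavelengths)))
y = concentration

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0)

# Parsimonious model: PCR with three principal components (few parameters).
pcr = make_pipeline(PCA(n_components=3), LinearRegression())
pcr.fit(X_train, y_train)

# Flexible model: a small neural network with many more adjustable weights.
nn = MLPRegressor(hidden_layer_sizes=(20,), max_iter=5000, random_state=0)
nn.fit(X_train, y_train)

# Compare prediction error on data not used for model building.
for name, model in [("PCR (3 PCs)", pcr), ("Neural net (20 hidden units)", nn)]:
    rmsep = mean_squared_error(y_test, model.predict(X_test)) ** 0.5
    print(f"{name}: RMSEP on held-out data = {rmsep:.4f}")

Because the underlying relationship here is essentially linear in the latent variable, the model with fewer estimated parameters would typically be expected to predict the held-out samples at least as well as the more flexible one, which is the behavior the parsimony principle formalizes.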