VARIABLE SELECTION IN WAVELET REGRESSION-MODELS

Authors

ALSBERG BK WOODWARD AM WINSON MK ROWLAND JJ KELL DB

Citation

B.K. Alsberg et al., VARIABLE SELECTION IN WAVELET REGRESSION-MODELS, Analytica chimica acta, 368(1-2), 1998, pp. 29-44

Subject categories

Chemistry Analytical

Journal title

Analytica chimica acta

ISSN journal

00032670

Volume

368

Issue

1-2

Year of publication

1998

Pages

29 - 44

Database

ISI

SICI code

0003-2670(1998)368:1-2<29:VSIWR>2.0.ZU;2-C

Abstract

Variable selection and compression are often used to produce more parsimonious regression models. But when they are applied directly to the original spectrum domain, it is not easy to determine the type of feature the selected variables represent. By performing variable selection in the wavelet domain we show that it is possible to identify important variables as being part of short- or large-scale features. Therefore, the suggested method is able to extract information about the selected variables that otherwise would have been inaccessible. We are also able to obtain information about the location of these features in the original domain. In this article we demonstrate three types of variable selection methods applied to the wavelet domain: selection of the optimal combination of scales, thresholding based on mutual information, and truncation of weight vectors in the partial least squares (PLS) regression algorithm. We found that truncation of weight vectors in PLS was the most effective method for selecting variables. For the two experimental data sets tested we obtained approximately the same prediction error using less than 1% (for Data set 1) and 10% (for Data set 2) of the original variables. We also discovered that the selected variables were restricted to a limited number of wavelet scales. This information can be used to suggest whether the underlying features may be dominated by narrow (selective) peaks (indicated by variables in short wavelet scale regions) or by broader regions (indicated by variables in long wavelet scale regions). Thus, wavelet regression is here used as an extension of the more traditional Fourier regression (where the modelling is performed in the frequency domain without taking into consideration any of the information in the time domain). (C) 1998 Elsevier Science B.V. All rights reserved.
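
As an illustration of the third method named in the abstract, the following is a minimal sketch of truncating PLS weight vectors in the wavelet domain. It is not the authors' implementation. The Haar wavelet, the rule of keeping the `keep` largest-magnitude weights per component, the function names (`haar_dwt`, `scale_of`, `truncated_pls1`) and the synthetic data are all assumptions made for this example; the abstract does not specify the wavelet, the truncation rule, or the data.

```python
import numpy as np

def haar_dwt(x):
    """Full orthonormal Haar decomposition of a signal of length 2**J.
    Output order: [approximation, coarsest detail, ..., finest details]."""
    x = np.asarray(x, dtype=float)
    details = []
    while x.size > 1:
        even, odd = x[0::2], x[1::2]
        details.append((even - odd) / np.sqrt(2.0))  # detail coefficients
        x = (even + odd) / np.sqrt(2.0)              # smoothed signal
    return np.concatenate([x] + details[::-1])

def scale_of(idx):
    """Detail level of coefficient `idx` in the haar_dwt layout:
    -1 for the approximation, 0 for the coarsest detail, larger is finer."""
    return int(idx).bit_length() - 1

def truncated_pls1(X, y, n_components, keep):
    """PLS1 (NIPALS) in which each weight vector keeps only its `keep`
    largest-magnitude entries (assumed truncation rule)."""
    X = X - X.mean(axis=0)
    y = y - y.mean()
    p = X.shape[1]
    W, P, q, selected = [], [], [], set()
    for _ in range(n_components):
        w = X.T @ y                            # covariance-based weights
        idx = np.argsort(np.abs(w))[-keep:]    # indices of largest weights
        w_trunc = np.zeros(p)
        w_trunc[idx] = w[idx]
        norm = np.linalg.norm(w_trunc)
        if norm == 0.0:
            break
        w_trunc /= norm
        t = X @ w_trunc                        # scores
        tt = t @ t
        p_load = X.T @ t / tt                  # X loadings
        q_a = (y @ t) / tt                     # y loading
        X = X - np.outer(t, p_load)            # deflate X
        y = y - q_a * t                        # deflate y
        W.append(w_trunc)
        P.append(p_load)
        q.append(q_a)
        selected.update(idx.tolist())
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    beta = W @ np.linalg.solve(P.T @ W, q)     # regression coefficients
    return beta, sorted(selected)

# Synthetic "spectra": one narrow peak whose height tracks y, plus noise.
rng = np.random.default_rng(0)
n, p = 40, 64
y = rng.normal(size=n)
peak = np.exp(-0.5 * ((np.arange(p) - 20) / 1.5) ** 2)
X = np.outer(y, peak) + 0.05 * rng.normal(size=(n, p))

Xw = np.array([haar_dwt(row) for row in X])    # move to the wavelet domain
beta, selected = truncated_pls1(Xw, y, n_components=2, keep=4)
scales = sorted({scale_of(i) for i in selected})
print("selected coefficients:", selected)
print("detail levels used:", scales)
```

Because the regression vector is non-zero only at the retained wavelet coefficients, listing the detail levels of those coefficients gives the kind of scale information the abstract describes: fine levels point to narrow peaks, coarse levels to broader regions.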