Distributed multivariate regression using wavelet-based collective data mining

Citation
D.E. Hershberger and H. Kargupta, Distributed multivariate regression using wavelet-based collective data mining, J PAR DISTR, 61(3), 2001, pp. 372-400
Number of citations
44
Subject Categories
Computer Science & Engineering
Journal title
JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING
Journal ISSN
0743-7315
Volume
61
Issue
3
Year of publication
2001
Pages
372 - 400
Database
ISI
SICI code
0743-7315(200103)61:3<372:DMRUWC>2.0.ZU;2-6
Abstract
This paper presents a method for distributed multivariate regression using wavelet-based collective data mining (CDM). The method seamlessly blends machine learning and the theory of communication with the statistical methods employed in parametric multivariate regression to provide an effective data mining technique for use in a distributed data and computation environment. The technique is applied to two benchmark data sets, producing results that are consistent with those obtained by applying standard parametric regression techniques to centralized data sets. Evaluation of the method in terms of model accuracy as a function of appropriateness of the selected wavelet function, relative number of nonlinear cross-terms, and sample size demonstrates that accurate parametric multivariate regression models can be generated from distributed, heterogeneous data sets with minimal data communication overhead compared to that required to aggregate a distributed data set. Application of this method to linear discriminant analysis, which is related to parametric multivariate regression, produced classification results on the Iris data set that are comparable to those obtained with centralized data analysis. (C) 2001 Academic Press.
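
The abstract does not give the algorithm, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes two hypothetical sites that each hold one feature column for the same samples, an orthonormal Haar transform as the wavelet, a response y that is already available at the central site, and a simple "keep the k largest coefficients" rule for reducing communication. All names (haar_matrix, compress, x1, x2, k) are illustrative. The sketch shows two things: an orthonormal transform leaves the least-squares solution unchanged, and a regression with a nonlinear cross-term can be fitted from truncated coefficient sets.

```python
import numpy as np

def haar_matrix(n):
    """Orthonormal Haar transform matrix; n must be a power of two."""
    if n == 1:
        return np.array([[1.0]])
    h = haar_matrix(n // 2)
    top = np.kron(h, [1.0, 1.0])                   # averaging rows
    bottom = np.kron(np.eye(n // 2), [1.0, -1.0])  # differencing rows
    return np.vstack([top, bottom]) / np.sqrt(2.0)

def compress(column, W, k):
    """Transform a column and keep only its k largest-magnitude coefficients."""
    coeffs = W @ column
    keep = np.argsort(np.abs(coeffs))[-k:]
    sparse = np.zeros_like(coeffs)
    sparse[keep] = coeffs[keep]
    return sparse

rng = np.random.default_rng(0)
n = 64
t = np.linspace(0.0, 1.0, n)

# Assumed setup: site A holds x1, site B holds x2 (same rows, different columns).
x1 = np.sin(2 * np.pi * t) + 0.01 * rng.standard_normal(n)
x2 = t ** 2 + 0.01 * rng.standard_normal(n)
# Synthetic response with a nonlinear cross-term; true coefficients are 2, -3, 1.5.
y = 2.0 * x1 - 3.0 * x2 + 1.5 * x1 * x2

W = haar_matrix(n)

# Centralized reference fit on the raw, aggregated data.
X = np.column_stack([x1, x2, x1 * x2])
beta_central, *_ = np.linalg.lstsq(X, y, rcond=None)

# Same fit in the wavelet domain: W is orthonormal, so the solution is identical.
beta_wavelet, *_ = np.linalg.lstsq(W @ X, W @ y, rcond=None)

# Distributed variant: each site sends only k of its n coefficients.
k = 16
x1_hat = W.T @ compress(x1, W, k)  # central site inverts the transform
x2_hat = W.T @ compress(x2, W, k)
X_hat = np.column_stack([x1_hat, x2_hat, x1_hat * x2_hat])
beta_distributed, *_ = np.linalg.lstsq(X_hat, y, rcond=None)

print("centralized :", beta_central)
print("wavelet     :", beta_wavelet)
print("distributed :", beta_distributed, f"({k} of {n} coefficients per site)")
```

In this sketch the distributed estimate is only an approximation of the centralized one, and how close it gets depends on k and on how well the chosen wavelet suits the data, which matches the abstract's remark about the appropriateness of the selected wavelet function.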