This paper presents a method for distributed multivariate regression using
wavelet-based collective data mining (CDM). The method seamlessly blends
machine learning and the theory of communication with the statistical methods
employed in parametric multivariate regression to provide an effective data
mining technique for use in a distributed data and computation environment.
The technique is applied to two benchmark data sets, producing results that
are consistent with those obtained by applying standard parametric regression
techniques to centralized data sets. Evaluation of the method in terms of
model accuracy as a function of the appropriateness of the selected wavelet
function, the relative number of nonlinear cross-terms, and sample size
demonstrates that accurate parametric multivariate regression models can be
generated from distributed, heterogeneous data sets with minimal data
communication overhead compared to that required to aggregate a distributed
data set. Application of this method to linear discriminant analysis, which
is related to parametric multivariate regression, produced classification
results on the Iris data set that are comparable to those obtained with
centralized data analysis. © 2001 Academic Press.