Rd. Lawrence et al., A scalable parallel algorithm for self-organizing maps with applications to sparse data mining problems, DATA M K D, 3(2), 1999, pp. 171-195
We describe a scalable parallel implementation of the self organizing map (
SOM) suitable for data-mining applications involving clustering or segmenta
tion against large data sets such as those encountered in the analysis of c
ustomer spending patterns. The parallel algorithm is based on the batch SOM
formulation in which the neural weights are updated at the end of each pas
s over the training data. The underlying serial algorithm is enhanced to ta
ke advantage of the sparseness often encountered in these data sets. Analys
is of a realistic test problem shows that the batch SOM algorithm captures
key features observed using the conventional on-line algorithm, with compar
able convergence rates.
Performance measurements on an SP2 parallel computer are given for two reta
il data sets and a publicly available set of census data.These results demo
nstrate essentially linear speedup for the parallel batch SOM algorithm, us
ing both a memory-contained sparse formulation as well as a separate implem
entation in which the mining data is accessed directly from a parallel file
system. We also present visualizations of the census data to illustrate th
e value of the clustering information obtained via the parallel SOM method.