A scalable parallel algorithm for self-organizing maps with applications to sparse data mining problems

Citation
Rd. Lawrence et al., A scalable parallel algorithm for self-organizing maps with applications to sparse data mining problems, DATA M K D, 3(2), 1999, pp. 171-195
Citations number
24
Categorie Soggetti
AI Robotics and Automatic Control
Journal title
DATA MINING AND KNOWLEDGE DISCOVERY
ISSN journal
13845810 → ACNP
Volume
3
Issue
2
Year of publication
1999
Pages
171 - 195
Database
ISI
SICI code
1384-5810(199906)3:2<171:ASPAFS>2.0.ZU;2-6
Abstract
We describe a scalable parallel implementation of the self organizing map ( SOM) suitable for data-mining applications involving clustering or segmenta tion against large data sets such as those encountered in the analysis of c ustomer spending patterns. The parallel algorithm is based on the batch SOM formulation in which the neural weights are updated at the end of each pas s over the training data. The underlying serial algorithm is enhanced to ta ke advantage of the sparseness often encountered in these data sets. Analys is of a realistic test problem shows that the batch SOM algorithm captures key features observed using the conventional on-line algorithm, with compar able convergence rates. Performance measurements on an SP2 parallel computer are given for two reta il data sets and a publicly available set of census data.These results demo nstrate essentially linear speedup for the parallel batch SOM algorithm, us ing both a memory-contained sparse formulation as well as a separate implem entation in which the mining data is accessed directly from a parallel file system. We also present visualizations of the census data to illustrate th e value of the clustering information obtained via the parallel SOM method.