K-means-type algorithms on distributed memory computer

Authors
Citation
M.K. Ng, K-means-type algorithms on distributed memory computer, INT J HI SP, 11(2), 2000, pp. 75-91
Number of citations
26
Subject categories
Computer Science & Engineering
Journal title
INTERNATIONAL JOURNAL OF HIGH SPEED COMPUTING
ISSN journal
0129-0533
Volume
11
Issue
2
Year of publication
2000
Pages
75 - 91
Database
ISI
SICI code
0129-0533(200006)11:2<75:KAODMC>2.0.ZU;2-N
Abstract
Partitioning a set of objects into homogeneous clusters is a fundamental operation in data mining. The k-means-type algorithm is best suited for implementing this operation because of its efficiency in clustering large numerical and categorical data sets. An efficient parallel k-means-type algorithm for clustering data sets on a distributed share-nothing parallel system is considered. It has a simple communication scheme which performs only one round of information exchange in every iteration. We show that the speedup of our algorithm is asymptotically linear when the number of objects is sufficiently large. We implement the parallel k-means-type algorithm on an IBM SP2 parallel machine. The performance studies show that the algorithm has nice parallelism in experiments.
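
The communication pattern described in the abstract (each node works on its own block of objects, followed by one exchange per iteration) can be illustrated with a minimal sketch. This is not the paper's implementation: it covers only the numeric k-means case, simulates the nodes sequentially in a single process instead of running on a message-passing machine, and the names `local_stats` and `parallel_kmeans` are hypothetical, chosen for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def local_stats(block: List[Point], centroids: List[Point]):
    """Work done by one node on its own block; needs no communication.

    Returns per-cluster coordinate sums and per-cluster object counts.
    """
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p in block:
        # Assign the object to its nearest centroid.
        j = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += p[d]
    return sums, counts


def parallel_kmeans(blocks: List[List[Point]],
                    centroids: List[Point],
                    iterations: int = 10) -> List[Point]:
    """Simulated shared-nothing k-means: one exchange per iteration."""
    k, dim = len(centroids), len(centroids[0])
    for _ in range(iterations):
        # On a real machine each node would compute this concurrently.
        partials = [local_stats(b, centroids) for b in blocks]
        # The single round of information exchange: an element-wise sum
        # of the partial results (an all-reduce on a real machine).
        tot_sums = [[sum(p[0][j][d] for p in partials) for d in range(dim)]
                    for j in range(k)]
        tot_counts = [sum(p[1][j] for p in partials) for j in range(k)]
        # Every node can now recompute identical centroids locally.
        # An empty cluster keeps its previous centroid (an assumption).
        centroids = [
            tuple(s / tot_counts[j] for s in tot_sums[j])
            if tot_counts[j] else centroids[j]
            for j in range(k)
        ]
    return centroids


if __name__ == "__main__":
    blocks = [[(0.0, 0.0), (10.0, 10.0)], [(0.0, 2.0), (10.0, 12.0)]]
    print(parallel_kmeans(blocks, [(0.0, 0.0), (10.0, 10.0)]))
```

In this sketch the data exchanged per iteration is proportional to the number of clusters times the dimension, and does not depend on the number of objects per node, which is consistent with the abstract's claim that speedup approaches linear as the number of objects grows.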