Partitioning a set of objects into homogeneous clusters is a fundamental
operation in data mining. The k-means-type algorithm is best suited for
implementing this operation because of its efficiency in clustering large
numerical and categorical data sets. We consider an efficient parallel
k-means-type algorithm for clustering data sets on a distributed
shared-nothing parallel system. It has a simple communication scheme that
performs only one round of information exchange in every iteration. We show
that the speedup of our algorithm is asymptotically linear when the number
of objects is sufficiently large. We implement the parallel k-means-type
algorithm on an IBM SP2 parallel machine. Performance studies show that the
algorithm parallelizes well in our experiments.
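
The one-round-per-iteration communication scheme can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions: the data are purely numerical (plain k-means rather than the general k-means-type family), the partitions are processed one after another in a single process instead of on separate nodes, and all names (nearest, local_stats, combine, parallel_kmeans) are hypothetical. Each partition stands in for the local data of one node; the only information a node would need to exchange per iteration is its per-cluster sums and counts.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Stats = Tuple[List[List[float]], List[int]]


def nearest(point: Point, centers: Sequence[Point]) -> int:
    """Index of the center closest to point (squared Euclidean distance)."""
    best, best_dist = 0, float("inf")
    for j, center in enumerate(centers):
        dist = sum((p - c) ** 2 for p, c in zip(point, center))
        if dist < best_dist:
            best, best_dist = j, dist
    return best


def local_stats(partition: Sequence[Point], centers: Sequence[Point]) -> Stats:
    """Work done by one node: per-cluster coordinate sums and object counts."""
    k, dim = len(centers), len(centers[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for point in partition:
        j = nearest(point, centers)
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += point[d]
    return sums, counts


def combine(stats: Sequence[Stats], centers: Sequence[Point]) -> List[Point]:
    """The single exchange per iteration: merge all local sums and counts."""
    k, dim = len(centers), len(centers[0])
    new_centers: List[Point] = []
    for j in range(k):
        total = sum(counts[j] for _, counts in stats)
        if total == 0:
            # Assumption: an empty cluster keeps its previous center.
            new_centers.append(tuple(centers[j]))
            continue
        new_centers.append(
            tuple(sum(sums[j][d] for sums, _ in stats) / total for d in range(dim))
        )
    return new_centers


def parallel_kmeans(
    partitions: Sequence[Sequence[Point]],
    centers: Sequence[Point],
    max_iter: int = 100,
) -> List[Point]:
    """Iterate until the centers stop changing or max_iter is reached."""
    current = [tuple(c) for c in centers]
    for _ in range(max_iter):
        # On a shared-nothing system each call would run on its own node.
        stats = [local_stats(part, current) for part in partitions]
        updated = combine(stats, current)
        if updated == current:
            break
        current = updated
    return current
```

The size of the exchanged data depends only on the number of clusters and the dimension, not on the number of objects, so with more objects per node the local assignment work increasingly dominates the cost of the exchange. On a real message-passing system the combine step would typically be a single collective reduction.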