Cluster analysis aims at identifying groups of similar objects and, therefore,
helps to discover the distribution of patterns and interesting correlations
in large data sets. It has been the subject of extensive research, since it
arises in many application domains in engineering, business and the social
sciences. Especially in recent years, the availability of huge transactional
and experimental data sets and the emerging requirements of data mining have
created a need for clustering algorithms that scale and can be applied in
diverse domains.
This paper introduces the fundamental concepts of clustering and surveys the
widely known clustering algorithms in a comparative way. Moreover, it
addresses an important issue of the clustering process: the quality
assessment of the clustering results, which is also related to the inherent
features of the data set under consideration. A review of the clustering
validity measures and approaches available in the literature is presented.
Furthermore, the paper illustrates the issues that remain under-addressed by
recent algorithms and outlines the trends in the clustering process.