Data mining is defined as the process of seeking interesting or valuable
information within large data sets. This presents novel challenges and
problems, distinct from those typically arising in the allied areas of
statistics, machine learning, pattern recognition or database science. A
distinction is drawn between the two data mining activities of model
building and pattern detection. Even though statisticians are familiar with
the former, the large data sets involved in data mining mean that novel
problems do arise. The second of the activities, pattern detection, presents
entirely new classes of challenges, some arising, again, as a consequence of
the large sizes of the data sets. Data quality is a particularly troublesome
issue in data mining applications, and this is examined. The discussion is
illustrated with a variety of real examples.