Data mining for fun and profit

Citation
Dj. Hand et al., Data mining for fun and profit, STAT SCI, 15(2), 2000, pp. 111-126
Number of citations
27
Subject categories
Mathematics
Journal title
STATISTICAL SCIENCE
Journal ISSN
0883-4237
Volume
15
Issue
2
Year of publication
2000
Pages
111 - 126
Database
ISI
SICI code
0883-4237(200005)15:2<111:DMFFAP>2.0.ZU;2-Y
Abstract
Data mining is defined as the process of seeking interesting or valuable information within large data sets. This presents novel challenges and problems, distinct from those typically arising in the allied areas of statistics, machine learning, pattern recognition or database science. A distinction is drawn between the two data mining activities of model building and pattern detection. Even though statisticians are familiar with the former, the large data sets involved in data mining mean that novel problems do arise. The second of the activities, pattern detection, presents entirely new classes of challenges, some arising, again, as a consequence of the large sizes of the data sets. Data quality is a particularly troublesome issue in data mining applications, and this is examined. The discussion is illustrated with a variety of real examples.