Detecting group differences: Mining contrast sets

Citation
S. D. Bay and M. J. Pazzani, Detecting group differences: Mining contrast sets, Data Mining and Knowledge Discovery, 5(3), 2001, pp. 213-246
Number of citations
51
Subject Categories
AI Robotics and Automatic Control
Journal title
DATA MINING AND KNOWLEDGE DISCOVERY
Journal ISSN
1384-5810
Volume
5
Issue
3
Year of publication
2001
Pages
213 - 246
Database
ISI
SICI code
1384-5810(2001)5:3<213:DGDMCS>2.0.ZU;2-3
Abstract
A fundamental task in data analysis is understanding the differences between several contrasting groups. These groups can represent different classes of objects, such as male or female students, or the same group over time, e.g. freshman students in 1993 through 1998. We present the problem of mining contrast sets: conjunctions of attributes and values that differ meaningfully in their distribution across groups. We provide a search algorithm for mining contrast sets with pruning rules that drastically reduce the computational complexity. Once the contrast sets are found, we post-process the results to present a subset that are surprising to the user given what we have already shown. We explicitly control the probability of Type I error (false positives) and guarantee a maximum error rate for the entire analysis by using Bonferroni corrections.
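
Illustrative sketch (not part of the original record)
The abstract's core idea, testing whether a conjunction of attribute-value pairs is distributed differently across groups while bounding the overall Type I error, can be illustrated with a small Python sketch. It is not the authors' search algorithm and includes no pruning rules or post-processing. It assumes exactly two groups, a fixed list of candidate conjunctions supplied by the caller, a chi-square test on a 2x2 table, and a plain Bonferroni correction that divides alpha by the number of candidates. All function names and the toy data are hypothetical.

```python
import math


def support_counts(rows, group_key, conditions):
    """Per group, count rows matching every (attribute, value) pair, and all rows."""
    counts = {}
    for row in rows:
        group = row[group_key]
        hits, total = counts.get(group, (0, 0))
        match = all(row.get(attr) == value for attr, value in conditions)
        counts[group] = (hits + int(match), total + 1)
    return counts


def chi2_pvalue_2x2(hit_a, n_a, hit_b, n_b):
    """p-value of a chi-square independence test on a 2x2 table (1 degree of freedom)."""
    n = n_a + n_b
    row_totals = [n_a, n_b]
    col_totals = [hit_a + hit_b, n - hit_a - hit_b]
    if 0 in row_totals or 0 in col_totals:
        return 1.0  # degenerate table: no evidence of a difference
    observed = [[hit_a, n_a - hit_a], [hit_b, n_b - hit_b]]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > chi2) equals erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def find_contrast_sets(rows, group_key, candidates, alpha=0.05):
    """Return (conditions, p) for candidates significant after Bonferroni correction."""
    corrected_alpha = alpha / len(candidates)  # assumption: one test per candidate
    found = []
    for conditions in candidates:
        counts = support_counts(rows, group_key, conditions)
        if len(counts) != 2:
            raise ValueError("this sketch handles exactly two groups")
        (hit_a, n_a), (hit_b, n_b) = counts.values()
        p = chi2_pvalue_2x2(hit_a, n_a, hit_b, n_b)
        if p < corrected_alpha:
            found.append((conditions, p))
    return found


# Made-up toy data: 100 rows per group; "degree" differs by group, "color" does not.
rows = []
for group, n_phd in (("A", 80), ("B", 20)):
    for i in range(100):
        rows.append({
            "group": group,
            "degree": "PhD" if i < n_phd else "MS",
            "color": "red" if i % 2 == 0 else "blue",
        })

candidates = [(("degree", "PhD"),), (("color", "red"),)]
found = find_contrast_sets(rows, "group", candidates)
```

With this toy data, only the conjunction on degree passes the corrected threshold, because its support is 80% in one group and 20% in the other, while the conjunction on color has identical support in both groups. Handling more than two groups would need a chi-square test with more degrees of freedom, which this sketch deliberately leaves out.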