Detecting group differences: Mining contrast sets

Authors

Bay, SD; Pazzani, MJ

Citation

S.D. Bay and M.J. Pazzani, Detecting group differences: Mining contrast sets, Data Mining and Knowledge Discovery, 5(3), 2001, pp. 213-246

Citations number

Subject categories

AI Robotics and Automatic Control

Journal title

DATA MINING AND KNOWLEDGE DISCOVERY

ISSN journal

1384-5810

Volume

5

Issue

3

Year of publication

2001

Pages

213 - 246

Database

ISI

SICI code

1384-5810(2001)5:3<213:DGDMCS>2.0.ZU;2-3

Abstract

A fundamental task in data analysis is understanding the differences between several contrasting groups. These groups can represent different classes of objects, such as male or female students, or the same group over time, e.g. freshman students in 1993 through 1998. We present the problem of mining contrast sets: conjunctions of attributes and values that differ meaningfully in their distribution across groups. We provide a search algorithm for mining contrast sets with pruning rules that drastically reduce the computational complexity. Once the contrast sets are found, we post-process the results to present a subset that are surprising to the user given what we have already shown. We explicitly control the probability of Type I error (false positives) and guarantee a maximum error rate for the entire analysis by using Bonferroni corrections.