A fundamental task in data analysis is understanding the differences
between several contrasting groups. These groups can represent different
classes of objects, such as male or female students, or the same group over
time, e.g. freshman students in 1993 through 1998. We present the problem
of mining contrast sets: conjunctions of attributes and values that differ
meaningfully in their distribution across groups. We provide a search
algorithm for mining contrast sets with pruning rules that drastically
reduce the computational complexity. Once the contrast sets are found, we
post-process the results to present a subset that is surprising to the user
given what we have already shown. We explicitly control the probability of
Type I error (false positives) and guarantee a maximum error rate for the
entire analysis by using Bonferroni corrections.
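
To make the notions of a contrast set and a Bonferroni-corrected test concrete, the following is a minimal sketch, not the search algorithm itself. It assumes exactly two groups, a chi-square test on a 2x2 contingency table, and a plain Bonferroni division of the significance level by the number of tests; the function names, the toy records, and the attribute "major" are hypothetical and chosen only for illustration.

```python
import math

def count_matches(rows, contrast_set):
    """Count rows that satisfy every attribute-value pair in contrast_set."""
    return sum(
        all(row.get(attr) == val for attr, val in contrast_set.items())
        for row in rows
    )

def chi_square_2x2(rows_a, rows_b, contrast_set):
    """Chi-square statistic and p-value (1 degree of freedom) comparing how
    often contrast_set holds in group A versus group B."""
    a = count_matches(rows_a, contrast_set)   # group A, set holds
    b = len(rows_a) - a                       # group A, set does not hold
    c = count_matches(rows_b, contrast_set)   # group B, set holds
    d = len(rows_b) - c                       # group B, set does not hold
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # Degenerate table (an empty row or column): no evidence of a difference.
        return 0.0, 1.0
    chi2 = n * (a * d - b * c) ** 2 / denom
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

def is_significant(p_value, alpha, n_tests):
    """Plain Bonferroni correction (an assumption of this sketch):
    each of n_tests tests is run at level alpha / n_tests."""
    return p_value < alpha / n_tests

# Hypothetical toy data: two groups of ten records each.
group_1 = [{"major": "cs"}] * 8 + [{"major": "math"}] * 2
group_2 = [{"major": "cs"}] * 2 + [{"major": "math"}] * 8

chi2, p = chi_square_2x2(group_1, group_2, {"major": "cs"})
print(chi2, p, is_significant(p, alpha=0.05, n_tests=3))
```

Dividing the significance level by the number of tests bounds the probability of any false positive across the whole analysis by the chosen level, at the cost of making each individual test more conservative as more candidate contrast sets are examined.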