Privacy-preserving data mining

Citation
R. Agrawal and R. Srikant, Privacy-preserving data mining, SIGMOD RECORD, 29(2), 2000, pp. 439-450
Number of citations
54
Subject categories
Computer Science & Engineering
Journal title
SIGMOD RECORD
Journal ISSN
0163-5808
Volume
29
Issue
2
Year of publication
2000
Pages
439 - 450
Database
ISI
SICI code
0163-5808(200006)29:2<439:PDM>2.0.ZU;2-2
Abstract
A fruitful direction for future data mining research will be the development of techniques that incorporate privacy concerns. Specifically, we address the following question. Since the primary task in data mining is the development of models about aggregated data, can we develop accurate models without access to precise information in individual data records? We consider the concrete case of building a decision-tree classifier from training data in which the values of individual records have been perturbed. The resulting data records look very different from the original records and the distribution of data values is also very different from the original distribution. While it is not possible to accurately estimate original values in individual data records, we propose a novel reconstruction procedure to accurately estimate the distribution of original data values. By using these reconstructed distributions, we are able to build classifiers whose accuracy is comparable to the accuracy of classifiers built with the original data.
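
The idea of perturbing individual values and then recovering only their distribution can be illustrated with a minimal sketch. The abstract does not specify the noise model or the reconstruction steps, so everything here is an assumption made for illustration: additive Gaussian noise with a known standard deviation, a fixed-width histogram over a known value range, and an iterative Bayes-style update of the bin probabilities. The function names (perturb, noise_density, reconstruct) are hypothetical and do not come from the paper.

```python
import math
import random


def perturb(values, sigma, rng):
    """Return each value plus independent Gaussian noise N(0, sigma^2)."""
    return [v + rng.gauss(0.0, sigma) for v in values]


def noise_density(d, sigma):
    """Density of the assumed N(0, sigma^2) noise at offset d."""
    return math.exp(-0.5 * (d / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def reconstruct(perturbed, sigma, lo, hi, bins=10, iterations=50):
    """Estimate bin probabilities of the original values over [lo, hi].

    Assumption: an iterative Bayes-style update on histogram bins,
    with every bin represented by its midpoint.
    """
    width = (hi - lo) / bins
    mids = [lo + (k + 0.5) * width for k in range(bins)]
    probs = [1.0 / bins] * bins  # start from a uniform guess
    # Noise likelihood of each perturbed value given each bin midpoint.
    like = [[noise_density(w - m, sigma) for m in mids] for w in perturbed]
    for _ in range(iterations):
        new = [0.0] * bins
        for row in like:
            denom = sum(l * p for l, p in zip(row, probs))
            if denom == 0.0:
                continue  # value too far from every bin to be informative
            for k in range(bins):
                # Posterior probability that this record came from bin k.
                new[k] += row[k] * probs[k] / denom
        total = sum(new)
        probs = [v / total for v in new]
    return mids, probs


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical original data: two clusters inside [0, 1].
    original = [rng.uniform(0.1, 0.3) for _ in range(500)]
    original += [rng.uniform(0.7, 0.9) for _ in range(500)]
    noisy = perturb(original, sigma=0.25, rng=rng)
    mids, probs = reconstruct(noisy, sigma=0.25, lo=0.0, hi=1.0)
    for m, p in zip(mids, probs):
        print(f"{m:.2f}  {p:.3f}")
```

Only the perturbed values and the noise parameters enter reconstruct, which mirrors the setting described in the abstract: individual records stay hidden while an estimate of the overall distribution is recovered and could then be passed to a classifier-building step.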