Privacy-preserving data mining

Authors

Agrawal, R; Srikant, R

Citation

R. Agrawal and R. Srikant, Privacy-preserving data mining, SIGMOD RECORD, 29(2), 2000, pp. 439-450

Citations number

Subject categories

Computer Science & Engineering

Journal title

SIGMOD RECORD

ISSN journal

0163-5808

Volume

29

Issue

2

Year of publication

2000

Pages

439 - 450

Database

ISI

SICI code

0163-5808(200006)29:2<439:PDM>2.0.ZU;2-2

Abstract

A fruitful direction for future data mining research will be the development of techniques that incorporate privacy concerns. Specifically, we address the following question. Since the primary task in data mining is the development of models about aggregated data, can we develop accurate models without access to precise information in individual data records? We consider the concrete case of building a decision-tree classifier from training data in which the values of individual records have been perturbed. The resulting data records look very different from the original records and the distribution of data values is also very different from the original distribution. While it is not possible to accurately estimate original values in individual data records, we propose a novel reconstruction procedure to accurately estimate the distribution of original data values. By using these reconstructed distributions, we are able to build classifiers whose accuracy is comparable to the accuracy of classifiers built with the original data.
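
The abstract does not specify how values are perturbed or how the distribution is reconstructed. As an illustration only, the following minimal Python sketch assumes additive uniform noise and an iterative Bayes-style update over a fixed histogram. The function names (`perturb`, `reconstruct_distribution`), the noise model, the binning, and the iteration count are all assumptions made for this example and are not taken from the record.

```python
import random


def perturb(values, noise_width, rng):
    """Add independent uniform noise in [-noise_width, +noise_width] to each value.

    Assumption: additive uniform noise; the abstract does not name a noise model.
    """
    return [v + rng.uniform(-noise_width, noise_width) for v in values]


def reconstruct_distribution(perturbed, noise_width, lo, hi, bins=20, iterations=30):
    """Estimate the original value distribution as a histogram over [lo, hi].

    Only the perturbed values and the (known) noise width are used.
    Assumption: an iterative Bayes-style update, chosen here for illustration.
    """
    width = (hi - lo) / bins
    mids = [lo + (k + 0.5) * width for k in range(bins)]
    est = [1.0 / bins] * bins  # start from a uniform guess

    for _ in range(iterations):
        new = [0.0] * bins
        for w in perturbed:
            # Uniform noise has constant density inside its window, so the
            # posterior weight of bin k is the current estimate if the bin
            # midpoint could have produced w, and zero otherwise.
            weights = [est[k] if abs(w - mids[k]) <= noise_width else 0.0
                       for k in range(bins)]
            total = sum(weights)
            if total == 0.0:
                continue  # no bin can explain this observation; skip it
            for k in range(bins):
                new[k] += weights[k] / total
        s = sum(new)
        if s == 0.0:
            break  # nothing could be updated; keep the previous estimate
        est = [x / s for x in new]

    return mids, est


# Hypothetical usage: two clusters of original values, heavily perturbed.
rng = random.Random(42)
original = ([rng.uniform(20, 30) for _ in range(500)]
            + [rng.uniform(60, 70) for _ in range(500)])
noisy = perturb(original, 25.0, rng)
mids, est = reconstruct_distribution(noisy, 25.0, 0.0, 100.0)
for m, p in zip(mids, est):
    print(f"{m:6.1f}  {p:.3f}")
```

In this sketch the individual original values are never recovered; only the estimated histogram `est` would be passed on to a downstream learner such as a decision-tree builder.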