Differential Privacy (2)

Outline: review the basic definition; exponential mechanism; application in data mining; non-interactive DP

Definition. Mechanism: K(x) = f(x) + D, where D is random noise. This is an output perturbation method.
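For reference, the standard ε-differential privacy guarantee that such a mechanism K is meant to satisfy can be written as below. This is the textbook statement, not a formula recovered from the slide; the symbols x' (a neighboring dataset differing in one record) and S (a set of outputs) are notation assumed here.

```latex
\Pr[K(x) \in S] \;\le\; e^{\varepsilon} \, \Pr[K(x') \in S]
\quad \text{for all neighboring } x, x' \text{ and all } S \subseteq \mathrm{Range}(K)
```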

Sensitivity function. How should the noise D be designed? It is tied to the function f(x): the sensitivity captures how large a difference must be hidden by the additive noise.
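A common way to write the sensitivity of f is the L1 global sensitivity below. The symbol Δf and the neighboring dataset x' are notation assumed here, not taken from the slide.

```latex
\Delta f \;=\; \max_{x,\, x' \text{ neighboring}} \lVert f(x) - f(x') \rVert_{1}
```

For example, a counting query has Δf = 1, because adding or removing one record changes the count by at most 1.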

Adding LAP noise
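A minimal sketch of adding Laplace noise, assuming the usual calibration in which D is drawn from a Laplace distribution with scale sensitivity/ε. The function names `laplace_noise` and `laplace_mechanism` are illustrative and do not come from the slides.

```python
import random


def laplace_noise(scale, rng=random):
    """Draw one sample from Laplace(0, scale).

    The difference of two independent exponential variables with
    rate 1/scale follows a Laplace distribution with that scale.
    """
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def laplace_mechanism(true_value, sensitivity, epsilon, rng=random):
    """Return K(x) = f(x) + D with D ~ Laplace(0, sensitivity / epsilon)."""
    if epsilon <= 0 or sensitivity <= 0:
        raise ValueError("epsilon and sensitivity must be positive")
    return true_value + laplace_noise(sensitivity / epsilon, rng)


# A counting query has sensitivity 1 (one record changes the count by at most 1).
rng = random.Random(0)
noisy_count = laplace_mechanism(true_value=120, sensitivity=1.0, epsilon=0.5, rng=rng)
print(noisy_count)
```

A smaller ε gives a larger noise scale, so answers become more private but less accurate.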

Exponential mechanism. Think about the Laplace mechanism… it is a random sampling mechanism that draws from a Laplace distribution whose mean is the function output.

Exponential mechanism. What if we have a multidimensional function that attains its optimal value at a certain point, and we want to guarantee differential privacy when reporting that point?

Exponential mechanism
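A minimal sketch of the exponential mechanism in its standard form: each candidate output r is sampled with probability proportional to exp(ε · q(r) / (2 · Δq)), where q is a quality function and Δq is its sensitivity. The function name, parameter names, and the toy counts are assumptions made for illustration.

```python
import math
import random


def exponential_mechanism(candidates, quality, sensitivity, epsilon, rng=random):
    """Sample one candidate with probability proportional to
    exp(epsilon * quality(r) / (2 * sensitivity))."""
    scores = [quality(r) for r in candidates]
    top = max(scores)  # shifting by the max avoids overflow and leaves the probabilities unchanged
    weights = [math.exp(epsilon * (s - top) / (2.0 * sensitivity)) for s in scores]
    return rng.choices(candidates, weights=weights, k=1)[0]


# Toy example: privately pick the most frequent item.
# The quality of an item is its count, which has sensitivity 1.
counts = {"a": 50, "b": 48, "c": 5}
rng = random.Random(0)
print(exponential_mechanism(list(counts), counts.get, sensitivity=1.0, epsilon=1.0, rng=rng))
```

High-quality candidates are exponentially more likely to be chosen, but every candidate keeps a nonzero probability, which is what hides the influence of any single record.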

Interactive data mining

Consider the simple decision tree algorithm:
- Search each possible value of each attribute to find the optimal partitioning point, and partition the dataset into two sets
- Recursively partition each subset until a certain condition is met

Apply DP to build ID3 decision trees: quality functions for partitioning. Example: information gain (IG), the amount by which splitting the dataset reduces entropy.
IG(D, "x<a") = entropy(D) - (n/N) entropy("x<a" of D) - ((N-n)/N) entropy("x>=a" of D)
where N is the number of records in D and n is the number of records with x<a.
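The information gain formula can be sketched directly in code. This is a plain, non-private computation; the function names and the row format (a list of dicts) are assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, labels, attr, threshold):
    """IG(D, "attr < threshold"): entropy of D minus the weighted entropy of the two parts."""
    left = [y for x, y in zip(rows, labels) if x[attr] < threshold]
    right = [y for x, y in zip(rows, labels) if x[attr] >= threshold]
    N, n = len(labels), len(left)
    return entropy(labels) - (n / N) * entropy(left) - ((N - n) / N) * entropy(right)


rows = [{"x": 1}, {"x": 2}, {"x": 3}, {"x": 4}]
labels = ["no", "no", "yes", "yes"]
# Splitting at x < 3 separates the two classes perfectly, so the gain is 1 bit.
print(information_gain(rows, labels, "x", 3))
```

In a differentially private tree builder, a score like this would typically serve as the quality function when a split is chosen with the exponential mechanism, which is why its sensitivity matters.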

Sensitivity of IG

Also consider different quality functions; they may give different model quality. Example: Gini index

More quality functions: max operator; gain ratio (sensitivity is unbounded)

Privacy budget ε: a user-specified total budget ε. Composite operations need a specific ε' for each operation. The sum of the ε' values should not exceed ε.
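A minimal sketch of budget accounting under sequential composition, where the per-operation ε' values add up and must stay within the total ε. The `PrivacyBudget` class is hypothetical and only illustrates the bookkeeping.

```python
class PrivacyBudget:
    """Tracks how much of a total epsilon has been spent by composed operations."""

    def __init__(self, total_epsilon):
        if total_epsilon <= 0:
            raise ValueError("total_epsilon must be positive")
        self.total = total_epsilon
        self.spent = 0.0

    @property
    def remaining(self):
        return self.total - self.spent

    def spend(self, epsilon_prime):
        """Reserve epsilon_prime for one operation, refusing to exceed the total."""
        if epsilon_prime <= 0:
            raise ValueError("epsilon_prime must be positive")
        if self.spent + epsilon_prime > self.total + 1e-12:  # small tolerance for float rounding
            raise RuntimeError("privacy budget exceeded")
        self.spent += epsilon_prime
        return epsilon_prime


budget = PrivacyBudget(total_epsilon=1.0)
eps_per_level = budget.total / 4  # e.g. split evenly over 4 tree levels (an assumed policy)
for level in range(4):
    budget.spend(eps_per_level)
print(budget.remaining)
```

How the budget is divided among operations is a design choice; the even split over tree levels above is only one assumed policy.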

Sketch of the algorithm

Experimental evaluation: using synthetic and real datasets

J48 is an implementation of C4.5, which is roughly ID3 + pruning

Tradeoff between utility and privacy

Non-interactive DP: noisy histogram release

Non-interactive differential privacy: noisy histogram release. Problems: sparse data → dramatically increased data size; the level of granularity → trade-off between privacy and quality
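A minimal sketch of noisy histogram release, assuming neighboring datasets differ by adding or removing one record, so that each record affects exactly one bin count by 1 and Laplace noise of scale 1/ε per bin suffices. The function name and the toy data are illustrative.

```python
import random
from collections import Counter


def noisy_histogram(values, bins, epsilon, rng=random):
    """Release a count for every bin with independent Laplace(0, 1/epsilon) noise."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    counts = Counter(values)
    rate = epsilon  # exponential rate = 1 / scale, with scale = 1 / epsilon
    return {
        # Noise is added to empty bins too; a difference of two exponentials is Laplace.
        b: counts.get(b, 0) + rng.expovariate(rate) - rng.expovariate(rate)
        for b in bins
    }


ages = ["20-29", "20-29", "30-39", "50-59"]
all_bins = ["20-29", "30-39", "40-49", "50-59"]
print(noisy_histogram(ages, all_bins, epsilon=1.0, rng=random.Random(0)))
```

Because every bin must receive noise, including the empty ones, a sparse dataset over a large domain turns into a dense release, which is the data size problem noted above. Finer bins also mean smaller true counts relative to the noise, which is the granularity trade-off.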

Sampling and filtering. Problem: privacy leak

Partitioning: histogram drill-down, under the constraint of the privacy budget