
Differential Privacy (2)


1 Differential Privacy (2)

2 Outline
- Review of the basic definition
- Exponential mechanism
- Application in data mining
- Non-interactive DP

3 Definition
Mechanism: K(x) = f(x) + D, where D is random noise.
This is an output perturbation method: the noise is added to the answer f(x), not to the input data.
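The slide's formula did not survive in the transcript. For reference, the standard ε-differential-privacy guarantee that K is meant to satisfy, stated for datasets x and x' that differ in at most one record, is:

```latex
\Pr[K(x) \in S] \;\le\; e^{\varepsilon} \, \Pr[K(x') \in S]
\qquad \text{for all neighboring } x, x' \text{ and all sets of outputs } S
```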

4 Sensitivity function
How should the noise D be designed? The answer depends on the function f(x) itself: the sensitivity of f captures how large a difference between f(x) and f(x') on two neighboring datasets the additive noise must hide.
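Written out, the (L1) sensitivity is the largest change in f over all pairs of neighboring datasets. A counting query is the usual example: adding or removing one record changes a count by at most 1.

```latex
\Delta f \;=\; \max_{x,\,x' \text{ neighboring}} \big\lVert f(x) - f(x') \big\rVert_1,
\qquad \text{e.g. } f(x) = \#\{\text{records satisfying a predicate}\} \;\Rightarrow\; \Delta f = 1
```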

5 Adding Laplace noise
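A minimal sketch of the Laplace mechanism using only the Python standard library: it releases f(x) + Lap(Δf/ε), the standard noise scale for ε-differential privacy. The function name `laplace_mechanism` is hypothetical, and the Laplace sample is drawn as the difference of two exponential samples rather than through any particular library.

```python
import random
from typing import Optional


def laplace_mechanism(true_value: float, sensitivity: float, epsilon: float,
                      rng: Optional[random.Random] = None) -> float:
    """Release true_value + Lap(sensitivity / epsilon) noise."""
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    rng = rng or random.Random()
    scale = sensitivity / epsilon
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and scale `scale`.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_value + noise


if __name__ == "__main__":
    rng = random.Random(0)
    # A counting query (sensitivity 1) released under epsilon = 0.5.
    print(laplace_mechanism(100.0, sensitivity=1.0, epsilon=0.5, rng=rng))
```

Smaller ε or larger Δf means a wider noise distribution, which is the privacy/utility trade-off in its simplest form.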

6 Exponential mechanism
Think of the Laplace mechanism as random sampling from a Laplace distribution whose mean is the function output f(x).
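Concretely, the Laplace mechanism draws its output r from a density that peaks at f(x) and decays exponentially with the distance |r - f(x)|:

```latex
K(x) \sim \mathrm{Lap}\!\left(f(x),\, \tfrac{\Delta f}{\varepsilon}\right),
\qquad
p(r) \;=\; \frac{\varepsilon}{2\Delta f}\,
\exp\!\left(-\frac{\varepsilon\,\lvert r - f(x)\rvert}{\Delta f}\right)
```

Read this way, outputs are scored by -|r - f(x)| and better-scored outputs are exponentially more likely; the exponential mechanism generalizes this idea to arbitrary score (quality) functions over arbitrary output ranges.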

7 Exponential mechanism
What if we have a multidimensional function whose optimal value is attained at a certain point, and we want to release that point with a differential privacy guarantee?

8 Exponential mechanism
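A minimal Python sketch of the exponential mechanism (McSherry and Talwar) over a finite set of candidate outputs: each candidate r is sampled with probability proportional to exp(ε·q(x, r) / (2Δq)), where q is a quality function and Δq its sensitivity. The function name and the restriction to a finite candidate list are assumptions of this sketch.

```python
import math
import random
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def exponential_mechanism(candidates: Sequence[T], quality: Callable[[T], float],
                          sensitivity: float, epsilon: float,
                          rng: Optional[random.Random] = None) -> T:
    """Sample r with probability proportional to exp(epsilon * q(r) / (2 * sensitivity)).

    `quality` is assumed to already close over the private dataset x.
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    rng = rng or random.Random()
    scores = [epsilon * quality(r) / (2.0 * sensitivity) for r in candidates]
    # Shifting every score by the maximum leaves the sampling probabilities
    # unchanged but keeps math.exp from overflowing.
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    return rng.choices(candidates, weights=weights, k=1)[0]


if __name__ == "__main__":
    # Privately pick the most frequent category; a count has sensitivity 1.
    counts = {"a": 50, "b": 10, "c": 5}
    print(exponential_mechanism(list(counts), counts.get, sensitivity=1.0, epsilon=1.0))
```

With ε = 1 the most frequent category wins almost every time; as ε shrinks toward 0 the choice approaches uniform over the candidates.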

9 Interactive data mining

10 Consider the simple decision tree algorithm
- Search every possible value of every attribute to find the optimal split point, and partition the dataset into two sets
- Recursively partition each subset until a stopping condition is met
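The greedy procedure can be sketched as follows. This is a minimal, non-private version, assuming numeric attributes stored in dicts and binary splits of the form x < a; `best_split`, `build_tree`, and the toy quality function `majority_correct` are hypothetical names. Note that every call to `quality` reads the raw data, which is exactly what a DP version has to control.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

Record = Dict[str, float]  # attribute name -> numeric value
Split = Tuple[str, float]  # (attribute, a), meaning the test "attribute < a"


def majority_correct(left: Sequence[str], right: Sequence[str]) -> int:
    """Toy quality: records classified correctly by a majority vote in each child."""
    return sum(max(side.count(c) for c in set(side)) for side in (left, right))


def best_split(rows: Sequence[Record], labels: Sequence[str],
               quality: Callable[[Sequence[str], Sequence[str]], float]) -> Optional[Split]:
    """Try every value of every attribute as a threshold and keep the best one."""
    best, best_q = None, float("-inf")
    for attr in rows[0]:
        for a in sorted({r[attr] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[attr] < a]
            right = [y for r, y in zip(rows, labels) if r[attr] >= a]
            if not left or not right:
                continue  # this threshold does not actually split the data
            q = quality(left, right)
            if q > best_q:
                best, best_q = (attr, a), q
    return best


def build_tree(rows, labels, quality, depth=0, max_depth=3):
    """Recurse until max_depth, a pure node, or no valid split; leaves hold a label."""
    split = None
    if depth < max_depth and len(set(labels)) > 1:
        split = best_split(rows, labels, quality)
    if split is None:
        return max(set(labels), key=labels.count)  # majority label
    attr, a = split
    lt = [i for i, r in enumerate(rows) if r[attr] < a]
    ge = [i for i, r in enumerate(rows) if r[attr] >= a]
    return {
        "split": split,
        "lt": build_tree([rows[i] for i in lt], [labels[i] for i in lt], quality, depth + 1, max_depth),
        "ge": build_tree([rows[i] for i in ge], [labels[i] for i in ge], quality, depth + 1, max_depth),
    }
```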

11 Apply DP to build ID3 decision trees
Quality functions for partitioning. Example: information gain, the reduction in entropy achieved by splitting the dataset:
IG(D, "x<a") = entropy(D) - (n/N)·entropy(D[x<a]) - ((N-n)/N)·entropy(D[x>=a]), where N = |D| and n is the number of records with x < a
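The information-gain formula translates directly into code. A minimal sketch, assuming a single numeric attribute given as a list `xs` alongside the class labels; `entropy` and `info_gain` are hypothetical helper names, and entropy is measured in bits.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy, in bits, of the class distribution of `labels`."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(xs: Sequence[float], labels: Sequence[Hashable], a: float) -> float:
    """IG(D, "x<a") = entropy(D) - n/N entropy(D[x<a]) - (N-n)/N entropy(D[x>=a])."""
    N = len(labels)
    if N == 0:
        raise ValueError("empty dataset")
    left = [y for x, y in zip(xs, labels) if x < a]
    right = [y for x, y in zip(xs, labels) if x >= a]
    n = len(left)
    return entropy(labels) - n / N * entropy(left) - (N - n) / N * entropy(right)


if __name__ == "__main__":
    xs, ys = [1, 2, 3, 4], ["a", "a", "b", "b"]
    print(info_gain(xs, ys, 3))  # a perfect split removes all 1.0 bit of entropy
```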

12 Sensitivity of IG

13 Also consider different quality functions
Different quality functions may yield models of different quality. Example: the Gini index
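A minimal sketch of a Gini-based split quality. Weighting each child's Gini impurity by its raw record count (rather than by its fraction of the data) and negating the result so that higher is better are assumptions of this sketch; `gini_impurity` and `gini_quality` are hypothetical names.

```python
from collections import Counter
from typing import Hashable, Sequence


def gini_impurity(labels: Sequence[Hashable]) -> float:
    """1 - sum_c p_c^2 over the class proportions p_c; 0 for a pure or empty node."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gini_quality(left: Sequence[Hashable], right: Sequence[Hashable]) -> float:
    """Negated count-weighted Gini impurity of the two children (higher is better)."""
    return -(len(left) * gini_impurity(left) + len(right) * gini_impurity(right))
```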

14 More quality functions
Max operator; gain ratio (its sensitivity is unbounded)
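A minimal sketch of the max operator as a split quality: the sum, over the children, of the majority-class count, i.e. how many training records a majority-vote leaf in each child would classify correctly. Adding or removing one record changes this sum by at most 1, which is what makes it attractive for the exponential mechanism. `max_quality` is a hypothetical name.

```python
from collections import Counter
from typing import Hashable, Sequence


def max_quality(left: Sequence[Hashable], right: Sequence[Hashable]) -> int:
    """Sum over the two children of the count of their majority class."""
    return sum(max(Counter(side).values(), default=0) for side in (left, right))
```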

15 Privacy budget ε
The user specifies a total budget ε. A composite computation must assign a budget ε' to each operation, and the sum of the ε' values must not exceed ε.
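Under sequential composition, running mechanisms with budgets ε1, ..., εk on the same data is (ε1 + ... + εk)-differentially private, so the bookkeeping amounts to refusing any spend that would push the running total past ε. A minimal sketch with a hypothetical `PrivacyBudget` class:

```python
class PrivacyBudget:
    """Hypothetical tracker for sequential composition: the epsilons of all
    operations run on the same data must sum to at most `total`."""

    def __init__(self, total: float) -> None:
        if total <= 0:
            raise ValueError("total budget must be positive")
        self.total = total
        self.spent = 0.0

    def spend(self, epsilon: float) -> float:
        """Reserve epsilon for one operation, or raise if the budget would be exceeded."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if self.spent + epsilon > self.total + 1e-12:  # tolerate float rounding
            raise RuntimeError("privacy budget exceeded")
        self.spent += epsilon
        return epsilon

    @property
    def remaining(self) -> float:
        return self.total - self.spent
```

For example, a tree of depth d might call `spend(total / (d + 1))` once per level; how the budget is divided is a design choice, not something fixed by the composition rule.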

16 Sketch of the algorithm
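The slide's own sketch did not survive extraction. As a stand-in, here is a hypothetical, simplified DP decision-tree builder assembled from the building blocks discussed in these slides: Laplace noise for counts, the exponential mechanism for choosing a split, and a per-level privacy budget. It assumes categorical attributes with public, known domains, neighboring datasets that differ by adding or removing one record, the max operator as quality function, and an ad hoc stopping rule; it is an illustration, not the algorithm presented in the slides.

```python
import math
import random
from collections import Counter
from typing import Dict, List, Sequence, Union

Record = Dict[str, str]   # categorical attributes only (assumption)
Tree = Union[str, dict]   # a class label (leaf) or {"attr": ..., "children": {...}}


def laplace(scale: float, rng: random.Random) -> float:
    # Difference of two i.i.d. exponentials ~ Laplace(0, scale).
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def max_quality(rows: Sequence[Record], labels: Sequence[str], attr: str) -> int:
    # Sum over attr's values of the majority-class count (sensitivity 1).
    groups: Dict[str, Counter] = {}
    for r, y in zip(rows, labels):
        groups.setdefault(r[attr], Counter())[y] += 1
    return sum(max(c.values()) for c in groups.values())


def build(rows, labels, domains, classes, eps_level, depth, rng) -> Tree:
    eps = eps_level / 2  # half for the noisy node size, half for the split or leaf
    noisy_n = len(rows) + laplace(1 / eps, rng)
    # Stopping rule (an assumption): depth exhausted, no attributes left,
    # or too few records per class relative to the noise level.
    if depth == 0 or not domains or noisy_n < len(classes) / eps:
        counts = Counter(labels)
        noisy = {c: counts[c] + laplace(1 / eps, rng) for c in classes}
        return max(noisy, key=noisy.get)
    attrs = list(domains)
    # Exponential mechanism over attributes; the quality has sensitivity 1.
    scores = [eps * max_quality(rows, labels, a) / 2 for a in attrs]
    top = max(scores)
    attr = rng.choices(attrs, weights=[math.exp(s - top) for s in scores])[0]
    rest = {a: vals for a, vals in domains.items() if a != attr}
    children = {}
    for v in domains[attr]:  # iterate the public domain, not the observed values
        idx = [i for i, r in enumerate(rows) if r[attr] == v]
        children[v] = build([rows[i] for i in idx], [labels[i] for i in idx],
                            rest, classes, eps_level, depth - 1, rng)
    return {"attr": attr, "children": children}


def dp_id3(rows: Sequence[Record], labels: Sequence[str], domains: Dict[str, List[str]],
           classes: List[str], epsilon: float, max_depth: int,
           rng: random.Random = None) -> Tree:
    """Spend epsilon / (max_depth + 1) per level: nodes on one level see disjoint
    records (parallel composition) and levels add up (sequential composition)."""
    rng = rng or random.Random()
    return build(rows, labels, domains, classes, epsilon / (max_depth + 1), max_depth, rng)
```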

17 Experimental evaluation
Using synthetic and real datasets

18 J48 is Weka's implementation of C4.5, which extends ID3 with pruning (among other changes)

19 Tradeoff between utility and privacy

20 Non-interactive DP Noisy histogram release

21 Non-interactive differential privacy
Noisy histogram release. Problems: sparse data → dramatically increased data size (noise makes almost every empty cell nonzero); the level of granularity → a trade-off between privacy and quality
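A minimal sketch of noisy histogram release, assuming add/remove-one neighbors (so the whole count vector has L1 sensitivity 1) and a list of bins fixed in advance, independently of the data; deriving the bins from the data would itself leak information. `noisy_histogram` is a hypothetical name.

```python
import random
from collections import Counter
from typing import Dict, Hashable, Iterable, Optional, Sequence


def noisy_histogram(values: Iterable[Hashable], bins: Sequence[Hashable], epsilon: float,
                    rng: Optional[random.Random] = None) -> Dict[Hashable, float]:
    """Add Lap(1/epsilon) noise to the count of every bin, including empty ones.

    Values that fall outside `bins` are ignored.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    counts = Counter(values)
    scale = 1.0 / epsilon
    return {b: counts[b] + rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
            for b in bins}
```

Three records spread over 1,000 bins still produce 1,000 noisy, almost surely nonzero counts, which is the size blow-up on sparse data noted on the slide; coarser bins shrink the output and the relative noise but lose detail.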

22 Sampling and filtering
Problem: privacy leak

23 Partitioning: histogram drill-down under the constraint of the privacy budget

