Differential Privacy (2)
Outline: review of the basic definition; exponential mechanism; application in data mining; non-interactive DP
Definition. Mechanism: K(x) = f(x) + D, where D is random noise. This is an output perturbation method.
Sensitivity function. How should the noise D be designed? It is tied back to the function f(x): the sensitivity captures how large a difference must be hidden by the additive noise, that is, the most f can change when a single record changes.
Adding Laplace noise
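A minimal sketch of output perturbation with Laplace noise, using only the Python standard library. The function names `laplace_noise` and `laplace_mechanism` are hypothetical, and the noise scale sensitivity / epsilon is the usual textbook calibration, assumed here because the slide body did not survive.

```python
import random


def laplace_noise(scale: float) -> float:
    """Draw one sample from a Laplace distribution with mean 0.

    The difference of two independent exponential variables with mean
    `scale` follows a Laplace distribution with that scale.
    """
    rate = 1.0 / scale
    return random.expovariate(rate) - random.expovariate(rate)


def laplace_mechanism(true_value: float, sensitivity: float, epsilon: float) -> float:
    """Return K(x) = f(x) + D, with D drawn from Laplace(sensitivity / epsilon)."""
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    return true_value + laplace_noise(sensitivity / epsilon)


# Example (assumed query): a count changes by at most 1 when one record
# is added or removed, so its sensitivity is 1.
noisy_count = laplace_mechanism(true_value=120, sensitivity=1.0, epsilon=0.5)
```

A smaller epsilon gives a larger noise scale, so the released value is more private and less accurate.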
Exponential mechanism. Recall the Laplace mechanism: it can be viewed as random sampling from a Laplace distribution whose mean is the function output.
Exponential mechanism. What if we have a multidimensional function whose output is the point at which some optimum is attained, and we still want to guarantee differential privacy?
Exponential mechanism
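The sampling view can be sketched as follows. This is an illustration under stated assumptions, not the slide's own formulation: the name `exponential_mechanism` and its parameters are hypothetical, and the weight exp(epsilon * q / (2 * sensitivity)) is the standard form of the mechanism, where q is a quality score and sensitivity bounds how much q can change when one record changes.

```python
import math
import random


def exponential_mechanism(candidates, quality, sensitivity, epsilon):
    """Sample one candidate with probability proportional to
    exp(epsilon * quality(candidate) / (2 * sensitivity))."""
    scores = [quality(c) for c in candidates]
    # Subtracting the maximum score keeps exp() from overflowing and does
    # not change the normalized probabilities.
    top = max(scores)
    weights = [math.exp(epsilon * (s - top) / (2.0 * sensitivity)) for s in scores]
    return random.choices(candidates, weights=weights, k=1)[0]


# Example (made-up data): privately pick the most frequent item.
# A count has sensitivity 1 under add/remove-one-record neighbors.
counts = {"a": 50, "b": 48, "c": 5}
winner = exponential_mechanism(list(counts), counts.get, sensitivity=1.0, epsilon=0.5)
```

High-quality candidates are exponentially more likely to be chosen, but every candidate keeps a nonzero probability, which is what hides any single record's influence on the choice.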
Interactive data mining
Consider the simple decision tree algorithm:
- Search each possible value of each attribute to find the optimal partitioning point, and partition the dataset into two sets
- Recursively partition each subset until a certain condition is met
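The two steps above can be sketched as a small, non-private recursive builder. Everything named here is an assumption made for illustration: rows are dicts with a "label" key, the split cost is the number of misclassified rows (a simple stand-in for the quality functions discussed later), and the stopping condition is a depth limit or a pure node.

```python
from collections import Counter


def majority(labels):
    """Most common label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def misclassified(labels):
    """Number of rows that disagree with the majority label."""
    return len(labels) - max(Counter(labels).values()) if labels else 0


def best_split(rows, attributes):
    """Try every observed value of every attribute as a threshold."""
    best = None
    for attr in attributes:
        for threshold in sorted({r[attr] for r in rows}):
            left = [r for r in rows if r[attr] < threshold]
            right = [r for r in rows if r[attr] >= threshold]
            if not left or not right:
                continue  # a split must produce two non-empty sets
            cost = (misclassified([r["label"] for r in left])
                    + misclassified([r["label"] for r in right]))
            if best is None or cost < best[0]:
                best = (cost, attr, threshold, left, right)
    return best


def build_tree(rows, attributes, max_depth=3):
    """Recursively partition until the node is pure or the depth runs out."""
    labels = [r["label"] for r in rows]
    split = None
    if max_depth > 0 and len(set(labels)) > 1:
        split = best_split(rows, attributes)
    if split is None:
        return {"leaf": majority(labels)}
    _, attr, threshold, left, right = split
    return {"attr": attr, "threshold": threshold,
            "left": build_tree(left, attributes, max_depth - 1),
            "right": build_tree(right, attributes, max_depth - 1)}
```

Each call to `best_split` inspects the raw data directly, so a differentially private variant would need to replace that exact argmax with a randomized choice.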
Apply DP to build ID3 decision trees: partitioning relies on a quality function.
Example: information gain (IG), the amount of entropy removed by splitting the dataset:
IG(D, “x<a”) = entropy(D) - (n/N) entropy(“x<a” of D) - ((N-n)/N) entropy(“x>=a” of D)
where N is the number of records in D and n is the number of records with x<a.
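The information gain formula can be checked on a toy dataset with the sketch below. The helper names and the row layout (dicts with a "label" key) are assumptions made for this example; entropy is measured in bits.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(rows, attribute, threshold, label="label"):
    """IG of the split "attribute < threshold" versus "attribute >= threshold"."""
    left = [r[label] for r in rows if r[attribute] < threshold]    # n records
    right = [r[label] for r in rows if r[attribute] >= threshold]  # N - n records
    n_total = len(rows)
    if n_total == 0:
        return 0.0
    return (entropy(left + right)
            - len(left) / n_total * entropy(left)
            - len(right) / n_total * entropy(right))


# Toy data (made up): the split x < 3 separates the classes perfectly.
rows = [{"x": 1, "label": "no"}, {"x": 2, "label": "no"},
        {"x": 3, "label": "yes"}, {"x": 4, "label": "yes"}]
gain = information_gain(rows, "x", 3)
```

A perfect split of a balanced two-class dataset removes the full one bit of entropy, while a poorer threshold such as x < 2 removes only part of it.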
Sensitivity of IG
Also consider different quality functions; they may give different model quality. Example: Gini index.
More quality functions: max operator; gain ratio (sensitivity is unbounded).
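For comparison with information gain, here is a short sketch of the Gini index of a label set and of a two-way split. The function names are hypothetical; the formula used is the common definition, one minus the sum of squared class proportions, with the split value taken as the size-weighted average of the two sides.

```python
from collections import Counter


def gini_index(labels):
    """1 - sum of squared class proportions; 0 means the set is pure."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())


def gini_of_split(left_labels, right_labels):
    """Size-weighted Gini index of a two-way partition (lower is better)."""
    total = len(left_labels) + len(right_labels)
    if total == 0:
        return 0.0
    return (len(left_labels) / total * gini_index(left_labels)
            + len(right_labels) / total * gini_index(right_labels))
```

Unlike information gain, a lower value is better here, so a selection step that maximizes quality would use the negated value.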
Privacy budget ε: the user specifies a total budget ε. Composite operations need a specific ε’ for each operation; the sum of the ε’ values must not exceed ε.
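The budget rule can be sketched as a small accountant that refuses any operation that would push the total spent above the user's budget. The class name `PrivacyBudget` and its methods are hypothetical, and the even split in the example is only one possible allocation.

```python
class PrivacyBudget:
    """Tracks the per-operation budgets spent out of a total budget."""

    def __init__(self, total_epsilon):
        if total_epsilon <= 0:
            raise ValueError("total budget must be positive")
        self.total = total_epsilon
        self.spent = 0.0

    @property
    def remaining(self):
        return max(self.total - self.spent, 0.0)

    def spend(self, epsilon_prime):
        """Reserve epsilon_prime for one operation, or raise if it does not fit."""
        if epsilon_prime <= 0:
            raise ValueError("per-operation budget must be positive")
        # Small tolerance so floating-point rounding does not reject an exact fit.
        if self.spent + epsilon_prime > self.total + 1e-12:
            raise RuntimeError("privacy budget exceeded")
        self.spent += epsilon_prime
        return epsilon_prime


# Example: split a total budget of 1.0 evenly over four operations.
budget = PrivacyBudget(1.0)
per_operation = [budget.spend(1.0 / 4) for _ in range(4)]
```

Each value returned by `spend` would then be passed as the epsilon of one noisy operation, such as one split selection or one noisy count.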
Sketch of the algorithm
Experimental evaluation: using synthetic and real datasets
J48 is Weka's implementation of C4.5, which is essentially ID3 plus pruning
Tradeoff between utility and privacy
Non-interactive DP Noisy histogram release
Non-interactive differential privacy: noisy histogram release. Problems: with sparse data, the released data size increases dramatically; the level of granularity is a trade-off between privacy and quality.
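A minimal sketch of a noisy histogram release, assuming neighboring datasets differ by adding or removing one record, so that each record affects exactly one bin count by 1. The function name `noisy_histogram` and the data are made up for illustration.

```python
import random
from collections import Counter


def noisy_histogram(values, bins, epsilon):
    """Release one count per bin, each perturbed with Laplace(1/epsilon) noise."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    counts = Counter(values)
    rate = epsilon  # exponential rate = 1 / scale, with scale = 1 / epsilon
    released = {}
    for b in bins:
        # Difference of two exponentials gives Laplace noise with mean 0.
        noise = random.expovariate(rate) - random.expovariate(rate)
        # Empty bins also get noise; skipping them would reveal which bins are empty.
        released[b] = counts.get(b, 0) + noise
    return released


# Example: three records spread over a five-bin domain.
release = noisy_histogram(["a", "a", "c"], bins=["a", "b", "c", "d", "e"], epsilon=1.0)
```

Because every bin in the domain is released, a sparse dataset produces many noisy near-zero entries, which is the data-size problem noted above; finer bins also mean smaller true counts relative to the noise.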
Sampling and filtering Problem: privacy leak
Partitioning: histogram drill-down, under the constraint of the privacy budget