Private Data Management with Verification

1 Private Data Management with Verification
Yan Chen, Duke University
Advisor: Ashwin Machanavajjhala

2 Outline
Motivation
Private Verification – differentially private regression diagnostics
Future work (ongoing): private verification on counting queries for data dependent algorithms
Future work (idea): private data synthesis
Summary

4 Data Privacy

5 Differential Privacy
Definition 1: ε-Differential Privacy
A randomized algorithm M satisfies ε-differential privacy if, for any two neighboring datasets D1 and D2 and any set of outputs S, Pr[M(D1) ∈ S] ≤ e^ε · Pr[M(D2) ∈ S]. [C. Dwork et al., ICALP 2006]

6 Differential Privacy
Property 1 (Sequential Composition): If M1 and M2 satisfy ε1- and ε2-differential privacy respectively, then releasing the results of both M1(D) and M2(D) satisfies (ε1+ε2)-differential privacy.
Property 2 (Parallel Composition): If D1 and D2 are subsets of D with D1 ∩ D2 = ∅, then releasing M1(D1) and M2(D2) satisfies max(ε1, ε2)-differential privacy.
Property 3 (Post-processing): If M3 is any algorithm that does not otherwise access D, then releasing M3(M1(D)) still satisfies ε1-differential privacy.

7 Laplace Mechanism
Definition 2: Laplace Mechanism
For any function f: D -> R^n, the Laplace Mechanism M is defined as M(D) = f(D) + η, where η is a vector of independent random variables, each drawn from a Laplace distribution with scale parameter Δ(f) / ε, and Δ(f) is the global sensitivity of f. [C. Dwork et al., ICALP 2006]
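A minimal Python sketch of Definition 2, using only the standard library. The helper names (laplace_noise, laplace_mechanism) and the toy counting query are illustrative assumptions, not part of the slides; the sensitivity passed in is assumed to be the L1 global sensitivity of f.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two independent Exp(1) draws is Laplace(0, 1);
    # multiplying by `scale` gives Laplace(0, scale).
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))

def laplace_mechanism(true_answer, sensitivity, epsilon, rng):
    # M(D) = f(D) + eta, with each coordinate of eta ~ Laplace(sensitivity / epsilon).
    scale = sensitivity / epsilon
    return [v + laplace_noise(scale, rng) for v in true_answer]

# Toy example (hypothetical data): a counting query has sensitivity 1,
# because adding or removing one record changes the count by at most 1.
rng = random.Random(0)
ages = [23, 35, 41, 52, 67]
count_over_40 = [float(sum(1 for a in ages if a > 40))]
noisy = laplace_mechanism(count_over_40, sensitivity=1.0, epsilon=0.5, rng=rng)
print(noisy)
```

A smaller ε gives a larger noise scale and therefore stronger privacy at the cost of accuracy.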

8 Private Data Management Framework
Components: Data Curator, Data Synthesizer, Querier, Verifier

9 Framework - Open Questions
Differentially private algorithms for private verification on different tasks
Protection for data synthesis

11 Differentially Private Regression Diagnostics
Generate Model: algorithms exist for linear/logistic regression while ensuring privacy.
Evaluate Model (Regression Diagnostics): no privacy-preserving techniques exist for regression diagnostics.

12 Differentially Private Regression Diagnostics
PriRP – Residual Plot (an error measure for linear regression)
PriROC – ROC curve (an error measure for logistic regression)

13 Residual Plot
Linear regression models the outcome as y_i = x_i^T β + e_i, where e_i is a noise term.
Suppose b is the estimated model; the residual of each point is r_i = y_i − x_i^T b.
Residual plot: residuals vs. predicted values ŷ_i = x_i^T b.
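As a small (non-private) illustration of what a residual plot contains, the sketch below computes the (predicted value, residual) pairs for a fitted model b. The function names and the toy data are assumptions made for the example; the first feature is a constant 1 so that b[0] acts as an intercept.

```python
def predict(x, b):
    # Predicted value for one point: the inner product of features and coefficients.
    return sum(xj * bj for xj, bj in zip(x, b))

def residual_plot_points(X, y, b):
    # Each point of the residual plot is (predicted value, residual).
    points = []
    for xi, yi in zip(X, y):
        y_hat = predict(xi, b)
        points.append((y_hat, yi - y_hat))
    return points

# Hypothetical data and model: y is roughly 1 + 2 * x.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
y = [1.0, 3.5, 5.0]
b = [1.0, 2.0]
points = residual_plot_points(X, y, b)
print(points)  # [(1.0, 0.0), (3.0, 0.5), (5.0, 0.0)]
```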

14 Residual Plot

15 Private Residual Plot - PriRP
1. Private Bounds Computation
2. Residual Plots Perturbation

16 Private Residual Plot - PriRP
Private Bounds Computation
The real bounds contain sensitive information about the data, and the sensitivity of the bound is unbounded.
Q: How can we identify bounds (-b, b) such that at least a θ fraction of the points are contained in (-b, b) with high probability?
SVT-based algorithm [C. Dwork 14]: query q_i asks how many points lie within the bound (-u*2^i, u*2^i).
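One way to read the SVT-based step is as the standard "above threshold" form of the sparse vector technique applied to the doubling queries q_i. The sketch below follows that reading and is not necessarily the exact algorithm in PriRP: the function name, the default u, the max_steps cap, and treating the number of points n as public are all assumptions made for illustration.

```python
import random

def laplace_noise(scale, rng):
    # Difference of two Exp(1) draws, rescaled, is Laplace(0, scale).
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))

def private_bound(residuals, theta, epsilon, u=1.0, max_steps=40, rng=None):
    rng = rng or random.Random()
    n = len(residuals)  # assumption: the dataset size is public
    # Noisy threshold, as in the textbook above-threshold algorithm
    # for queries of sensitivity 1.
    noisy_threshold = theta * n + laplace_noise(2.0 / epsilon, rng)
    for i in range(max_steps):
        b = u * 2 ** i
        # q_i: number of residuals inside (-u*2^i, u*2^i); sensitivity 1.
        q_i = sum(1 for r in residuals if -b < r < b)
        if q_i + laplace_noise(4.0 / epsilon, rng) >= noisy_threshold:
            return b  # first bound whose noisy count clears the noisy threshold
    # No query cleared the threshold: fall back to the largest bound tried.
    return u * 2 ** (max_steps - 1)

# Hypothetical residuals: most are small, a few are large outliers.
residuals = [0.3, -0.7, 1.2, -2.5, 0.1, 40.0, -0.4, 0.9, 1.8, -1.1]
print(private_bound(residuals, theta=0.8, epsilon=1.0, rng=random.Random(0)))
```

Because only the index of the first "above threshold" query is released, the privacy cost does not grow with the number of doublings tried.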

17 Private Residual Plot - PriRP
Residual Plots Perturbation
Q: How can we estimate a 2D probability density inside a bounded region?
1. Discretization
2. Perturbation
3. Sampling
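The three steps can be sketched as follows, under stated assumptions: an equal-width grid, Laplace noise with scale 1/ε per cell (one point affects one cell count by 1 under add/remove neighbors), rounding and clipping of the noisy counts, and uniform sampling inside each cell. The function name, the grid size, and the choice to drop points outside the bounds are illustrative and may differ from what PriRP does.

```python
import random

def laplace_noise(scale, rng):
    # Difference of two Exp(1) draws, rescaled, is Laplace(0, scale).
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))

def private_scatter(points, x_range, y_range, epsilon, grid=10, rng=None):
    rng = rng or random.Random()
    (x_lo, x_hi), (y_lo, y_hi) = x_range, y_range
    dx, dy = (x_hi - x_lo) / grid, (y_hi - y_lo) / grid

    # 1. Discretization: count the points falling in each grid cell.
    counts = [[0] * grid for _ in range(grid)]
    for x, y in points:
        if x_lo <= x < x_hi and y_lo <= y < y_hi:  # assumption: drop outside points
            i = min(grid - 1, int((x - x_lo) / dx))
            j = min(grid - 1, int((y - y_lo) / dy))
            counts[i][j] += 1

    synthetic = []
    for i in range(grid):
        for j in range(grid):
            # 2. Perturbation: add Laplace noise, then round and clip at zero
            #    (rounding and clipping are post-processing).
            k = max(0, round(counts[i][j] + laplace_noise(1.0 / epsilon, rng)))
            # 3. Sampling: place k points uniformly at random inside the cell.
            for _ in range(k):
                synthetic.append((rng.uniform(x_lo + i * dx, x_lo + (i + 1) * dx),
                                  rng.uniform(y_lo + j * dy, y_lo + (j + 1) * dy)))
    return synthetic

# Hypothetical (predicted value, residual) pairs.
pts = [(1.0, 0.2), (2.5, -0.4), (3.0, 0.1), (4.2, 0.8)]
print(len(private_scatter(pts, (0.0, 5.0), (-1.0, 1.0), epsilon=1.0, rng=random.Random(0))))
```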

18 Private Residual Plot - PriRP
Empirical Evaluation (data scale = 5000)

19 Private Residual Plot - PriRP
Empirical Evaluation
Define the similarity between the real RP and the perturbed RP:
Discretize the bounds of the real RP into a 10×10 grid of equal-width cells.
Compute the distribution of residuals over all grid cells c in the real RP and the perturbed RP, denoted P(c) and P’(c).

20 Private Residual Plot - PriRP
Empirical Evaluation

21 ROC curve

22 ROC curve
ROC curve: TPR vs. FPR over all possible thresholds θ
AUC: area under the ROC curve
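For reference, a non-private sketch of how ROC points and the AUC can be computed from predicted scores. It assumes a point is classified as positive when its score is at least θ and uses the trapezoidal rule for the area; the function names and the toy data are made up for the example.

```python
def roc_points(scores, labels, thresholds):
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for t in sorted(thresholds, reverse=True):
        # Predict positive when score >= t, then count true/false positives.
        tp = sum(1 for s, l in zip(scores, labels) if l == 1 and s >= t)
        fp = sum(1 for s, l in zip(scores, labels) if l == 0 and s >= t)
        points.append((fp / neg, tp / pos))  # (FPR, TPR)
    return points

def auc(points):
    # Trapezoidal rule over the curve, anchored at (0, 0) and (1, 1).
    pts = sorted(set([(0.0, 0.0)] + points + [(1.0, 1.0)]))
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Hypothetical predictions from a logistic regression model.
scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
thresholds = [0.0, 0.1, 0.35, 0.4, 0.8, 1.0]
curve = roc_points(scores, labels, thresholds)
print(auc(curve))  # 0.75
```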

23 Private ROC Curve - PriROC
1. Choosing Thresholds
2. Computing TPRs and FPRs
3. Ensuring Monotonicity

24 Private ROC Curve - PriROC
Choosing Thresholds
1. Data-independent strategy: fix |Θ| = N+1, Θ = {0, 1/N, …, (N-1)/N, 1}. Problem: performs poorly on skewed predictions.
2. Data-dependent strategy: iteratively choose thresholds that evenly divide the data, i.e. iteratively find medians to use as thresholds (using smooth sensitivity and handling invalid thresholds).

25 Private ROC Curve - PriROC
Computing TPRs and FPRs
Compute TPRs by answering prefix range queries over the chosen thresholds.
FPRs are computed similarly.

26 Private ROC Curve - PriROC
Ensuring Monotonicity
To ensure monotonicity, apply the method from [Hay et al., VLDB 10].
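A common way to enforce an ordering constraint on noisy values is a least-squares isotonic fit computed with the pool-adjacent-violators algorithm. The sketch below shows that technique as an illustration of the kind of post-processing involved; whether it matches the cited method in every detail is an assumption, and the function name and the noisy TPR values are hypothetical.

```python
def isotonic_nondecreasing(values):
    # Pool Adjacent Violators: least-squares fit subject to
    # out[0] <= out[1] <= ... <= out[-1].
    blocks = []  # each block is [sum of values, number of values]
    for v in values:
        blocks.append([v, 1])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    out = []
    for s, c in blocks:
        out.extend([s / c] * c)  # every value in a block is replaced by the block mean
    return out

# Hypothetical noisy TPRs, ordered so that the true values are non-decreasing.
noisy_tprs = [0.1, 0.3, 0.25, 0.6, 0.55, 1.0]
monotone_tprs = [min(1.0, max(0.0, t)) for t in isotonic_nondecreasing(noisy_tprs)]
print(monotone_tprs)
```

Since the fit only uses the already-noisy values, it is post-processing and does not consume additional privacy budget.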

27 Private ROC Curve - PriROC
Empirical Evaluation

28 Private ROC Curve - PriROC
Empirical Evaluation: AUC and Symmetric Difference

30 Future Work - Verification
Counting queries
1. Data-independent algorithms (easy), e.g. the Laplace Mechanism
2. Data-dependent algorithms (hard): the error is data dependent

31 Future Work - Verification
Definition: Sensitivity of Randomized Algorithm For any randomized algorithm A: D -> R with random variable stream N, we say the randomized algorithm A has sensitivity Δ, if for any two neighboring datasets D and D’, any fixed values of N, Theorem: If randomized algorithm A has sensitivity Δ, then satisfies ε-differential privacy and

32 Future Work - Verification
Another interesting problem: given an error bound, release the output only when its error is within that bound with high probability.

34 Future Work - Data Synthesis
Queries on the synthetic data release information about the synthetic data.
Differentially private data synthesis: good in terms of privacy for the whole system, but adds too much noise.
Weaker privacy definition?
The data synthesis process should be protected.

35 Future Work - Data Synthesis
What kind of weaker privacy definition can we use for generating synthetic data?
Can the chosen weaker privacy definition be composed with differential privacy? How is the whole system protected?
Even if the weaker privacy definition can be composed with differential privacy, what is the tightest composition result?
More complex data synthesis algorithms: can we empirically evaluate what they protect?

37 Summary
We present the framework for private data management with verification and propose some open questions.
We start with query verification on differentially private regression diagnostics. We propose the first differentially private algorithms, PriRP (for linear regression) and PriROC (for logistic regression).
We present our initial work on verification of data dependent algorithms for counting queries.
We briefly describe the idea of private data synthesis as another future direction.

