Foundations of Privacy Lecture 5 Lecturer: Moni Naor.


1 Foundations of Privacy Lecture 5 Lecturer: Moni Naor

2 Desirable Properties of a Sanitization Mechanism
Composability: applying the sanitization several times yields a graceful degradation.
– Will see: t releases, each ε-DP, are t·ε-DP.
– Next class: roughly (√t·ε + t·ε², δ)-DP.
Robustness to side information: no need to specify exactly what the adversary knows; the adversary may know everything except one row.
Differential privacy satisfies both…

3 Differential Privacy [Dwork, McSherry, Nissim & Smith 2006]
Protect individual participants: the curator/sanitizer M should behave essentially the same whether it is run on D1 or on D2.
(Figure: the curator/sanitizer M applied to two databases D1 and D2.)

4 Differential Privacy
Adjacency: D+I and D−I.
Protect individual participants: the probability of every bad event (or any event) increases only by a small multiplicative factor when I enter the DB. May as well participate in the DB…
An ε-differentially private sanitizer M satisfies, for all DBs D, all individuals I, and all events T:
e^(−ε) ≤ Pr[M(D+I) ∈ T] / Pr[M(D−I) ∈ T] ≤ e^ε ≈ 1+ε.
Handles auxiliary input.

5 Differential Privacy
Sanitizer M gives ε-differential privacy if: for all adjacent D1 and D2 (differing in one user), and all A ⊆ range(M):
Pr[M(D1) ∈ A] ≤ e^ε · Pr[M(D2) ∈ A].
Participation in the data set poses no additional risk.
(Figure: response distributions on D1 and D2; the ratio of Pr[response] is bounded, and the bad responses Z lie in the tails.)

6 Example of Differential Privacy
X is a set of (name, tag ∈ {0,1}) tuples. One query: # of participants with tag = 1.
Sanitizer: output the # of 1's + noise, with the noise drawn from the Laplace distribution with parameter 1/ε, so that Pr[noise = k−1] ≈ e^ε · Pr[noise = k].
(Figure: the Laplace noise distribution centered at 0.)
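A minimal sketch of this mechanism (not from the slides; numpy and the helper name are assumptions for illustration):

    import numpy as np

    def laplace_count(tags, epsilon, rng):
        """epsilon-DP answer to 'how many participants have tag = 1'.

        Adding or removing one participant changes the true count by at most 1,
        so Laplace noise of scale 1/epsilon gives the e^epsilon ratio on the slide.
        """
        return int(np.sum(tags)) + rng.laplace(scale=1.0 / epsilon)

    rng = np.random.default_rng(0)
    tags = rng.integers(0, 2, size=1000)          # toy database of 0/1 tags
    print(laplace_count(tags, epsilon=0.1, rng=rng))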

7 ( ,  ) - Differential Privacy Bad Responses: Z Z Z Pr [response] ratio bounded This course :  negligible Sanitizer M gives ( ,  ) - differential privacy if: for all adjacent D 1 and D 2, and all A µ range(M): Pr[ M (D 1 ) 2 A] ≤ e  Pr[ M (D 2 ) 2 A] + 

8 Example: NO Differential Privacy
U is a set of (name, tag ∈ {0,1}) tuples. One counting query: # of participants with tag = 1.
Sanitizer A: choose and release a few random tags.
Bad event T: only my tag is 1, and my tag is released. Then Pr[A(D+Me) ∈ T] ≥ 1/n while Pr[A(D−Me) ∈ T] = 0, so the ratio Pr[A(D+Me) ∈ T] / Pr[A(D−Me) ∈ T] cannot be bounded by e^ε: not ε-differentially private for any ε! It is, however, (0, 1/n)-differentially private.
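For contrast, a sketch of this non-private sanitizer (illustration only; the function name and setup are mine):

    import numpy as np

    def release_random_tags(tags, k, rng):
        """Non-private 'sanitizer': publish k tags chosen at random."""
        return tags[rng.choice(len(tags), size=k, replace=False)]

    # If only my tag is 1, any released 1 identifies me. With me in the database
    # this bad event has probability at least k/n > 0; without me it has probability 0,
    # so no finite epsilon bounds the ratio; only an additive delta of order k/n can.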

9 Counting Queries
Database x of size n: n individuals, each contributing a single point of the universe U.
Q is a set of predicates q: U → {0,1}. Query: how many participants satisfy q? (Sometimes stated as a fraction.)
Relaxed accuracy: answer each query within α additive error w.h.p. Not so bad: some error is anyway inherent in statistical analysis.

10 Bound on Achievable Privacy
Want to get bounds on:
– The accuracy: the responses from the mechanism to all queries are within α except with probability δ.
– The number of queries t for which we can receive accurate answers.
– The privacy parameter ε for which ε-differential privacy (or (ε,δ)-differential privacy) is achievable.

11 Blatant Non-Privacy
Mechanism M is blatantly non-private if there is an adversary A that, on any database D of size n, can select queries and use the responses M(D) to reconstruct D' such that ||D − D'||₁ ∈ o(n), i.e., D' agrees with D in all but o(n) of the entries.
Claim: blatant non-privacy implies that M is not (ε,δ)-DP for any constant ε.

12 Sanitization Can't Be Too Accurate
Usual counting queries: a query is a subset q ⊆ [n] with true answer Σ_{i∈q} d_i; the response = answer + noise.
Blatant non-privacy: the adversary guesses 99% of the bits.
Theorem: if all responses are within o(n) of the true answer, then the algorithm is blatantly non-private. But: this requires an exponential # of queries.

13 Proof: Exponential Adversary
"The database" is a vector d ∈ {0,1}^n; focus on the column containing the super-private bit.
Assume all answers are within error bound α. Will show that α cannot be o(n).
(Figure: the database drawn as a 0/1 column vector.)

14 Proof: Exponential Adversary for Blatant Non-Privacy
Estimate the # of 1's in all possible sets: ∀ S ⊆ [n], |M(S) − Σ_{i∈S} d_i| ≤ α, where M(S) is the answer on S.
Weed out "distant" DBs: for each possible candidate database c ∈ {0,1}^n, if for any S ⊆ [n] we have |Σ_{i∈S} c_i − M(S)| > α, then rule out c. If c is not ruled out, halt and output c.
Claim: the real database d won't be ruled out.
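A brute-force sketch of this adversary for a toy n (illustration only; it is exponential in n, and the noisy mechanism here is a stand-in I made up):

    from itertools import product, chain, combinations
    import numpy as np

    def all_subsets(n):
        return chain.from_iterable(combinations(range(n), r) for r in range(n + 1))

    def exponential_adversary(answers, n, alpha):
        """Return a candidate database consistent with all noisy answers."""
        for c in product([0, 1], repeat=n):                  # every candidate DB in {0,1}^n
            if all(abs(sum(c[i] for i in S) - a) <= alpha
                   for S, a in answers.items()):
                return np.array(c)                           # c survived the weeding out

    # Toy mechanism: true subset sums plus bounded noise of magnitude at most alpha.
    rng = np.random.default_rng(1)
    n, alpha = 8, 1
    d = rng.integers(0, 2, size=n)
    answers = {S: sum(d[i] for i in S) + rng.uniform(-alpha, alpha)
               for S in all_subsets(n)}                      # query every subset once
    c = exponential_adversary(answers, n, alpha)
    print(int(np.sum(c != d)), "entries differ (at most 4*alpha by the claim)")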

15 Proof: Exponential Adversary
Assume ∀ S ⊆ [n]: |M(S) − Σ_{i∈S} d_i| ≤ α.
Claim: for any c that has not been ruled out, the Hamming distance between c and d is at most 4α.
Let S0 = {i : d_i = 0, c_i = 1} and S1 = {i : d_i = 1, c_i = 0}. Since c is not ruled out, |M(S0) − Σ_{i∈S0} c_i| ≤ α and |M(S1) − Σ_{i∈S1} c_i| ≤ α, so each of S0 and S1 has size at most 2α.
(Figure: the columns d and c with the disagreement sets S0 and S1 marked.)
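Spelling out the triangle-inequality step for S0 (S1 is symmetric); note Σ_{i∈S0} d_i = 0 and Σ_{i∈S0} c_i = |S0| by the definition of S0:

\[
|S_0| \;=\; \Big|\sum_{i\in S_0} c_i - \sum_{i\in S_0} d_i\Big|
\;\le\; \Big|\sum_{i\in S_0} c_i - M(S_0)\Big| + \Big|M(S_0) - \sum_{i\in S_0} d_i\Big|
\;\le\; \alpha + \alpha \;=\; 2\alpha .
\]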

16 Impossibility of Exponential Queries
The result means that we cannot sanitize the data and publish a data structure from which, for all queries, the answer can be deduced correctly to within α ∈ o(n).
On the other hand: we will see that we can get accuracy up to roughly log |Q|.
(Figure: a sanitizer mediating between the database and a sequence of queries and answers.)

17 What Can We Do Efficiently?
The previous attack allowed "too" much power to the adversary: an exponential number of queries and exponential computation. On the other hand, it relied on the lack of wild errors in the responses.
Theorem: for any sanitization algorithm, if all responses are within o(√n) of the true answer, then it is blatantly non-private even against a polynomial-time adversary making O(n log² n) random queries.

18 The Model
As before, the database d is a bit string of length n.
Counting queries: a query is a subset q ⊆ {1, …, n}; the (exact) answer is a_q = Σ_{i∈q} d_i.
α-perturbation: the returned answer is a_q ± α.

19 What If We Had Exact Answers?
Consider a mechanism with 0-perturbation, i.e., we receive the exact answer a_q = Σ_{i∈q} d_i. Then with n linearly independent queries (over the reals) we could reconstruct d precisely: obtain n linear equations a_q = Σ_{i∈q} c_i and solve them uniquely.
When we have α-perturbations we only get inequalities: a_q − α ≤ Σ_{i∈q} c_i ≤ a_q + α.
Idea: use linear programming. A solution must exist: d itself.
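With exact answers, reconstruction is just a linear solve. A minimal numpy sketch (illustration only; the queries are random subsets, retried until the 0/1 matrix is invertible):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 50
    d = rng.integers(0, 2, size=n)

    # n random subset queries; keep the query matrix only if it is invertible.
    while True:
        Q = rng.integers(0, 2, size=(n, n))      # row j is the indicator of query q_j
        if np.linalg.matrix_rank(Q) == n:
            break

    a = Q @ d                                     # exact answers a_q = sum_{i in q} d_i
    c = np.linalg.solve(Q, a)                     # unique real solution
    print(np.array_equal(np.round(c).astype(int), d))   # True: d recovered exactly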

20 Privacy Requires Ω(√n) Perturbation
Consider a database with o(√n) perturbation. The adversary makes t = n log² n random queries q_j, getting noisy answers a_j.
Privacy-violating algorithm: construct a database c = {c_i}, 1 ≤ i ≤ n, by solving the linear program
0 ≤ c_i ≤ 1 for 1 ≤ i ≤ n,
a_j − α ≤ Σ_{i∈q_j} c_i ≤ a_j + α for 1 ≤ j ≤ t,
then round the solution: if c_i > 1/2 set it to 1, otherwise to 0.
A solution must exist: d itself. For every query q_j, its answer according to c is at most 2α far from its (real) answer in d.
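A sketch of this LP attack using scipy (not from the slides; here the mechanism simply adds bounded noise of magnitude at most α, with α ∈ o(√n)):

    import numpy as np
    from scipy.optimize import linprog

    rng = np.random.default_rng(0)
    n = 200
    d = rng.integers(0, 2, size=n)
    alpha = int(np.sqrt(n) / 4)                        # perturbation magnitude, o(sqrt(n))
    t = n * int(np.log(n) ** 2)                        # number of random queries

    Q = rng.integers(0, 2, size=(t, n))                # row j = indicator of query q_j
    a = Q @ d + rng.integers(-alpha, alpha + 1, size=t)   # noisy answers, error <= alpha

    # Feasibility LP:  a_j - alpha <= sum_{i in q_j} c_i <= a_j + alpha,  0 <= c_i <= 1.
    A_ub = np.vstack([Q, -Q])
    b_ub = np.concatenate([a + alpha, -(a - alpha)])
    res = linprog(c=np.zeros(n), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(0, 1)] * n, method="highs")
    c_hat = (res.x > 0.5).astype(int)                  # round to a 0/1 database
    print("fraction of entries wrong:", np.mean(c_hat != d))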

21 Bad Solutions to the LP Do Not Survive
A query q disqualifies a potential database c ∈ [0,1]^n if its answer on q is more than 2α far from the answer in d: |Σ_{i∈q} c_i − Σ_{i∈q} d_i| > 2α.
Idea: show that for a database c that is far away from d, a random query disqualifies c with some constant probability γ.
Want to use the union bound: all far-away solutions are disqualified w.p. at least 1 − n^n (1−γ)^t = 1 − neg(n).
How do we limit the solution space? Round each value to the closest multiple of 1/n.

22 Privacy Requires Ω(√n) Perturbation
A query q disqualifies a potential database c ∈ [0,1]^n if its answer on q is more than 2α far from the answer in d.
Lemma: if c is far away from d, then a random query disqualifies c with some constant probability γ. More precisely, if Pr_{i∈[n]}[|d_i − c_i| ≥ 1/3] > β, then there is a γ > 0 such that, for a uniformly random q ⊆ [n], Pr_q[|Σ_{i∈q}(c_i − d_i)| ≥ 2α+1] > γ.
The proof uses Azuma's inequality.

23 Privacy Requires Ω(√n) Perturbation
We can discretize all potential databases c ∈ [0,1]^n: round each entry c_i to the closest fraction with denominator n, so that |c_i − w_i/n| ≤ 1/n. The response on any q then changes by at most 1.
If we disqualify all 'discrete' databases, then we also effectively eliminate all c ∈ [0,1]^n.
There are n^n 'discrete' databases.

24 Privacy Requires Ω(√n) Perturbation
A query q disqualifies a potential database c ∈ [0,1]^n if its answer on q is more than 2α far from the answer in d (count the number of entries far from d).
Claim: if c is far away from d, then a random query disqualifies c with some constant probability γ.
Therefore t = n log² n queries leave only a negligible survival probability for each far-away reconstruction. Union bound (made possible by the discretization): all far-away candidates are disqualified w.p. at least 1 − n^n (1−γ)^t = 1 − neg(n).

25 Review and Conclusion
When the perturbation is o(√n), choosing Õ(n) random queries gives enough information to efficiently reconstruct an o(n)-close database; the reconstruction uses linear programming and runs in polynomial time.
Hence databases released with o(√n) perturbation are blatantly non-private, and reconstructable in poly(n) time.

26 Composition
Suppose we are going to apply a DP mechanism t times, perhaps on different databases. We want to argue that the combined result is differentially private.
Setting: a value b ∈ {0,1} is chosen. In each of the t rounds, the adversary A picks two adjacent databases D_i^0 and D_i^1 and receives the result z_i of an ε-DP mechanism M_i applied to D_i^b.
Want to argue that A's view is essentially the same for both values of b, where A's view is (z_1, z_2, …, z_t) plus the randomness used.

27 Differential Privacy: Composition
Handles auxiliary information; composes naturally.
If A_1(D) is ε_1-diffP and, for all z_1, A_2(D, z_1) is ε_2-diffP, then A_2(D, A_1(D)) is (ε_1+ε_2)-diffP.
Proof: for all adjacent D, D' and all (z_1, z_2), write P[z_1] = Pr_{z∼A_1(D)}[z = z_1], P'[z_1] = Pr_{z∼A_1(D')}[z = z_1], P[z_2] = Pr_{z∼A_2(D,z_1)}[z = z_2], P'[z_2] = Pr_{z∼A_2(D',z_1)}[z = z_2]. Then
e^(−ε_1) ≤ P[z_1] / P'[z_1] ≤ e^(ε_1) and e^(−ε_2) ≤ P[z_2] / P'[z_2] ≤ e^(ε_2),
so e^(−(ε_1+ε_2)) ≤ P[(z_1,z_2)] / P'[(z_1,z_2)] ≤ e^(ε_1+ε_2).
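A small sketch of how this composition is used in practice (illustration only; the record fields and budget split are made up):

    import numpy as np

    def laplace_query(data, predicate, epsilon, rng):
        """epsilon-DP count of records satisfying `predicate` (sensitivity 1)."""
        return sum(predicate(x) for x in data) + rng.laplace(scale=1.0 / epsilon)

    rng = np.random.default_rng(0)
    records = [{"age": a, "tag": t} for a, t in zip(rng.integers(18, 90, 500),
                                                    rng.integers(0, 2, 500))]

    eps_total = 0.2
    # By the composition theorem, two releases at eps_total/2 each are eps_total-DP overall.
    ans1 = laplace_query(records, lambda r: r["tag"] == 1, eps_total / 2, rng)
    ans2 = laplace_query(records, lambda r: r["age"] >= 65, eps_total / 2, rng)
    print(ans1, ans2)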

28 Differential Privacy: Composition
If all mechanisms M_i are ε-DP, then for any view, the probabilities that A sees that view when b = 0 and when b = 1 are within a factor of e^(εt).
Therefore results for a single query translate to results on several queries.

29 Answering a Single Counting Query
U is a set of (name, tag ∈ {0,1}) tuples. One counting query: # of participants with tag = 1.
Sanitizer A: output the # of 1's + noise. This is differentially private, provided the noise is chosen properly: choose the noise from the Laplace distribution.

30 Laplacian Noise
The Laplace distribution Y = Lap(b) has density function Pr[Y = y] = (1/2b)·e^(−|y|/b) and standard deviation O(b).
Take b = 1/ε; then Pr[Y = y] ∝ e^(−ε|y|).
(Figure: the Laplace density centered at 0.)

31 Laplacian Noise: ε-Privacy
Take b = 1/ε, so Pr[Y = y] ∝ e^(−ε|y|). Release q(D) + Lap(1/ε).
For adjacent D, D': |q(D) − q(D')| ≤ 1, so for every output a: e^(−ε) ≤ Pr_{by D}[a] / Pr_{by D'}[a] ≤ e^ε.
(Figure: two shifted Laplace densities.)
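Writing out the ratio bound explicitly (a sketch of the standard calculation, using the density above):

\[
\frac{\Pr[q(D)+Y=a]}{\Pr[q(D')+Y=a]}
= \frac{e^{-\varepsilon |a-q(D)|}}{e^{-\varepsilon |a-q(D')|}}
= e^{\varepsilon (|a-q(D')| - |a-q(D)|)}
\le e^{\varepsilon |q(D)-q(D')|}
\le e^{\varepsilon},
\]

and the lower bound e^(−ε) follows by exchanging the roles of D and D'.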

32 Laplacian Noise: ε-Privacy
Theorem: the Laplace mechanism with parameter b = 1/ε is ε-differentially private.

33 Laplacian Noise: Õ(1/ε) Error
Take b = 1/ε, so Pr[Y = y] ∝ e^(−ε|y|).
Concentration of the Laplace distribution: Pr_{y∼Y}[|y| > k·(1/ε)] = O(e^(−k)). Setting k = O(log n): the expected error is 1/ε, and w.h.p. the error is Õ(1/ε).
(Figure: the Laplace density centered at 0.)
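A quick empirical check of this tail bound (a sketch; numpy assumed):

    import numpy as np

    eps, k, trials = 0.1, 5, 200_000
    rng = np.random.default_rng(0)
    samples = rng.laplace(scale=1.0 / eps, size=trials)

    empirical = np.mean(np.abs(samples) > k / eps)
    exact = np.exp(-k)            # Pr[|Lap(1/eps)| > k/eps] is exactly e^{-k}
    print(empirical, exact)       # the two should be close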

