Foundations of Privacy Lecture 11 Lecturer: Moni Naor.




1 Foundations of Privacy Lecture 11 Lecturer: Moni Naor

2 Recap of the previous lecture. Continually changing data: counters; how to combine expert advice; multi-counter and the list update problem. Pan privacy. General transformation to continual output.

3 The Dynamic Privacy Zoo: Differentially Private Outputs; Privacy under Continual Observation; Pan Privacy; User-level Privacy; Continual Pan Privacy. Petting. Sketch vs. Stream.

4 Sanitization Can't be Too Accurate. Usual counting queries: a query is a subset $q \subseteq [n]$ with answer $\sum_{i \in q} d_i$, and Response = Answer + noise. Blatant non-privacy: the adversary guesses 99% of the bits. Theorem: If all responses are within $o(n)$ of the true answer, then the algorithm is blatantly non-private. But: the attack requires an exponential number of queries.

5 Proof: Exponential Adversary. Focus on the column containing the super-private bit: "the database" is the bit vector $d$ (for example 0 1 1 1 1 0 0). Assume all answers are within error bound $E$. We will show that $E$ cannot be $o(n)$.

6 Proof: Exponential Adversary. Estimate the number of 1's in all possible sets: $\forall S \subseteq [n]: |K(S) - \sum_{i \in S} d_i| \le E$, where $K(S)$ is the answer returned on $S$. Weed out "distant" databases: for each possible candidate database $c$, if for any $S \subseteq [n]$ we have $|\sum_{i \in S} c_i - K(S)| > E$, then rule out $c$; if $c$ is not ruled out, halt and output $c$. Claim: the real database $d$ won't be ruled out.

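7 Proof: Exponential Adversary. Suppose $\forall S \subseteq [n]: |K(S) - \sum_{i \in S} d_i| \le E$. Claim: for any $c$ that has not been ruled out, Hamming distance$(c,d) \le 4E$. Let $S_0$ be the positions where $d$ is 0 and $S_1$ the positions where $d$ is 1. Since $|K(S_0) - \sum_{i \in S_0} c_i| \le E$ ($c$ not ruled out) and $K(S_0)$ is within $E$ of $\sum_{i \in S_0} d_i = 0$, $c$ and $d$ differ on at most $2E$ positions of $S_0$. In the same way $|K(S_1) - \sum_{i \in S_1} c_i| \le E$ ($c$ not ruled out) gives at most $2E$ differences on $S_1$.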
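The exponential adversary can be tried out on a toy database. The following is a minimal sketch, not the lecture's code: the names `noisy_oracle` and `exponential_attack`, the uniform noise, and the sizes `n = 8`, `E = 1` are all assumptions chosen so the brute-force search finishes quickly.

```python
import random
from itertools import product

def subsets(n):
    """Yield every subset of {0..n-1} as a tuple of indices."""
    for mask in range(1 << n):
        yield tuple(i for i in range(n) if mask >> i & 1)

def noisy_oracle(d, E, rng):
    """Hypothetical curator: answers every subset-sum query within +/- E."""
    return {S: sum(d[i] for i in S) + rng.uniform(-E, E) for S in subsets(len(d))}

def exponential_attack(K, n, E):
    """Return the first candidate whose subset sums all lie within E of K."""
    for c in product((0, 1), repeat=n):
        if all(abs(sum(c[i] for i in S) - K[S]) <= E for S in subsets(n)):
            return c
    return None  # not reached when every answer really is within E

rng = random.Random(0)
n, E = 8, 1.0
d = tuple(rng.randint(0, 1) for _ in range(n))
K = noisy_oracle(d, E, rng)
c = exponential_attack(K, n, E)
hamming = sum(ci != di for ci, di in zip(c, d))
print("Hamming distance between output and d:", hamming)
```

The search always halts because $d$ itself survives every check, and by the claim above any surviving candidate differs from $d$ in at most $4E$ positions.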

8 Contradiction? We have seen algorithms that answer each query with accuracy $o(n)$: $O(\sqrt{n})$ and $O(n^{2/3})$. Why is there no contradiction with the current result?

9 What can we do efficiently? We allowed "too" much power to the adversary: number of queries and computation. On the other hand: lack of wild errors in the responses. Theorem: For any sanitization algorithm, if all responses are within $o(\sqrt{n})$ of the true answer, then it is blatantly non-private even against a polynomial-time adversary making $O(n \log^2 n)$ random queries. We show the adversary.

10 The model. As before: the database $d$ is a bit string of length $n$. Users query for subset sums: a query is a subset $q \subseteq \{1, \dots, n\}$ and the (exact) answer is $a_q = \sum_{i \in q} d_i$. $E$-perturbation: the response to a query is within $a_q \pm E$.

11 Privacy requires $\Omega(\sqrt{n})$ perturbation. Consider a database with $o(\sqrt{n})$ perturbation. The adversary makes $t = n \log^2 n$ random queries $q_j$, getting noisy answers $a_j$. Privacy-violating algorithm: construct a database $c = \{c_i\}_{1 \le i \le n}$ by solving the linear program $0 \le c_i \le 1$ for $1 \le i \le n$ and $a_j - E \le \sum_{i \in q_j} c_i \le a_j + E$ for $1 \le j \le t$. Round the solution: if $c_i > 1/2$ set it to 1, otherwise to 0. A solution must exist: $d$ itself. For every query $q_j$, its answer according to $c$ is at most $2E$ far from its (real) answer in $d$.
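The linear-programming attack above can be sketched with an off-the-shelf LP solver. This is an illustration under assumptions, not the lecture's implementation: it uses NumPy and SciPy's `linprog` (HiGHS backend), a zero objective since only feasibility matters, and toy values `n = 32`, `E = 1`; the function name `lp_reconstruct` is made up here.

```python
import numpy as np
from scipy.optimize import linprog

def lp_reconstruct(Q, answers, E):
    """Q: t x n 0/1 query matrix; answers: noisy subset sums; E: noise bound."""
    t, n = Q.shape
    Qf = Q.astype(float)
    # a_j - E <= Q_j . c <= a_j + E, rewritten as A_ub @ c <= b_ub.
    A_ub = np.vstack([Qf, -Qf])
    b_ub = np.concatenate([answers + E, -(answers - E)])
    res = linprog(np.zeros(n), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(0.0, 1.0)] * n, method="highs")
    if not res.success:
        raise ValueError("LP infeasible: some answer is off by more than E")
    return (res.x > 0.5).astype(int)  # round each coordinate to a bit

rng = np.random.default_rng(0)
n, E = 32, 1.0
t = int(n * np.log2(n) ** 2)              # t = n log^2 n random queries
d = rng.integers(0, 2, size=n)            # secret database
Q = rng.integers(0, 2, size=(t, n))       # each row is a random subset
answers = Q @ d + rng.uniform(-E, E, size=t)
c = lp_reconstruct(Q, answers, E)
print("Hamming distance:", int(np.sum(c != d)))
```

The LP is always feasible here because $d$ satisfies every constraint; how close the rounded output is to $d$ depends on $E$ relative to $\sqrt{n}$, which is what the next slides analyze.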

12 Bad solutions to the LP do not survive. A query disqualifies a potential database $c$ if its answer for the query is more than $2E + 1$ far from its real answer in $d$. Idea: show that for a database $c$ that is far away from $d$, a random query disqualifies $c$ with some constant probability $\delta$. Want to use the union bound: all far-away solutions are disqualified with probability at least $1 - n^n (1-\delta)^t = 1 - \mathrm{neg}(n)$. How do we limit the solution space? Round each value to the closest multiple of $1/n$.

13 Privacy requires $\Omega(\sqrt{n})$ perturbation. A query disqualifies a potential database $c$ if its answer for the query is more than $2E + 1$ far from its real answer in $d$. Claim: a random query disqualifies a database $c$ that is far away from $d$ with some constant probability $\delta$. Therefore $t = n \log^2 n$ queries leave a negligible probability for each far reconstruction. Union bound: all far-away suggestions are disqualified with probability at least $1 - n^n (1-\delta)^t = 1 - \mathrm{neg}(n)$. The union bound can be applied thanks to the discretization. Count the number of entries far from $d$.

14 Review and Conclusion. When the perturbation is $o(\sqrt{n})$, choosing $\tilde{O}(n)$ random queries gives enough information to efficiently reconstruct an $o(n)$-close database. The database is reconstructed using linear programming, in polynomial time. Databases with $o(\sqrt{n})$ perturbation are blatantly non-private: they are reconstructable in poly$(n)$ time.

15 $\Omega(\sqrt{n})$ lower bound revisited. An attack on an $o(\sqrt{n})$-perturbation database with substantially better performance. The previous attack uses $n \log^2 n$ queries and runs in $n^5 \log^4 n$ time (LP). The new attack issues $n$ queries and runs in $O(n \log n)$ time. The new attack is deterministic: a fixed set of queries for each size. This is not necessarily an advantage, since the adversary must ask certain queries.

16 The Fourier Attack. Treat the database $d$ as a function $\mathbb{Z}_2^{\log n} \to \mathbb{Z}_2$ (the vector defines a function). Query specific subset sums from which the Fourier coefficients of the function can be calculated, one for each Fourier coefficient. Round the reconstructed function's values to bits. When the sums have $o(\sqrt{n})$ error, so do the coefficients, and the reconstruction can be shown to have $o(n)$ error. The Fourier transform can be computed in time $O(n \log n)$. Key point: linearity of the Fourier transform implies that a small error in the coefficients also means a small error in the function.

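17 Fourier Transform. The characters of $\mathbb{Z}_2^k$: homomorphisms into $\{-1, 1\}$. There are $2^k$ characters, one for each $a = (a_1, a_2, \dots, a_k) \in \mathbb{Z}_2^k$: $\chi_a(x) = (-1)^{\sum_{i=1}^{k} a_i x_i}$. For a function $f: \mathbb{Z}_2^{\log n} \to \mathbb{R}$ the Fourier coefficients are $\hat{f}(\chi_a) = \sum_x \chi_a(x) f(x)$, and we have $f(x) = \frac{1}{2^k} \sum_a \chi_a(x) \hat{f}(\chi_a)$. In matrix form: $H$ is the $2^k \times 2^k$ Hadamard matrix with $H_{a,b} = \chi_a(b)$, so $H H = 2^k I$, $\hat{f} = H f$ and $f = \frac{1}{2^k} H \hat{f}$.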

18 Parseval's Identity. Relates the absolute values of $f$ to the absolute values of the Fourier coefficients of $f$: $\sum_{x \in \mathbb{Z}_2^k} |f(x)|^2 = \frac{1}{2^k} \sum_{a \in \mathbb{Z}_2^k} |\hat{f}(\chi_a)|^2$.

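19 Evaluating Fourier Coefficients with Counting Queries. Let $\sigma_0 = \sum_x f(x)$. For $a = (a_1, a_2, \dots, a_k)$ let $S_a = \{x \mid \langle a, x \rangle = 0 \bmod 2\}$; for $a \ne 0$, $|S_a| = 2^{k-1}$. Then $\hat{f}(\chi_a) = 2 \sum_{x \in S_a} f(x) - \sigma_0$. An approximation of the counting query on $S_a$ yields an approximation of $\hat{f}(\chi_a)$ with a related error term. Let $e = (e_1, e_2, \dots, e_n)$ be the error vector of the Fourier coefficients. Since $f = \frac{1}{2^k} H \hat{f}$, we get $\frac{1}{2^k} H (\hat{f} + e) = f + \frac{1}{2^k} H e$.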
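The identity linking a Fourier coefficient to a counting query follows by splitting the sum over $x$ according to the sign of the character. A short derivation, writing $\langle a, x \rangle$ for $\sum_i a_i x_i$ (notation assumed here):

$$
\hat{f}(\chi_a) = \sum_x \chi_a(x) f(x) = \sum_{x \in S_a} f(x) - \sum_{x \notin S_a} f(x) = 2 \sum_{x \in S_a} f(x) - \sigma_0 .
$$

So if the counting queries on $S_a$ and on the whole domain are each answered within $E$, the resulting estimate of $\hat{f}(\chi_a)$ is off by at most $3E$.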

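20 Recall $f = \frac{1}{2^k} H \hat{f}$, so $\frac{1}{2^k} H (\hat{f} + e) = f + \frac{1}{2^k} H e$, where $e = (e_1, e_2, \dots, e_n)$ is the error vector of the Fourier coefficients. If $\frac{1}{2^k} H e$ has $\Omega(n)$ entries of absolute value $\ge \frac{1}{2}$, then by Parseval's identity $\sum_{x \in \mathbb{Z}_2^k} |f(x)|^2 = \frac{1}{2^k} \sum_{a \in \mathbb{Z}_2^k} |\hat{f}(\chi_a)|^2$, applied to the function $\frac{1}{2^k} H e$, the quantity $\frac{1}{2^k} \sum_{a \in \mathbb{Z}_2^k} |e_a|^2$ is $\Omega(n)$. Hence at least one $|e_a|$ is $\Omega(\sqrt{n})$, contradicting the assumption on accuracy.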
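The whole Fourier attack fits in a few lines. The sketch below is illustrative only: the curator simulation `noisy_counts`, the uniform noise, and the sizes `n = 64`, `E = 0.5` are assumptions, and index `a` is read as a bit vector so that `parity(a & x)` plays the role of $\langle a, x \rangle \bmod 2$.

```python
import random

def fwht(v):
    """Fast Walsh-Hadamard transform: returns H v in O(n log n), n a power of 2."""
    v = list(v)
    h = 1
    while h < len(v):
        for i in range(0, len(v), 2 * h):
            for j in range(i, i + h):
                v[j], v[j + h] = v[j] + v[j + h], v[j] - v[j + h]
        h *= 2
    return v

def parity(x):
    """Parity of the number of 1 bits in x."""
    return bin(x).count("1") & 1

def noisy_counts(d, E, rng):
    """Hypothetical curator: counting query on each S_a, answered within +/- E."""
    n = len(d)
    return [sum(d[x] for x in range(n) if parity(a & x) == 0) + rng.uniform(-E, E)
            for a in range(n)]

def fourier_attack(answers):
    """Turn the n noisy counts into coefficient estimates, invert, and round."""
    n = len(answers)
    sigma0 = answers[0]                              # S_0 is the whole domain
    coeffs = [2 * ans - sigma0 for ans in answers]   # estimate of f_hat(chi_a)
    f = [v / n for v in fwht(coeffs)]                # f = (1/n) H f_hat
    return [1 if v > 0.5 else 0 for v in f]

rng = random.Random(1)
n, E = 64, 0.5
d = [rng.randint(0, 1) for _ in range(n)]
guess = fourier_attack(noisy_counts(d, E, rng))
errors = sum(g != b for g, b in zip(guess, d))
print("wrong bits:", errors)
```

Each coefficient estimate is off by at most $3E$, so by Parseval's identity the squared errors of the reconstructed function sum to at most $9E^2$, and at most $36E^2$ entries can be rounded to the wrong bit.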

21 Changing the Model: weighted counting. Previous attacks assume all queries are answered within some small perturbation $E$. New model: to up to a $\frac{1}{2} - \varepsilon$ fraction of the queries unbounded noise is added; to the rest, "small" $E$-bounded noise is added. Stronger query model: subset sums are weighted with weights $0, \dots, p-1$ for some prime $p = \Omega(1/\varepsilon^2 + E/\varepsilon)$. Single bits cannot be "hidden": all the weight might be there. Want some randomness of queries; otherwise repetition.

22 Interpolation attack. Treat the database as a linear form in $n$ variables over $\mathbb{Z}_p$. Treat a query $q = (q_1, \dots, q_n)$ as the evaluation of the form at a point: $f(q_1, \dots, q_n) = \sum_{i=1}^{n} d_i q_i \bmod p$. An answer to the query $q = ((p-1)/2, 0, \dots, 0)$ that is within $(p-1)/4$ error tells us the first database bit (by dropping info); similarly for all other bits. There is no point in asking the query directly: these useful queries might have unbounded noise. We need to deduce an (approximate) answer to $q$ from other queries.

23 Interpolation attack - implementation. Want to evaluate a specific query $q$ with small error. Pick a random degree-2 curve that passes through $q$ and issue queries for the $p$ points on the curve. Key issue: points on the curve are pairwise independent. Therefore, for sufficiently many queries, with high probability interpolation gives a correct (up to small noise) answer for $q$. Can try exhaustively all degree-2 polynomials. Similar to Reed-Muller decoding.

24 Interpolation attack, continued. Interpolation is implemented by searching all $p^3$ degree-2 polynomials for one which is $E$-close to the answers on at least a $\frac{1}{2} + \varepsilon$ fraction of the entries; the restriction of a linear form to a degree-2 curve is a degree-2 polynomial. Any two such polynomials must be $2E$-close, due to low degree. Hence the accuracy of the reconstructed answer to the query is $2E$. For $(p-1)/4 > 2E$ we can figure out any specific database bit with high probability.

25 Interpolation Attack: evaluating a query accurately. Database: $f(q_1, \dots, q_n) = \sum_{i=1}^{n} d_i q_i$, a map $\mathbb{Z}_p^n \to \mathbb{Z}_p$. Pick a curve: for two random points $u_1, u_2$ in $\mathbb{Z}_p^n$, let $c(t) = q + u_1 t + u_2 t^2$, a map $\mathbb{Z}_p \to \mathbb{Z}_p^n$. The restriction of $f$ to $c$, $f|_c(t) = f(c(t))$, is a degree-2 polynomial $\mathbb{Z}_p \to \mathbb{Z}_p$. Query all $p$ points of $c$ to get evaluations of $f|_c$; the answers are inaccurate. Interpolate to find $f|_c$ up to a small error. Evaluate $f|_c(0) = f(q)$ accurately.
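The steps above can be sketched end to end on a toy instance. Everything concrete here is an assumption made for illustration: the names `curator`, `fit_quadratic` and `recover_bit`, the values `p = 31`, `E = 1`, and the simplification that exactly `bad = 10` of the `p` curve points receive arbitrary noise (in the model this count is only controlled with high probability).

```python
import random

def cyc(a, p):
    """Distance from a to 0 on the cycle Z_p."""
    a %= p
    return min(a, p - a)

def curator(d, query, p, noise):
    """Weighted subset sum mod p plus the given noise term."""
    return (sum(di * qi for di, qi in zip(d, query)) + noise) % p

def fit_quadratic(answers, p, E, need):
    """Search all p^3 polynomials g0 + g1 t + g2 t^2 that are E-close on >= need points."""
    for g2 in range(p):
        for g1 in range(p):
            resid = [(a - g1 * t - g2 * t * t) % p for t, a in enumerate(answers)]
            for g0 in range(p):
                if sum(cyc(r - g0, p) <= E for r in resid) >= need:
                    return g0, g1, g2
    return None

def recover_bit(d, i, p, E, bad, rng):
    n, half = len(d), (p - 1) // 2
    q = [half if j == i else 0 for j in range(n)]      # f(q) = half * d_i
    u1 = [rng.randrange(p) for _ in range(n)]
    u2 = [rng.randrange(p) for _ in range(n)]
    corrupted = set(rng.sample(range(p), bad))         # points with unbounded noise
    answers = []
    for t in range(p):
        point = [(qj + a * t + b * t * t) % p for qj, a, b in zip(q, u1, u2)]
        noise = rng.randrange(p) if t in corrupted else rng.randint(-E, E)
        answers.append(curator(d, point, p, noise))
    g0, _, _ = fit_quadratic(answers, p, E, need=p - bad)
    # g0 estimates f(q) up to 2E; decide whether it is nearer 0 or half.
    return 0 if cyc(g0, p) < cyc(g0 - half, p) else 1

rng = random.Random(7)
p, E, bad = 31, 1, 10
d = [rng.randint(0, 1) for _ in range(12)]
recovered = [recover_bit(d, i, p, E, bad, rng) for i in range(4)]
print(recovered, d[:4])
```

With these numbers any two polynomials accepted by the search agree to within $2E$ on at least $p - 2 \cdot \text{bad} = 11$ points, more than the $2(4E+1) = 10$ a non-constant degree-2 difference allows, so the recovered constant term is within $2E$ of $f(q)$ and the bit decodes correctly.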

26 Interpolation attack - performance. Time for finding any specific bit: $O(p^4) = O(\varepsilon^{-8})$. Independent of the database size $n$? (Querying time? $|q| = \Theta(n)$.) Can be used with very large databases if the interesting part is small. Time to construct the whole database with small error: $O(n)$ with $pn$ queries (or $O(n^2)$).

27 Summary. The $\Omega(\sqrt{n})$ perturbation lower bound revisited: a simple and efficient attack. When queries allow sufficiently large weights, an adversary can handle unbounded noise on a large portion of the queries and find out private data in time independent of the size of the database.

