Foundations of Privacy Lecture 11 Lecturer: Moni Naor.



Recap of recent lecture
Continually changing data
– Counters
– How to combine expert advice
– Multi-counter and the list update problem
Pan privacy
General transformation to continual output

The Dynamic Privacy Zoo
Differentially Private Outputs
Privacy under Continual Observation
Pan Privacy
User-level Privacy
Continual Pan Privacy
Petting
Sketch vs. Stream

Sanitization Can't be Too Accurate
Usual counting queries
– Query: $q \subseteq [n]$
– Answer: $\sum_{i \in q} d_i$
Response = Answer + noise
Blatant non-privacy: the adversary correctly guesses 99% of the bits
Theorem: If all responses are within $o(n)$ of the true answer, then the algorithm is blatantly non-private.
But: the attack requires an exponential number of queries.

Proof: Exponential Adversary
Focus on the column containing the super-private bit: "the database" $d$
Assume all answers are within error bound $E$.
Will show that $E$ cannot be $o(n)$.

Proof: Exponential Adversary
Estimate the number of 1's in all possible sets ($K(S)$ is the response received on $S$)
– $\forall S \subseteq [n]$: $|K(S) - \sum_{i \in S} d_i| \le E$
Weed out "distant" databases
– For each possible candidate database $c$: if for any $S \subseteq [n]$: $|\sum_{i \in S} c_i - K(S)| > E$, then rule out $c$.
– If $c$ is not ruled out, halt and output $c$.
Claim: the real database $d$ won't be ruled out.

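Proof: Exponential Adversary
Suppose: $\forall S \subseteq [n]$: $|K(S) - \sum_{i \in S} d_i| \le E$
Claim: for any $c$ that has not been ruled out, Hamming distance$(c, d) \le 4E$
Let $S_0 = \{i : d_i = 0\}$ and $S_1 = \{i : d_i = 1\}$
– $|K(S_0) - \sum_{i \in S_0} c_i| \le E$ ($c$ not ruled out), so $c$ and $d$ differ in at most $2E$ positions of $S_0$
– $|K(S_1) - \sum_{i \in S_1} c_i| \le E$ ($c$ not ruled out), so $c$ and $d$ differ in at most $2E$ positions of $S_1$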
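
The exponential adversary can be tried out on a tiny database. The following is a minimal Python sketch, not the lecture's own code: the function names, the bitmask encoding of subsets, and the parameters (n = 8, E = 1) are assumptions chosen for illustration. It checks the bound that the triangle-inequality argument gives: at most 2E disagreements among the positions where d is 0 and at most 2E among the positions where d is 1, so at most 4E in total.

```python
import random
from itertools import product


def subset_sum(db, mask):
    """Sum of the bits of db whose index is set in the bitmask."""
    return sum(bit for i, bit in enumerate(db) if (mask >> i) & 1)


def noisy_responses(d, E, rng):
    """K(S) for every subset S (as a bitmask), each within E of the truth."""
    n = len(d)
    return [subset_sum(d, S) + rng.randint(-E, E) for S in range(1 << n)]


def exponential_attack(K, n, E):
    """Return the first candidate database consistent with every response."""
    for c in product((0, 1), repeat=n):
        if all(abs(subset_sum(c, S) - K[S]) <= E for S in range(1 << n)):
            return c
    return None  # not reached when the noise bound holds: d itself survives


def hamming(a, b):
    """Number of positions in which a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def demo(n=8, E=1, seed=0):
    """Hypothetical driver: random database, noisy responses, then the attack."""
    rng = random.Random(seed)
    d = tuple(rng.randint(0, 1) for _ in range(n))
    c = exponential_attack(noisy_responses(d, E, rng), n, E)
    return d, c
```

Calling `demo()` returns the hidden database and the adversary's candidate; `hamming(c, d)` is then at most 4E whenever every response respects the error bound. The cost is 2^n candidates times 2^n queries, which is why this attack only shows what is possible in principle.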

Contradiction?
We have seen algorithms that answer each query with accuracy $o(n)$
– $O(\sqrt{n})$ and $O(n^{2/3})$
Why is there no contradiction with the current results?

What can we do efficiently?
The adversary was allowed "too" much power
– Number of queries
– Computation
On the other hand: no wild errors in the responses
Theorem: For any sanitization algorithm, if all responses are within $o(\sqrt{n})$ of the true answer, then it is blatantly non-private even against a polynomial-time adversary making $O(n \log^2 n)$ random queries.
Next: show the adversary.

The model
As before: the database $d$ is a bit string of length $n$.
Users query for subset sums:
– A query is a subset $q \subseteq \{1, \dots, n\}$
– The (exact) answer is $a_q = \sum_{i \in q} d_i$
$E$-perturbation
– for an answer: $a_q \pm E$

Privacy requires $\Omega(\sqrt{n})$ perturbation
Consider a database with $o(\sqrt{n})$ perturbation $E$.
The adversary makes $t = n \log^2 n$ random queries $q_j$, getting noisy answers $a_j$.
Privacy-violating algorithm: construct a database $c = \{c_i\}_{1 \le i \le n}$ by solving the linear program
– $0 \le c_i \le 1$ for $1 \le i \le n$
– $a_j - E \le \sum_{i \in q_j} c_i \le a_j + E$ for $1 \le j \le t$
Round the solution:
– if $c_i > 1/2$ set it to 1, and to 0 otherwise
A solution must exist: $d$ itself.
For every query $q_j$: its answer according to $c$ is at most $2E$ away from its (real) answer in $d$.

Bad solutions to LP do not survive
A query disqualifies a potential database $c$ if its answer for the query is more than $2E + 1$ away from its real answer in $d$.
Idea: show that for a database $c$ that is far from $d$, a random query disqualifies $c$ with some constant probability $\delta$.
Want to use the union bound: all far-away solutions are disqualified w.p. at least $1 - n^n (1 - \delta)^t = 1 - \mathrm{neg}(n)$.
How do we limit the solution space? Round each value to the closest multiple of $1/n$.

Privacy requires $\Omega(\sqrt{n})$ perturbation
A query disqualifies a potential database $c$ if its answer for the query is more than $2E + 1$ away from its real answer in $d$.
Claim: a random query disqualifies a database $c$ that is far from $d$ with some constant probability $\delta$.
Therefore: $t = n \log^2 n$ queries leave a negligible probability for each far reconstruction.
Union bound: all far-away suggestions are disqualified w.p. at least $1 - n^n (1 - \delta)^t = 1 - \mathrm{neg}(n)$.
Can apply the union bound thanks to discretization.
Count the number of entries of $c$ that are far from $d$.
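
To see why the union bound leaves only a negligible failure probability, the arithmetic can be written out. This is a worked sketch under two assumptions that the slides do not state explicitly: $\log$ is taken base 2, and the number of discretized candidates is taken to be the slide's $n^n$. Rounding each coordinate to a multiple of $1/n$ moves it by at most $1/(2n)$, so any subset sum moves by at most $1/2$, which fits inside the extra $+1$ in the disqualification threshold.

```latex
\begin{align*}
n^n (1-\delta)^t
  &\le n^n e^{-\delta t}
     && \text{since } 1-\delta \le e^{-\delta} \\
  &= \exp\!\left(n \ln n - \delta\, n \log^2 n\right)
     && \text{with } t = n \log^2 n \\
  &= \exp\!\left(-n \log n \,(\delta \log n - \ln 2)\right)
     && \text{since } \ln n = \ln 2 \cdot \log n
\end{align*}
```

Once $\log n > \ln 2 / \delta$ the exponent is negative and grows like $\delta\, n \log^2 n$, so the probability that some far-away candidate survives all $t$ queries is negligible in $n$.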

Review and Conclusion
When the perturbation is $o(\sqrt{n})$, choosing $\tilde{O}(n)$ random queries gives enough information to efficiently reconstruct an $o(n)$-close database.
The database is reconstructed using linear programming, in polynomial time.
Databases with $o(\sqrt{n})$ perturbation are blatantly non-private: they are reconstructable in poly($n$) time.

$\Omega(\sqrt{n})$ lower bound revisited
An attack on an $o(\sqrt{n})$-perturbation database with substantially better performance
Previous attack: uses $n \log^2 n$ queries and runs in $n^5 \log^4 n$ time (LP)
New attack: issues $n$ queries and runs in $O(n \log n)$ time
New attack is deterministic
– Fixed set of queries for each size
– Not necessarily an advantage: must ask certain queries

The Fourier Attack
Treat the database $d$ as a function $\mathbb{Z}_2^{\log n} \to \mathbb{Z}_2$ (the vector defines a function)
Query specific subset sums, from which the Fourier coefficients of the function can be calculated
– One query for each Fourier coefficient
Round the reconstructed function's values to bits
When the sums have $o(\sqrt{n})$ error, so do the coefficients
– the reconstruction can be shown to have $o(n)$ error
The Fourier transform can be computed in time $O(n \log n)$
Key point: linearity of the Fourier transform implies that a small error in the coefficients also means a small error in the function

Fourier Transform
The characters of $\mathbb{Z}_2^k$: homomorphisms into $\{-1, 1\}$
There are $2^k$ characters, one for each $a = (a_1, a_2, \dots, a_k) \in \mathbb{Z}_2^k$:
$\chi_a(x) = (-1)^{\sum_{i=1}^{k} a_i x_i}$
For a function $f: \mathbb{Z}_2^{\log n} \to \mathbb{R}$, the Fourier coefficients are $\hat{f}(\chi_a) = \sum_x \chi_a(x) f(x)$
We have: $f(x) = \frac{1}{2^k} \sum_a \chi_a(x) \hat{f}(\chi_a)$
In matrix form, with $H$ the $2^k \times 2^k$ Hadamard matrix, $H_{a,b} = \chi_a(b)$:
– $H H = 2^k I$
– $\hat{f} = H f$
– $f = \frac{1}{2^k} H \hat{f}$

Parseval's Identity
Relates the absolute values of $f$ to the absolute values of the Fourier coefficients of $f$:
$\sum_{x \in \mathbb{Z}_2^k} |f(x)|^2 = \frac{1}{2^k} \sum_{a \in \mathbb{Z}_2^k} |\hat{f}(\chi_a)|^2$

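Evaluating Fourier Coefficients with Counting Queries
Let $\sigma_0 = \sum_x f(x)$
For $a = (a_1, a_2, \dots, a_k)$ let $S_a = \{x \mid \langle a, x \rangle = 0 \bmod 2\}$, so $|S_a| = 2^{k-1}$ for $a \ne 0$
$\hat{f}(\chi_a) = 2 \sum_{x \in S_a} f(x) - \sigma_0$
An approximation of the counting query on $S_a$ yields an approximation of $\hat{f}(\chi_a)$ with a related error term
With $e = (e_1, e_2, \dots, e_n)$ the error vector of the Fourier coefficients:
$f = \frac{1}{2^k} H \hat{f} \;\Rightarrow\; \frac{1}{2^k} H (\hat{f} + e) = f + \frac{1}{2^k} H e$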

$f = \frac{1}{2^k} H \hat{f} \;\Rightarrow\; \frac{1}{2^k} H (\hat{f} + e) = f + \frac{1}{2^k} H e$, where $e = (e_1, e_2, \dots, e_n)$ is the error vector of the Fourier coefficients
If $\frac{1}{2^k} H e$ has $\Omega(n)$ entries that are $\ge 1/2$ in absolute value
Then by Parseval's identity: $\frac{1}{2^k} \sum_{a \in \mathbb{Z}_2^k} |e_a|^2$ is $\Omega(n)$
Hence: at least one $|e_a|$ is $\Omega(\sqrt{n})$
Contradicting the assumption on accuracy
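
The Fourier attack is short enough to sketch in Python. This is an illustrative sketch rather than the lecture's implementation: the function names, the encoding of points of $\mathbb{Z}_2^k$ as integers (so that $\langle a, x \rangle$ is the parity of `a & x`), and the toy noise source are all assumptions. Each coefficient is estimated as twice the noisy count on $S_a$ minus the noisy total, and the function is recovered with a fast Walsh-Hadamard transform.

```python
import random


def fwht(values):
    """Fast Walsh-Hadamard transform: returns H v for the 2^k x 2^k matrix H."""
    v = list(values)
    n, h = len(v), 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                x, y = v[j], v[j + h]
                v[j], v[j + h] = x + y, x - y
        h *= 2
    return v


def parity(z):
    """Parity of the number of 1 bits of z, i.e. <a, x> mod 2 for z = a & x."""
    return bin(z).count("1") & 1


def fourier_attack(noisy_count, k):
    """Reconstruct a database of n = 2^k bits from n noisy counting queries."""
    n = 1 << k
    # One query per character: S_a = {x : <a, x> = 0 mod 2}; S_0 is everything.
    answers = [
        noisy_count([x for x in range(n) if parity(a & x) == 0])
        for a in range(n)
    ]
    sigma0 = answers[0]
    coeffs = [2 * answers[a] - sigma0 for a in range(n)]  # estimates of f^(chi_a)
    f_est = [value / n for value in fwht(coeffs)]         # f = (1/2^k) H f^
    return [1 if value > 0.5 else 0 for value in f_est]   # round to bits


def demo_fourier(k=8, E=1, seed=0):
    """Hypothetical driver: hidden random database, answers perturbed by at most E."""
    rng = random.Random(seed)
    d = [rng.randint(0, 1) for _ in range(1 << k)]

    def noisy_count(S):
        return sum(d[x] for x in S) + rng.randint(-E, E)

    return d, fourier_attack(noisy_count, k)
```

With every answer within E of the truth, each coefficient is off by at most 3E, so by Parseval's identity the number of wrongly reconstructed bits is at most 36E², regardless of n. Building each query set explicitly costs O(n²) here; only the transform step is O(n log n).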

Changing the Model: weighted counting
Previous attacks: assume all queries are answered within some small perturbation $E$
New model:
– To up to a $\frac{1}{2} - \epsilon$ fraction of the queries, unbounded noise is added
– To the rest, "small" noise bounded by $\alpha$ is added
Stronger query model: subset sums are weighted, with weights $0, \dots, p-1$ for some prime $p = \Omega(1/\epsilon^2 + \alpha/\epsilon)$
– Cannot "hide" single bits: all the weight might be there
Want some randomness in the queries: otherwise repetition

Interpolation attack
Treat the database as a linear form in $n$ variables over $\mathbb{Z}_p$
Treat a query $q = (q_1, \dots, q_n)$ as the evaluation of the form at a point: $f(q_1, \dots, q_n) = \sum_{i=1}^{n} d_i q_i \bmod p$
– An answer to the query $q = ((p-1)/2, 0, \dots, 0)$ that is within $(p-1)/4$ error tells us the first database bit
– Similarly for all other bits
No point in asking the query directly: these useful queries might have unbounded noise
Need to deduce an (approximate) answer to $q$ from other queries
By dropping information

Interpolation attack - implementation
Want to evaluate a specific query $q$ with small error
Pick a random degree-2 curve that passes through $q$ and issue queries for the $p$ points on the curve
Key issue: the points on the curve are pairwise independent
Therefore: for sufficiently many queries, with high probability interpolation gives a correct (up to small noise) answer for $q$
Can exhaustively try all degree-2 polynomials
Similar to Reed-Muller decoding

Interpolation attack …
Interpolation is implemented by searching all $p^3$ degree-2 polynomials for one that is $\alpha$-close on all but a $\frac{1}{2} - \epsilon$ fraction of the entries
– the restriction of a linear form to a degree-2 curve is a degree-2 polynomial
Any two such polynomials must be $2\alpha$-close, due to the low degree
Hence the accuracy of the reconstructed answer to the query is $2\alpha$.
For $(p-1)/4 > 2\alpha$: can figure out any specific database bit with high probability

Interpolation Attack: evaluating a query accurately
DB: $f(q_1, \dots, q_n) = \sum_{i=1}^{n} d_i q_i$ ($\mathbb{Z}_p^n \to \mathbb{Z}_p$)
Pick a curve: for two random points $u_1, u_2 \in \mathbb{Z}_p^n$: $c(t) = q + u_1 t + u_2 t^2$ ($\mathbb{Z}_p \to \mathbb{Z}_p^n$)
Restriction of $f$ to $c$: $f|_c(t) = f(c(t))$, a degree-2 polynomial ($\mathbb{Z}_p \to \mathbb{Z}_p$)
Query all $p$ points of $c$ to get evaluations of $f|_c$
– the answers are inaccurate
Interpolate to find $f|_c$ up to a small error
Evaluate $f|_c(0) = f(q)$ accurately
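
The steps above can be sketched in Python for toy parameters. Everything concrete here is an assumption made for illustration: the prime 23, the noise bound 1, the class name `NoisyCurator`, and the simplification that exactly 6 of the 23 answers on each curve (including the direct query at t = 0) receive arbitrary noise. In the model itself the noisy queries are not tied to the curve in this way; the random curve is what spreads them out. Distances are measured cyclically in $\mathbb{Z}_p$.

```python
import random

P = 23       # prime modulus; weights are 0..P-1 (toy value)
ALPHA = 1    # bound on the "small" noise
BAD = 6      # answers per curve that receive unbounded noise (6/23 < 1/2)


def cyc_dist(a, b, p=P):
    """Distance between a and b on the cycle Z_p."""
    diff = (a - b) % p
    return min(diff, p - diff)


class NoisyCurator:
    """Toy curator: in every batch of P queries, the first BAD answers are arbitrary."""

    def __init__(self, d, rng):
        self.d, self.rng, self.calls = d, rng, 0

    def query(self, q):
        exact = sum(di * qi for di, qi in zip(self.d, q)) % P
        corrupt = self.calls % P < BAD
        self.calls += 1
        if corrupt:
            return self.rng.randrange(P)                    # unbounded noise
        return (exact + self.rng.randint(-ALPHA, ALPHA)) % P


def recover_bit(i, curator, n, rng):
    """Recover database bit i by interpolating along a random curve through q."""
    q = [0] * n
    q[i] = (P - 1) // 2                                     # f(q) is 0 or (P-1)/2
    u1 = [rng.randrange(P) for _ in range(n)]
    u2 = [rng.randrange(P) for _ in range(n)]
    # Query the P points c(t) = q + u1*t + u2*t^2 of the curve.
    answers = [
        curator.query([(q[j] + u1[j] * t + u2[j] * t * t) % P for j in range(n)])
        for t in range(P)
    ]
    # Exhaustive search over all P^3 polynomials g(t) = g0 + g1*t + g2*t^2,
    # keeping the one that is ALPHA-close to the most answers.
    best_g0, best_score = 0, -1
    for g0 in range(P):
        for g1 in range(P):
            for g2 in range(P):
                score = sum(
                    cyc_dist((g0 + g1 * t + g2 * t * t) % P, answers[t]) <= ALPHA
                    for t in range(P)
                )
                if score > best_score:
                    best_g0, best_score = g0, score
    # g(0) approximates f(q); decide whether it is nearer to 0 or to (P-1)/2.
    return 1 if cyc_dist(best_g0, (P - 1) // 2) < cyc_dist(best_g0, 0) else 0


def demo_interpolation(n=4, seed=0):
    """Hypothetical driver: recover every bit of a small random database."""
    rng = random.Random(seed)
    d = [rng.randint(0, 1) for _ in range(n)]
    curator = NoisyCurator(d, rng)
    return d, [recover_bit(i, curator, n, rng) for i in range(n)]
```

For these parameters the recovery is exact: the best polynomial and the true restriction are each close to at least 17 of the 23 answers, so they are within 2 of each other on at least 11 points, and a nonconstant polynomial of degree at most 2 can take the 5 values in [-2, 2] on at most 10 points. Their difference is therefore a constant of size at most 2, which is smaller than the gap needed to tell 0 from 11.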

Interpolation attack - performance
Time for finding any specific bit: $O(p^4) = O(\epsilon^{-8})$
Independent of the database size $n$? (What about querying time? $|q| = \Theta(n)$)
– Can be used with very large databases if the interesting part is small
Time to construct the whole database with small error: $O(n)$ with $pn$ queries (or $O(n^2)$)

Summary
$\Omega(\sqrt{n})$ perturbation lower bound revisited: a simple and efficient attack
When queries allow sufficiently large weights, an adversary can:
– Handle unbounded noise on a large portion of the queries
– Find out private data in time independent of the size of the database