1 Revealing Information while Preserving Privacy
Kobbi Nissim, NEC Labs, DIMACS
Based on work with: Irit Dinur; Cynthia Dwork and Joe Kilian; Irit Dinur, Cynthia Dwork and Joe Kilian

2 The Hospital Story
[Diagram: a user sends a query q to a medical DB holding patient data and receives an answer a]

3 Easy Tempting Solution (A Bad Solution)
Idea: (a) remove identifying information (name, SSN, …); (b) publish the data.
Observation: 'harmless' attributes uniquely identify many patients (gender, approximate age, approximate weight, ethnicity, marital status, …).
Worse: 'rare' attributes (CF ≈ 1/3000).
[Diagram: database d with rows Mr. Smith, Ms. John, Mr. Doe]

4 Our Model: Statistical Database (SDB)
Database: d ∈ {0,1}^n (one bit per person, e.g. Mr. Smith, Ms. John, Mr. Doe, …).
Query: a subset q ⊆ [n].
Answer: a_q = Σ_{i∈q} d_i.
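To make the model concrete, here is a minimal Python sketch of an SDB answering subset-sum queries; the sizes and names (n, d, q, exact_answer) are illustrative, not taken from the slides.

```python
# A minimal sketch of the SDB model: the database is a bit vector d in {0,1}^n,
# a query is a subset q of [n], and the exact answer is a_q = sum_{i in q} d_i.
import random

n = 100
d = [random.randint(0, 1) for _ in range(n)]   # database d in {0,1}^n

def exact_answer(d, q):
    """Exact answer a_q for a query q, given as a set of indices in [n]."""
    return sum(d[i] for i in q)

q = set(random.sample(range(n), 30))           # a random query q, a subset of [n]
print(exact_answer(d, q))
```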

5 The Privacy Game: Information-Privacy Tradeoff
Private functions: want to hide π_i(d_1, …, d_n) = d_i.
Information functions: want to reveal f_q(d_1, …, d_n) = Σ_{i∈q} d_i.
This is an explicit definition of the private functions.
Compare crypto (secure function evaluation): want to reveal f(), want to hide all functions π() not computable from f(); an implicit definition of the private functions.

6 Approaches to SDB Privacy [AW 89]
Query restriction: require queries to obey some structure.
Perturbation: give 'noisy' or 'approximate' answers (this talk).

7 Perturbation
Database: d = d_1, …, d_n.
Query: q ⊆ [n].
Exact answer: a_q = Σ_{i∈q} d_i.
Perturbed answer: â_q.
Perturbation within E: for all q, |â_q − a_q| ≤ E.
General perturbation: Pr_q[|â_q − a_q| ≤ E] = 1 − neg(n), or 99%, or even just 51% of the queries.
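A hedged sketch of output perturbation within E, in Python; the uniform noise model is only an illustrative way to satisfy |â_q − a_q| ≤ E, not the slides' specific scheme.

```python
# Output perturbation within E: the curator returns â_q = a_q + noise with
# |â_q - a_q| <= E for every query. Uniform noise is an illustrative choice.
import random

def perturbed_answer(d, q, E):
    a_q = sum(d[i] for i in q)                 # exact answer
    return a_q + random.uniform(-E, E)         # â_q, off by at most E

def within_perturbation(d, q, a_hat, E):
    """Check the guarantee |â_q - a_q| <= E for one query."""
    return abs(a_hat - sum(d[i] for i in q)) <= E
```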

8 Perturbation Techniques [AW89]
Data perturbation:
– Swapping [Reiss 84] [Liew, Choi, Liew 85]
– Fixed perturbations [Traub, Yemini, Wozniakowski 84] [Agrawal, Srikant 00] [Agrawal, Aggarwal 01]: additive perturbation d'_i = d_i + E_i
Output perturbation:
– Random sample queries [Denning 80]: the sample is drawn from the query set
– Varying perturbations [Beck 80]: perturbation variance grows with the number of queries
– Rounding [Achugbue, Chin 79]; randomized rounding [Fellegi, Phillips 74]; …
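For contrast with output perturbation, a minimal sketch of the additive data-perturbation idea d'_i = d_i + E_i; the Gaussian noise is an assumption for illustration, not the distribution used in the cited works.

```python
# Additive data perturbation d'_i = d_i + E_i: noise is fixed into the data
# once, and all later queries are answered from the perturbed copy d'.
# Gaussian noise is an illustrative assumption here.
import random

def perturb_data(d, sigma=1.0):
    return [d_i + random.gauss(0.0, sigma) for d_i in d]

def answer_from_perturbed(d_prime, q):
    return sum(d_prime[i] for i in q)
```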

9 Main Question: How much perturbation is needed to achieve privacy?

10 Privacy from ~√n Perturbation (an example of a useless database)
Database: d ∈_R {0,1}^n.
On query q:
1. Let a_q = Σ_{i∈q} d_i.
2. If |a_q − |q|/2| > E, return â_q = a_q.
3. Otherwise return â_q = |q|/2.
Privacy is preserved: if E ≈ √n·(lg n)^2, then whp rule 3 is always used, so no information about d is given.
But there is no usability! Can we do better? Smaller E? Usability?
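A small sketch of this response algorithm, assuming E ≈ √n·(lg n)^2 as on the slide; the concrete sizes are illustrative.

```python
# Sketch of the 'useless database': for a random database d, any query whose
# exact answer is within E of |q|/2 gets the constant reply |q|/2, which for
# E ~ sqrt(n) * (lg n)^2 is whp the only rule that ever fires.
import math, random

n = 10_000
d = [random.randint(0, 1) for _ in range(n)]
E = math.sqrt(n) * math.log2(n) ** 2

def respond(q):
    a_q = sum(d[i] for i in q)
    if abs(a_q - len(q) / 2) > E:      # rule 2: answer exactly (rarely fires)
        return a_q
    return len(q) / 2                  # rule 3: data-independent answer

q = set(random.sample(range(n), n // 2))
print(respond(q))                      # almost surely |q|/2 = 2500.0
```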

11 (not) Defining Privacy
An elusive definition:
– Application dependent
– Partial vs. exact compromise
– Prior knowledge, and how to model it?
– Other issues
Instead of defining privacy, we identify what is surely non-private: a strong breaking of privacy.

12 Strong Breaking of Privacy
The useless database achieves the best possible: perturbation << √n implies no privacy!
Main Theorem: Given a DB response algorithm with perturbation E << √n, there is a poly-time reconstruction algorithm that outputs a database d' s.t. dist(d, d') < o(n).

13 The Adversary as a Decoding Algorithm
d (n bits) is 'encoded' as the perturbed sums â_{q_1}, â_{q_2}, â_{q_3}, … over the 2^n subsets of [n] (recall â_q = Σ_{i∈q} d_i + pert_q).
Decoding problem: given access to â_{q_1}, …, â_{q_{2^n}}, reconstruct d in time poly(n).

14 Side remark: Goldreich-Levin Hardcore Bit
Here d (n bits) is encoded as â_q over the 2^n subsets of [n], where â_q = Σ_{i∈q} d_i mod 2 on 51% of the subsets.
The GL algorithm finds, in time poly(n), a small list of candidates containing d.

15 Side remark: Comparing the Tasks
Encoding: GL: a_q = Σ_{i∈q} d_i (mod 2). Here: a_q = Σ_{i∈q} d_i.
Noise: GL: ½ − ε of the queries are corrupted. Here: additive perturbation, with an ε fraction of the queries deviating from the perturbation.
Decoding: GL: list decoding. Here: find d' s.t. dist(d, d') < εn (list decoding impossible).
Queries: GL: dependent. Here: random.

16 Recall Our Goal: perturbation << √n implies no privacy!
Main Theorem: Given a DB response algorithm with perturbation E << √n, there is a poly-time reconstruction algorithm that outputs a database d' s.t. dist(d, d') < o(n).

17 Proof of Main Theorem: The Adversary's Reconstruction Algorithm
Query phase: get â_{q_j} for t random subsets q_1, …, q_t of [n].
Weeding phase: solve the linear program
  0 ≤ x_i ≤ 1
  |Σ_{i∈q_j} x_i − â_{q_j}| ≤ E for all j.
Rounding: let c_i = round(x_i); output c.
Observation: an LP solution always exists, e.g. x = d.
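Since the weeding phase is just a feasibility LP, an off-the-shelf solver suffices. The sketch below uses scipy's linprog and is illustrative only: the sizes n, t, E and the uniform noise inside the perturbation bound are assumptions, not the authors' parameters.

```python
# Hedged sketch of the reconstruction attack: query t random subsets, find any
# fractional x in [0,1]^n consistent with the perturbed answers up to E, round.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, t, E = 128, 8 * 128, 2.0               # illustrative sizes; E << sqrt(n)
d = rng.integers(0, 2, size=n)            # the hidden database

def db_answer(q_mask):
    """Perturbed answer: exact subset sum plus noise of magnitude <= E."""
    return q_mask @ d + rng.uniform(-E, E)

# Query phase: t random subsets of [n], encoded as 0/1 indicator rows.
Q = rng.integers(0, 2, size=(t, n))
a_hat = np.array([db_answer(Q[j]) for j in range(t)])

# Weeding phase: feasibility LP  |Q x - a_hat| <= E,  0 <= x <= 1.
A_ub = np.vstack([Q, -Q])
b_ub = np.concatenate([a_hat + E, -(a_hat - E)])
res = linprog(c=np.zeros(n), A_ub=A_ub, b_ub=b_ub,
              bounds=[(0, 1)] * n, method="highs")

# Rounding phase: c_i = round(x_i).
c = np.round(res.x).astype(int)
print("fraction of bits recovered:", np.mean(c == d))
```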

18 Proof of Main Theorem: Correctness of the Algorithm
Consider x = (0.5, …, 0.5) as a candidate solution for the LP.
Observation: a random q often shows a ~√n advantage either to the 0's or to the 1's of d; such a q disqualifies x as a solution for the LP.
More generally, we prove that if dist(x, d) > εn, then whp there is a q among q_1, …, q_t that disqualifies x.
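A quick empirical check of the observation (illustrative only): for a random database, a random query's exact answer typically sits about √n away from |q|/2, which is exactly what Σ_{i∈q} x_i gives for x = (0.5, …, 0.5).

```python
# Empirical check of the sqrt(n) advantage: |a_q - |q|/2| is typically on the
# order of sqrt(n), so any perturbation E << sqrt(n) cannot hide the gap
# between d and the fractional candidate x = (1/2, ..., 1/2).
import math, random

n, trials = 10_000, 200
d = [random.randint(0, 1) for _ in range(n)]

gaps = []
for _ in range(trials):
    q = [i for i in range(n) if random.random() < 0.5]   # random subset of [n]
    a_q = sum(d[i] for i in q)
    gaps.append(abs(a_q - len(q) / 2))

print("average gap:", sum(gaps) / trials, "  sqrt(n):", math.sqrt(n))
```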

19 Extensions of the Main Theorem
'Imperfect' perturbation: the original bit string can be approximated even if the database answer is within the perturbation only for 99% of the queries.
Other information functions: given access to a 'noisy majority' of subsets, we can approximate the original bit string.

20 Notes on Impossibility Results
Exponential adversary: strong breaking of privacy if E << n.
Polynomial adversary: non-adaptive queries; oblivious of the perturbation method and the database distribution; tight threshold E ≈ √n.
What if the adversary is more restricted?

21 Bounded Adversary Model
Database: d ∈_R {0,1}^n.
Theorem: If the number of queries is bounded by T, then there is a DB response algorithm with perturbation of ~√T that maintains privacy (with a reasonable definition of privacy).
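The slide does not specify the response algorithm; as one plausible illustration of perturbation of magnitude ~√T, the sketch below adds Gaussian noise of standard deviation O(√T) to each of at most T answers. The function name and noise choice are assumptions, not the authors' construction.

```python
# Illustrative sketch only: answer at most T queries, each with additive
# Gaussian noise of magnitude ~sqrt(T). This is an assumed mechanism, not the
# one from the paper.
import math, random

def bounded_adversary_responder(d, T, c=1.0):
    budget = {"left": T}
    def respond(q):
        assert budget["left"] > 0, "query budget T exhausted"
        budget["left"] -= 1
        a_q = sum(d[i] for i in q)
        return a_q + random.gauss(0.0, c * math.sqrt(T))   # ~sqrt(T) perturbation
    return respond
```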

22 Summary and Open Questions
Very high perturbation is needed for privacy:
– Threshold phenomenon: above √n, total privacy; below √n, none (poly-time adversary)
– Rules out many currently proposed solutions for SDB privacy
– Q: what happens at the threshold? Usability?
Main tool: a reconstruction algorithm
– Reconstructing an n-bit string from perturbed partial sums/thresholds
Privacy for a T-bounded adversary with a random database:
– √T perturbation
– Q: other database distributions?
Q: Crypto and SDB privacy?

23 Our Privacy Definition (bounded adversary model)
[Diagram: for d ∈_R {0,1}^n, an adversary given the transcript, an index i, and d_{-i} still fails to guess d_i w.p. > ½ − ε]

24 The Adversary as a Decoding Algorithm
[Diagram: d → encode → partial sums a_{q_1}, a_{q_2}, a_{q_3}, …, a_{q_t} → perturb → perturbed sums â_{q_1}, â_{q_2}, â_{q_3}, …, â_{q_t} → decode → d']

