Revealing Information while Preserving Privacy. Kobbi Nissim, NEC Labs, DIMACS. Based on work with Irit Dinur, Cynthia Dwork, and Joe Kilian.

2 The Hospital Story. (Diagram: users send queries q to a medical DB holding patient data and receive answers a.)

3 A Bad Solution (the easy, tempting one). Idea: (a) remove identifying information (name, SSN, …); (b) publish the data. Observation: 'harmless' attributes (gender, approximate age, approximate weight, ethnicity, marital status, …) uniquely identify many patients. Worse: a 'rare' attribute (e.g. CF, frequency ≈ 1/3000) identifies a patient on its own.

4 Our Model: Statistical Database (SDB). The database is a bit vector d ∈ {0,1}^n, one bit per individual (Mr. Smith, Ms. John, Mr. Doe, …). A query is a subset q ⊆ [n]; the exact answer is the subset sum a_q = Σ_{i∈q} d_i.
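A minimal Python sketch of this model (an illustration added here; the function and variable names below are not from the talk):

```python
import random

n = 16
d = [random.randint(0, 1) for _ in range(n)]        # database d in {0,1}^n, one bit per person

def exact_answer(d, q):
    """Exact answer a_q: the sum of the database bits indexed by the subset q of [n]."""
    return sum(d[i] for i in q)

q = [i for i in range(n) if random.random() < 0.5]  # a random subset of [n]
print(q, exact_answer(d, q))
```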

5 The Privacy Game: Information-Privacy Tradeoff. Private functions: we want to hide π_i(d_1, …, d_n) = d_i. Information functions: we want to reveal f_q(d_1, …, d_n) = Σ_{i∈q} d_i. Here the private functions are defined explicitly. Contrast with crypto (secure function evaluation): there we want to reveal f() and to hide every function π() not computable from f(), so the private functions are defined implicitly.

6 Approaches to SDB Privacy [AW 89]. Query restriction: require queries to obey some structure. Perturbation: give 'noisy' or 'approximate' answers. This talk is about perturbation.

7 Perturbation. Database: d = d_1, …, d_n. Query: q ⊆ [n]. Exact answer: a_q = Σ_{i∈q} d_i. Perturbed answer: â_q. Perturbation E: for all q, |â_q − a_q| ≤ E. General perturbation: Pr_q[|â_q − a_q| ≤ E] = 1 − neg(n) (or 99%, or even just 51%, of the queries).
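As a sketch of this definition (again an added illustration, not a mechanism from the talk), any responder whose answers stay within E of the exact subset sums is an E-perturbation; the simplest example adds bounded random noise:

```python
import random

def perturbed_answer(d, q, E):
    """Return some â_q with |â_q - a_q| <= E: here the exact sum plus bounded noise."""
    a_q = sum(d[i] for i in q)
    return a_q + random.randint(-E, E)
```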

8 Perturbation Techniques [AW89]. Data perturbation: swapping [Reiss 84] [Liew, Choi, Liew 85]; fixed perturbations [Traub, Yemini, Wozniakowski 84] [Agrawal, Srikant 00] [Agrawal, Aggarwal 01], e.g. additive perturbation d'_i = d_i + E_i. Output perturbation: random sample queries [Denning 80], where the sample is drawn from the query set; varying perturbations [Beck 80], where the perturbation variance grows with the number of queries; rounding [Achugbue, Chin 79] and randomized rounding [Fellegi, Phillips 74]; …

9 Main Question: How much perturbation is needed to achieve privacy?

10 Privacy from √n Perturbation (an example of a useless database). Database: d ∈_R {0,1}^n. On query q: (1) let a_q = Σ_{i∈q} d_i; (2) if |a_q − |q|/2| > E, return â_q = a_q; (3) otherwise return â_q = |q|/2. If E ≈ √n (lg n)^2, then whp rule 3 is always used and privacy is preserved, but no information about d is given; there is no usability. Can we do better: a smaller E? some usability?
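A sketch of the response rule on this slide (assuming, as stated, E ≈ √n·(lg n)^2); for a random database the |q|/2 branch fires on essentially every query, so the answers carry no information about d:

```python
import math

def useless_answer(d, q, E=None):
    """Answer |q|/2 whenever the exact sum is within E of it; otherwise answer exactly."""
    n = len(d)
    if E is None:
        E = math.sqrt(n) * math.log2(n) ** 2   # E ~ sqrt(n) (lg n)^2
    a_q = sum(d[i] for i in q)
    if abs(a_q - len(q) / 2) > E:
        return a_q                             # rule 2: far from |q|/2, answer exactly
    return len(q) / 2                          # rule 3: whp for random d, always this branch
```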

11 (not) Defining Privacy. A definition of privacy is elusive: it is application dependent; partial vs. exact compromise; prior knowledge and how to model it; and other issues. Instead of defining privacy, we identify what is surely non-private: a strong breaking of privacy.

12 Strong Breaking of Privacy. The useless database achieves the best possible perturbation: perturbation << √n implies no privacy! Main Theorem: given a DB response algorithm with perturbation E << √n, there is a poly-time reconstruction algorithm that outputs a database d' with dist(d, d') = o(n).

13 The Adversary as a Decoding Algorithm. The n-bit string d is encoded into the answers â_{q_1}, â_{q_2}, â_{q_3}, …, â_{q_{2^n}} over all 2^n subsets of [n] (recall â_q = Σ_{i∈q} d_i + pert_q). Decoding problem: given access to â_{q_1}, …, â_{q_{2^n}}, reconstruct d in time poly(n).

14 Side remark: the Goldreich-Levin Hardcore Bit. The n-bit string d is encoded into answers over the 2^n subsets of [n], where â_q = Σ_{i∈q} d_i mod 2 on 51% of the subsets. The GL algorithm finds, in time poly(n), a small list of candidates containing d.

15 Side remark: Comparing the Tasks. Encoding: here a_q = Σ_{i∈q} d_i; in GL, a_q = Σ_{i∈q} d_i (mod 2). Noise: here an additive perturbation, with an ε fraction of the queries allowed to deviate from the perturbation; in GL, ½ − ε of the queries are corrupted. Decoding: here a single d' with dist(d, d') < εn (list decoding is impossible); in GL, list decoding. Queries: here random; in GL, dependent.

16 Recall our goal: perturbation << √n implies no privacy! Main Theorem: given a DB response algorithm with perturbation E << √n, there is a poly-time reconstruction algorithm that outputs a database d' with dist(d, d') = o(n).

17 Proof of Main Theorem: the Adversary's Reconstruction Algorithm. Query phase: get â_{q_j} for t random subsets q_1, …, q_t of [n]. Weeding phase: solve the linear program with variables x_1, …, x_n, constraints 0 ≤ x_i ≤ 1 and |Σ_{i∈q_j} x_i − â_{q_j}| ≤ E for all j. Rounding phase: let c_i = round(x_i) and output c. Observation: an LP solution always exists, e.g. x = d.
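A sketch of this reconstruction algorithm (coded here for illustration; the talk specifies the LP but not a solver, so scipy.optimize.linprog with a zero objective is used purely as a feasibility solver):

```python
import random
import numpy as np
from scipy.optimize import linprog

def reconstruct(answer, n, E, t):
    """answer(q) returns the perturbed response â_q for a subset q of range(n)."""
    # Query phase: t random subsets of [n] and their perturbed answers.
    queries = [[i for i in range(n) if random.random() < 0.5] for _ in range(t)]
    a_hat = np.array([answer(q) for q in queries], dtype=float)

    # Weeding phase: find x in [0,1]^n with |sum_{i in q_j} x_i - â_{q_j}| <= E for all j,
    # written as two families of <= constraints.
    A = np.zeros((t, n))
    for j, q in enumerate(queries):
        A[j, q] = 1.0
    A_ub = np.vstack([A, -A])
    b_ub = np.concatenate([a_hat + E, -(a_hat - E)])
    res = linprog(c=np.zeros(n), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(0, 1)] * n, method="highs")   # feasible: x = d is a solution

    # Rounding phase: c_i = round(x_i).
    return np.rint(res.x).astype(int)
```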

18 Proof of Main Theorem: Correctness of the Algorithm. Consider, for intuition, x = (0.5, …, 0.5) as a candidate LP solution. Observation: a random q often shows a √n advantage either to the 0's or to the 1's of d; such a q disqualifies x as an LP solution. More generally, we prove that if dist(x, d) > εn, then whp some q among q_1, …, q_t disqualifies x.

19 Extensions of the Main Theorem. 'Imperfect' perturbation: we can approximate the original bit string even if the database answer is within the perturbation only for 99% of the queries. Other information functions: given access to a 'noisy majority' of the subsets we can approximate the original bit string.

20 Notes on the Impossibility Results. Exponential adversary: strong breaking of privacy already if E << n. Polynomial adversary: non-adaptive queries; oblivious of the perturbation method and of the database distribution; tight threshold E ≈ √n. What if the adversary is more restricted?

21 Bounded Adversary Model. Database: d ∈_R {0,1}^n. Theorem: if the number of queries is bounded by T, then there is a DB response algorithm with perturbation ~√T that maintains privacy (with a reasonable definition of privacy).
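A sketch of the flavor of this positive result (an added illustration; the actual response algorithm and the precise privacy definition are those of the paper, not this code): when at most T queries will ever be answered, additive noise of magnitude about √T is the scale of perturbation the theorem permits.

```python
import math
import random

def bounded_queries_answer(d, q, T):
    """Answer the subset-sum query with additive noise of standard deviation ~ sqrt(T)."""
    a_q = sum(d[i] for i in q)
    return a_q + random.gauss(0, math.sqrt(T))
```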

22 Summary and Open Questions. Very high perturbation is needed for privacy. There is a threshold phenomenon: above √n, total privacy; below √n, none (even against a poly-time adversary); this rules out many currently proposed solutions for SDB privacy. Q: what happens on the threshold? is there usability? Main tool: a reconstruction algorithm that recovers an n-bit string from perturbed partial sums/thresholds. Privacy for a T-bounded adversary with a random database via ~√T perturbation. Q: other database distributions? Q: connections between crypto and SDB privacy?

23 Our Privacy Definition (bounded adversary model). (Diagram: d ∈_R {0,1}^n; the adversary interacts with the database, and then, given the transcript, an index i, and the remaining bits d_{-i}, tries to guess d_i; privacy holds if it fails w.p. > ½ − ε.)

24 The Adversary as a Decoding Algorithm. (Diagram: d is encoded into the partial sums a_{q_1}, a_{q_2}, a_{q_3}, …, a_{q_t}, which are perturbed into â_{q_1}, â_{q_2}, â_{q_3}, …, â_{q_t}; the adversary decodes these perturbed sums back into d'.)