Slide 1: Privacy Preserving Data Mining, Lecture 3: Non-Cryptographic Approaches for Preserving Privacy (based on slides of Kobbi Nissim)
Benny Pinkas, HP Labs, Israel. 10th Estonian Winter School in Computer Science, March 3, 2005.

Slide 2: Why Not Use Cryptographic Methods?
– Many users contribute data; we cannot require them to participate in a cryptographic protocol.
– In particular, we cannot require peer-to-peer communication between users.
– Cryptographic protocols incur considerable overhead.

Slide 3: Data Privacy
[Diagram: users access the data d only through an access mechanism; malicious users try to breach privacy.]

Slide 4: An Easy, Tempting Solution (a Bad One)
Idea: (a) remove identifying information (name, SSN, ...); (b) publish the data.
– But 'harmless' attributes uniquely identify many patients (gender, age, approximate weight, ethnicity, marital status, ...).
– Recall: DOB + gender + zip code identify people with high probability.
– Worse: 'rare' attributes (e.g., a disease with probability ≈ 1/3000).

Slide 5: What is Privacy?
– Something should not be computable from the query answers, e.g. π_Joe = {Joe's private data}.
– The definition should take into account the adversary's power (computational, number of queries, prior knowledge, ...).
– Quite often it is much easier to say what is surely non-private. E.g., strong breaking of privacy: the adversary is able to retrieve (almost) everybody's private data.
– Intuition: privacy is breached if it is possible to compute someone's private information from his identity.

Slide 6: The Data Privacy Game: an Information-Privacy Tradeoff
– Private functions: we want to hide π_x(DB) = d_x.
– Information functions: we want to reveal f(q, DB) for queries q.
– Here the private functions are defined explicitly; the question is which information functions may be allowed.
– Different from crypto (secure function evaluation): there, one wants to reveal f() (an explicit definition of the information function) and to hide all functions π() not computable from f(). The private functions are defined implicitly, and the question whether f() itself should be revealed is not asked.

Slide 7: A Simplistic Model: a Statistical Database (SDB)
– Database: d ∈ {0,1}^n, one bit per person (Mr. Fox 0/1, Ms. John 0/1, Mr. Doe 0/1, ...).
– Query: a subset q ⊆ [n].
– Answer: a_q = Σ_{i∈q} d_i.
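A minimal sketch of this model in Python (the names are illustrative, not from the slides): the database is a bit vector and a query returns the exact subset sum.

```python
import random

n = 1_000
d = [random.randint(0, 1) for _ in range(n)]  # one private bit per person

def answer(q):
    """Exact answer a_q to a subset-sum query q, a set of indices in [n]."""
    return sum(d[i] for i in q)

# Example: how many of the first 100 people have their bit set?
print(answer(range(100)))
```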

Slide 8: Approaches to SDB Privacy
Studied extensively since the 70s.
– Perturbation: add randomness; give 'noisy' or 'approximate' answers.
  – Techniques: data perturbation (perturb the data, then answer queries as usual) [Reiss 84, Liew Choi Liew 85, Traub Yemini Wozniakowski 84] and output perturbation (perturb the answers to queries) [Denning 80, Beck 80, Achugbue Chin 79, Fellegi Phillips 74].
  – Recent interest: [Agrawal, Srikant 00], [Agrawal, Aggarwal 01], ...
– Query restriction: answer queries accurately but sometimes disallow queries.
  – Require queries to obey some structure [Dobkin Jones Lipton 79]; restrict the number of queries.
  – Auditing [Chin Ozsoyoglu 82, Kleinberg Papadimitriou Raghavan 01].

Slide 9: Some Recent Privacy Definitions: Interval of Confidence
X is the data, Y a (noisy) observation of X. [Agrawal, Srikant '00]:
– Let Y = X + noise (e.g., uniform noise in [-100, 100]); perturb the input data. One can still estimate the underlying distribution.
– Tradeoff: more noise means less accuracy but more privacy.
– Intuition: a large possible interval means privacy is preserved. Given Y, we know that with c% confidence X lies in [a_1, a_2]; e.g., for Y = 200, with 50% confidence X is in [150, 250]. The length a_2 - a_1 defines the amount of privacy at c% confidence.
– Problem: there might be a-priori information about X. E.g., if X is someone's age and Y = -97, the noise bound alone gives X ∈ [-197, 3], but the prior X ≥ 0 collapses this to [0, 3].
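A small numeric sketch of the idea (my reconstruction, using the slide's parameters): a central c%-confidence interval under uniform noise in [-100, 100], and how the "age ≥ 0" prior shrinks it.

```python
def interval(y, c):
    """Central c%-confidence interval for X given Y = X + U[-100, 100]."""
    half = 100 * c / 100.0            # c% of the uniform noise mass
    return (y - half, y + half)

print(interval(200, 50))              # (150.0, 250.0), as on the slide
lo, hi = interval(-97, 100)           # (-197.0, 3.0) with no prior
print((max(lo, 0), hi))               # (0, 3.0): the prior "age >= 0" shrinks it
```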

Slide 10: The [AS] Scheme Can Be Turned Against Itself
– Assume that N is large. Even if the data miner has no a-priori information about X, it can estimate the distribution of X from the randomized data Y.
– Example: the perturbation is uniform in [-1, 1], so [AS] guarantees a privacy interval of length 2 with 100% confidence. Suppose the estimated density f_X puts 50% of the mass on [0, 1] and 50% on [4, 5]. After learning f_X, the value of X can be localized within an interval of size at most 1 (an interval of length 2 around Y intersects at most one of the two support intervals).
– Problem: aggregate information provides information that can be used to attack individual data.

Slide 11: Some Recent Privacy Definitions: Mutual Information
X is the data, Y a (noisy) observation of X. [Agrawal, Aggarwal '01]:
– Intuition: high entropy is good. I(X;Y) = H(X) - H(X|Y) (mutual information); small I(X;Y) means privacy is preserved (Y provides little information about X).
– Problem [EGS]: this is an average notion. Privacy loss can happen with low but significant probability without affecting I(X;Y), so I(X;Y) may look good while privacy is breached.

Slide 12: Output Perturbation (Randomization Approach)
– Exact answer to query q: a_q = Σ_{i∈q} d_i.
– Actual SDB answer: â_q.
– Perturbation E: for all q, |â_q - a_q| ≤ E.
– Questions: does perturbation give any privacy? How much perturbation is needed for privacy? Usability?

Slide 13: Privacy Preserved by Perturbation E ≈ √n
– Database: d ∈_R {0,1}^n (uniform input distribution!).
– Algorithm: on query q,
  1. Let a_q = Σ_{i∈q} d_i.
  2. If |a_q - |q|/2| < E, return â_q = |q|/2.
  3. Otherwise return â_q = a_q.
– Privacy is preserved: assume poly(n) queries; if E ≈ √n·(lg n)^2, whp rule 2 is always used, so no information about d is given (but the database is completely useless...).
– Shows that perturbation of magnitude ≈ √n is sometimes enough for privacy. Can we do better?
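A minimal sketch of this "private but useless" mechanism (my reading of the slide's rules, with illustrative parameters): for a uniformly random database and E ≈ √n·(lg n)^2, essentially every query falls under rule 2 and gets the data-independent answer |q|/2.

```python
import math
import random

n = 10_000
d = [random.randint(0, 1) for _ in range(n)]
E = math.isqrt(n) * int(math.log2(n)) ** 2   # perturbation magnitude

def perturbed_answer(q):
    a_q = sum(d[i] for i in q)
    if abs(a_q - len(q) / 2) < E:
        return len(q) / 2    # rule 2: data-independent answer
    return a_q               # rule 3: rare for a random database

print(perturbed_answer(range(n)))  # almost certainly n/2, revealing nothing
```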

Slide 14: Perturbation E « √n Implies No Privacy (Strong Breaking of Privacy)
– The previous useless database achieves the best possible perturbation.
– Theorem [Dinur-Nissim]: for any database and any response algorithm with perturbation E = o(√n), there is a poly-time reconstruction algorithm that outputs a database d' such that dist(d, d') = o(n).

Slide 15: The Adversary as a Decoding Algorithm
[Diagram: the database d is "encoded" into partial sums a_{q_1}, ..., a_{q_t}, which are perturbed into â_{q_1}, ..., â_{q_t}; the adversary decodes the perturbed sums back into d'.]

Slide 16: Proof of Theorem [DN03]: The Adversary's Reconstruction Algorithm
– Query phase: get â_{q_j} for t random subsets q_1, ..., q_t.
– Weeding phase: solve the linear program (over ℝ): 0 ≤ x_i ≤ 1 and |Σ_{i∈q_j} x_i - â_{q_j}| ≤ E for all j.
– Rounding: let c_i = round(x_i); output c.
– Observation: a solution always exists, e.g. x = d.
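A toy sketch of this attack (my implementation of the slide's three phases, not the authors' code; parameters are illustrative), posing the weeding phase as a feasibility LP:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, t, E = 64, 512, 2               # small toy parameters
d = rng.integers(0, 2, size=n)     # the secret database

A = rng.integers(0, 2, size=(t, n))            # t random subset queries
answers = A @ d + rng.integers(-E, E + 1, t)   # perturbed answers, |error| <= E

# Weeding phase: find x with |A x - answers| <= E and 0 <= x <= 1.
A_ub = np.vstack([A, -A])
b_ub = np.concatenate([answers + E, -(answers - E)])
res = linprog(c=np.zeros(n), A_ub=A_ub, b_ub=b_ub, bounds=(0, 1))

# Rounding phase: round each coordinate and compare with the secret d.
c = np.round(res.x).astype(int)
print("fraction reconstructed:", (c == d).mean())
```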

Slide 17: Why Does the Reconstruction Algorithm Work?
– Consider x ∈ {0,1}^n such that dist(x, d) = c·n = Ω(n).
– Observation: a random q contains c'·n coordinates in which x ≠ d, and the difference between the sums over these coordinates is, with constant probability, at least Ω(√n) (> E = o(√n)). Such a q disqualifies x as a solution of the LP.
– Since the total number of queries q is polynomial, all such vectors x are disqualified with overwhelming probability.

Slide 18: Summary of Results (Statistical Database)
[Dinur, Nissim 03]:
– Unlimited adversary: perturbation of magnitude Ω(n) is required.
– Polynomial-time adversary: perturbation of magnitude Ω(√n) is required (shown above).
– In both cases the adversary may reconstruct a good approximation of the database, which disallows even very weak notions of privacy.
– Bounded adversary, restricted to T « n queries (SuLQ): there is a privacy-preserving access mechanism with perturbation « √T. This leaves a chance for usability, and is a reasonable model as databases grow larger and larger.

Slide 19: SuLQ for a Multi-Attribute Statistical Database (SDB)
– Database: {d_{i,j}}, n persons × k attributes; row i is drawn from a distribution D_i (row distribution D = (D_1, D_2, ..., D_n)).
– Query: (q, f), where q ⊆ [n] and f : {0,1}^k → {0,1}.
– Answer: a_{q,f} = Σ_{i∈q} f(d_i).

Slide 20: Privacy and Usability Concerns for the Multi-Attribute Model [DN]
– Rich set of queries: subset sums over any property of the k attributes. This obviously increases usability, but how is privacy affected?
– More to protect: functions of the k attributes.
– Relevant factors: what is the adversary's goal? Row dependency.
– Vertically split data (between k or fewer databases): can privacy still be maintained with independently operating databases?

Slide 21: Privacy Definition: Intuition
A 3-phase adversary:
– Phase 0: defines a target set G of poly(n) functions g : {0,1}^k → {0,1}; it will try to learn some of this information about someone.
– Phase 1: adaptively queries the database T = o(n) times.
– Phase 2: using all the information gained, chooses an index i of a row it intends to attack and a function g ∈ G.
Attack: given d_{-i}, try to guess g(d_{i,1}, ..., d_{i,k}).

Slide 22: The Privacy Definition
– p^0_{i,g}: the a-priori probability that g(d_{i,1}, ..., d_{i,k}) = 1.
– p^T_{i,g}: the a-posteriori probability that g(d_{i,1}, ..., d_{i,k}) = 1, given the answers to the T queries and d_{-i}.
– Define conf(p) = log(p / (1 - p)); there is a 1-1 relationship between p and conf(p); conf(1/2) = 0, conf(2/3) = 1, conf(1) = ∞.
– Δconf_{i,g} = conf(p^T_{i,g}) - conf(p^0_{i,g}).
– (δ, T)-privacy ("relative privacy"): for all distributions D_1, ..., D_n, every row i, every function g, and any adversary making at most T queries, Pr[Δconf_{i,g} > δ] = neg(n).
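A tiny numeric illustration of the confidence scale (log base 2 assumed, which matches conf(2/3) = 1 on the slide):

```python
import math

def conf(p):
    """conf(p) = log2(p / (1 - p)); monotone, so it orders beliefs like p."""
    return math.log2(p / (1 - p))

for p in (0.5, 2 / 3, 0.9, 0.99):
    print(f"p = {p:.2f}  conf = {conf(p):+.2f}")
# A (delta, T)-private mechanism keeps conf(p^T) - conf(p^0) <= delta whp.
```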

Slide 23: The SuLQ* Database
– The adversary is restricted to T « n queries.
– On query (q, f), with q ⊆ [n] and f : {0,1}^k → {0,1} a binary function:
  – Let a_{q,f} = Σ_{i∈q} f(d_{i,1}, ..., d_{i,k}).
  – Let N ∼ Binomial(0, √T) (binomial noise centered at 0 with magnitude ≈ √T).
  – Return a_{q,f} + N.
– *SuLQ: Sub-Linear Queries.
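A minimal sketch of a SuLQ-style mechanism. The noise below is a sum of T fair ±1 coin flips, which is binomial, centered at 0, with standard deviation √T; this is my reading of the slide's "N ∼ Binomial(0, √T)", not necessarily the authors' exact scheme.

```python
import random

def sulq_answer(rows, q, f, T):
    """Answer the query (q, f) with binomial noise of magnitude ~sqrt(T)."""
    a_qf = sum(f(rows[i]) for i in q)
    noise = sum(random.choice((-1, 1)) for _ in range(T))  # std dev sqrt(T)
    return a_qf + noise

# Example: fraction of queried rows whose first two attributes are both 1.
rows = [tuple(random.randint(0, 1) for _ in range(5)) for _ in range(10_000)]
print(sulq_answer(rows, range(5_000), lambda r: r[0] & r[1], T=100))
```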

Slide 24: Privacy Analysis of the SuLQ Database
– p^m_{i,g}: the a-posteriori probability that g(d_{i,1}, ..., d_{i,k}) = 1, given d_{-i} and the answers to the first m queries.
– conf(p^m_{i,g}) describes a random walk on the line with starting point conf(p^0_{i,g}); compromise occurs when conf(p^m_{i,g}) - conf(p^0_{i,g}) > δ.
– W.h.p. more than T steps are needed to reach compromise.

Slide 25: Usability: One Multi-Attribute SuLQ Database
– Statistics of any property f of the k attributes, i.e., for what fraction of the (sub)population does f(d_1, ..., d_k) hold? Easy: just put f in the query.
– Other applications: k independent multi-attribute SuLQ DBs; vertically partitioned SuLQ DBs; testing whether Pr[β | α] ≥ Pr[β] + Δ.
– Caveat: we hide g() about a specific row (not about multiple rows).

Slide 26: Overview of Methods
– Query restriction: the user poses a (restricted) query to the SDB and receives an exact response or a denial.
– Input (data) perturbation: the SDB is perturbed into SDB'; queries are answered from SDB'.
– Output perturbation: the user poses a (restricted) query and receives a perturbed response.

Slide 27: Query Restriction
The decision whether to answer or deny a query:
– can be based on the content of the query and on the answers to previous queries,
– or on the above together with the content of the database.

Slide 28: Auditing
– [AW89] classify auditing as a query restriction method: "Auditing of an SDB involves keeping up-to-date logs of all queries made by each user (not the data involved) and constantly checking for possible compromise whenever a new query is issued."
– Partial motivation: may allow more queries to be posed, as long as no privacy threat occurs.
– Early work: Hofmann 1977; Schlorer 1976; Chin, Ozsoyoglu 1981, 1986.
– Recent interest: Kleinberg, Papadimitriou, Raghavan 2000; Li, Wang, Wang, Jajodia 2002; Jonsson, Krokhin 2003.

Slide 29: How Auditors May Inadvertently Compromise Privacy

Slide 30: The Setting
– Dataset: d = {d_1, ..., d_n}; entries d_i are real, integer, or Boolean.
– Query: q = (f, i_1, ..., i_k), answered by f(d_{i_1}, ..., d_{i_k}), where f is min, max, median, sum, average, count, ...
– Bad users will try to breach the privacy of individuals. Compromise: uniquely determining some d_i (a very weak definition).

Slide 31: Auditing
[Diagram: the auditor keeps a query log q_1, ..., q_i; on a new query q_{i+1} it either returns the answer or denies the query (when the answer would cause privacy loss).]

Slide 32: Example 1: Sum/Max Auditing
– d_i real; sum and max queries; privacy is breached if some d_i is learned.
– q1 = sum(d1, d2, d3). Answer: sum(d1, d2, d3) = 15.
– q2 = max(d1, d2, d3). Denied (the answer would cause privacy loss).
– But there must be a reason for the denial: if the max were 5, the sum 15 would force d1 = d2 = d3 = 5, so q2 is denied iff d1 = d2 = d3 = 5. The denial itself reveals the data ("I win!").
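A toy illustration of how the denial leaks (my sketch, not from the slides): a naive auditor denies the max query exactly when answering would pin down a value, so an attacker who saw sum = 15 learns d1 = d2 = d3 = 5 from the denial alone.

```python
def naive_auditor(d):
    """Answer max(d) for a 3-element dataset whose sum is already known."""
    s, m = sum(d), max(d)
    # Answering would compromise iff max uniquely determines some entry,
    # e.g. iff max * len(d) == sum (then every entry must equal the max).
    if m * len(d) == s:
        return "DENIED"
    return m

print(naive_auditor([5, 5, 5]))   # DENIED -> attacker infers all three values
print(naive_auditor([4, 5, 6]))   # 6, safe to answer
```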

Slide 33: Sounds Familiar?
David Duncan, former auditor for Enron and partner in Andersen: "Mr. Chairman, I would like to answer the committee's questions, but on the advice of my counsel I respectfully decline to answer the question based on the protection afforded me under the Constitution of the United States."

Slide 34: Max Auditing
– d_i real. q1 = max(d1, d2, d3, d4): answered with M_1234.
– q2 = max(d1, d2, d3): answered with M_123 or denied; if denied, then d4 = M_1234 (the auditor denies exactly when answering would reveal d4).
– q3 = max(d1, d2): answered with M_12 or denied; if denied, then d3 = M_123.
– Continuing this way over d1, d2, ..., d_{n-1}, d_n, the attacker learns an item with probability ½.

Slide 35: Boolean Auditing?
– d_i Boolean. Queries: q1 = sum(d1, d2), q2 = sum(d2, d3), ...
– Each q_i is answered ("1") or denied; q_i is denied iff d_i = d_{i+1}, since a sum of two equal bits (0 or 2) would reveal both bits, while the answer 1 reveals nothing individual.
– The answer/denial pattern over d1, d2, ..., d_{n-1}, d_n thus reveals the whole database up to complement.
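A toy sketch of this leak (my code, not from the slides): the denial pattern encodes the equality pattern of adjacent bits, so one seed bit recovers the database or its complement.

```python
def denial_pattern(d):
    """True where sum(d_i, d_{i+1}) would be denied, i.e. the bits agree."""
    return [d[i] == d[i + 1] for i in range(len(d) - 1)]

def reconstruct(pattern, first_bit):
    bits = [first_bit]
    for denied in pattern:
        bits.append(bits[-1] if denied else 1 - bits[-1])
    return bits

d = [0, 1, 1, 0, 1, 0, 0, 1]
pattern = denial_pattern(d)
print(reconstruct(pattern, d[0]))      # recovers d exactly
print(reconstruct(pattern, 1 - d[0]))  # or its complement
```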

Slide 36: The Problem
– Query denials leak (potentially sensitive) information.
– Users cannot decide denials by themselves.
[Diagram: within the set of possible assignments to {d_1, ..., d_n}, the denial of q_{i+1} shrinks the set of assignments consistent with (q_1, ..., q_i, a_1, ..., a_i).]

Slide 37: Solution to the Problem: Simulatable Auditing
An auditor is simulatable if there exists a simulator that, given only the queries q_1, ..., q_i and the answers a_1, ..., a_i (with no access to the statistical database), reaches the same deny/answer decision for q_{i+1} as the real auditor. Simulation implies that denials do not leak information.
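A minimal sketch of a simulatable auditor's interface (my illustration; the thresholds anticipate the criteria of slide 43): because the decision is a function of the query log alone, any user can recompute it, so a denial can never reveal anything about the data.

```python
def simulatable_decision(past_queries, past_answers, new_query):
    """Deny/answer decision computable without the database contents."""
    k, r, max_queries = 3, 1, 5      # illustrative thresholds
    if len(past_queries) >= max_queries:
        return "DENY"                # total number of queries exceeded
    if len(new_query) < k:
        return "DENY"                # query set size control
    if any(len(new_query & q) > r for q in past_queries):
        return "DENY"                # overlap control
    return "ANSWER"

log, answers = [{1, 2, 3, 4}], [42]
print(simulatable_decision(log, answers, {4, 5, 6}))  # ANSWER
print(simulatable_decision(log, answers, {1, 2}))     # DENY (too small)
```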

Slide 38: Why Do Simulatable Auditors Not Leak Information?
[Diagram: the deny/allow decision for q_{i+1} depends only on (q_1, ..., q_i, a_1, ..., a_i), so it does not further shrink the set of assignments to {d_1, ..., d_n} consistent with the answers so far.]

Slide 39: Simulatable Auditing

Slide 40: Query Restriction for Sum Queries
– Given a dataset D = {x_1, ..., x_n}, x_i ∈ ℝ, and a subset S of the x_i, a query asks for Σ_{x_i∈S} x_i.
– Is it possible to compromise D? Here compromise means uniquely determining some x_i from the queries.
– Compromise is trivial if subsets may be arbitrarily small: sum(x_9) = x_9.

Slide 41: Query Set Size Control
– Do not permit queries that involve a small subset of the database.
– Compromise is still possible. To discover x: sum(x, y_1, ..., y_k) - sum(y_1, ..., y_k) = x (see the sketch below).
– Issue: overlap. In general, restricting overlap alone is not enough; the number of queries must also be restricted. Note that overlap itself sometimes restricts the number of queries (e.g., if the query size is cn and the overlap is constant, only about 1/c queries are possible).
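A toy sketch of this differencing attack (my example data): two sums that both pass the size threshold, yet their difference isolates one value.

```python
data = {"x": 7.0, "y1": 3.0, "y2": 4.5, "y3": 1.5}

def sum_query(names):
    return sum(data[n] for n in names)

big = sum_query(["x", "y1", "y2", "y3"])  # passes the size threshold
pad = sum_query(["y1", "y2", "y3"])       # also passes
print(big - pad)                           # 7.0 = x, privacy breached
```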

Slide 42: Restricting Set-Sum Queries
Restrict the sum queries based on:
– the number of database elements in the sum,
– the overlap with previous sum queries,
– the total number of queries.
Note that these criteria are known to the user and do not depend on the contents of the database. Therefore the user can simulate the denial/no-denial answer given by the DB: simulatable auditing.

Slide 43: Restricting Overlap and Number of Queries
Assume:
– |Q_i| ≥ k for every query,
– |Q_i ∩ Q_j| ≤ r for every pair of queries,
– the adversary knows at most L values a-priori, with L + 1 < k.
Claim: the data cannot be compromised with fewer than 1 + (2k - L)/r sum queries.

Slide 44: Overlap + Number of Queries
Claim [Dobkin, Jones, Lipton] [Reiss]: the data cannot be compromised with fewer than 1 + (2k - L)/r sum queries (k query size, r overlap, L a-priori known items).
– Suppose x_c is compromised after t queries, where query i is represented by Q_i = x_{i1} + x_{i2} + ... + x_{ik}, for i = 1, ..., t.
– This implies x_c = Σ_{i=1..t} α_i Q_i = Σ_{i=1..t} α_i Σ_{j=1..k} x_{ij} for some coefficients α_i.
– Let γ_{iℓ} = 1 if x_ℓ appears in query i, and 0 otherwise. Then x_c = Σ_{i=1..t} α_i Σ_{ℓ=1..n} γ_{iℓ} x_ℓ = Σ_{ℓ=1..n} (Σ_{i=1..t} α_i γ_{iℓ}) x_ℓ.

Slide 45: Overlap + Number of Queries (cont.)
– We have x_c = Σ_{ℓ=1..n} (Σ_{i=1..t} α_i γ_{iℓ}) x_ℓ. For x_c to be compromised, the coefficient Σ_{i=1..t} α_i γ_{iℓ} must be 0 for every x_ℓ except x_c.
– This happens iff γ_{iℓ} = 0 for all i, or γ_{iℓ} = γ_{jℓ} = 1 for some i ≠ j with α_i and α_j of opposite signs, or α_i = 0 (in which case the i-th query did not matter).

Slide 46: Overlap + Number of Queries (cont.)
– Wlog, the first query contains x_c and the second query has the opposite sign. The first query probes k elements; the second adds at least k - r new elements.
– Elements from the first and the second query cannot both be canceled within the same additional query (the opposite-signs requirement forbids it). Therefore each new query cancels items from the first query or from the second, but not from both.
– In total 2k - r - L elements must be canceled, so 2 + (2k - r - L)/r queries are needed, i.e., 1 + (2k - L)/r.

Slide 47: Notes
The number of queries satisfying |Q_i| ≥ k and |Q_i ∩ Q_j| ≤ r is small:
– If k = n/c for some constant c and r is constant, there are only about c queries in which no two overlap by more than r; hence the query sequence may be uncomfortably short.
– Alternatively, if r = k/c (the overlap is a constant fraction of the query size), then the number of queries, 1 + (2k - L)/r, is O(c).

Slide 48: Conclusions
– Privacy should be defined and analyzed rigorously; in particular, assuming that randomization implies privacy is dangerous.
– High perturbation is needed for privacy against polynomial-time adversaries: a threshold phenomenon, above √n total privacy, below √n no privacy (for a poly-time adversary). The main tool is a reconstruction algorithm.
– Careless auditing might leak private information.
– Self-auditing (simulatable auditors) is safe: the decision whether to allow a query is based only on previous 'good' queries and their answers, without access to the DB contents, so users may apply the decision procedure by themselves.

Slide 49: To Do
– Come up with a good model and requirements for database privacy: learn from crypto; protect against more general loss of privacy.
– Simulatable auditors are a starting point for designing more reasonable audit mechanisms.

Slide 50: References
Course web page:
– A Study of Perturbation Techniques for Data Privacy, Cynthia Dwork, Nina Mishra, and Kobbi Nissim.
– Privacy and Databases.

Slide 51: Foundations of CS at the Weizmann Institute
Uri Feige, Oded Goldreich, Shafi Goldwasser, David Harel, Moni Naor, David Peleg, Amir Pnueli, Ran Raz, Omer Reingold, Adi Shamir (yellow marks crypto).
All students receive a fellowship. Language of instruction: English.