
1 Simulatability. "The enemy knows the system" (Claude Shannon). CompSci 590.03, Instructor: Ashwin Machanavajjhala.

2 Announcements: Please meet with me at least two times before you finalize your project (deadline Sep 28).

3 Recap – L-Diversity
The link between identity and attribute value is the sensitive information: "Does Bob have Cancer? Heart disease? Flu?" "Does Umeko have Cancer? Heart disease? Flu?"
The adversary knows ≤ L-2 negation statements, e.g., "Umeko does not have Heart Disease."
– The data publisher may not know the exact adversarial knowledge.
Privacy is breached when identity can be linked to an attribute value with high probability:
Pr["Bob has Cancer" | published table, adversarial knowledge] > t

4 Recap – A 3-Diverse Table
L-Diversity Principle: Every group of tuples with the same Q-ID values has ≥ L distinct sensitive values of roughly equal proportions.

Zip    Age    Nat.  Disease
1306*  <=40   *     Heart
1306*  <=40   *     Flu
1306*  <=40   *     Cancer
1306*  <=40   *     Cancer
1485*  >40    *     Cancer
1485*  >40    *     Heart
1485*  >40    *     Flu
1485*  >40    *     Flu
1305*  <=40   *     Heart
1305*  <=40   *     Flu
1305*  <=40   *     Cancer
1305*  <=40   *     Cancer

5 Outline
– Simulatable Auditing
– Minimality Attack in anonymization
– Simulatable algorithms for anonymization

6 Query Auditing
The database holds numeric values (say, salaries of employees) and either truthfully answers a question or denies answering. Queries are MIN, MAX, SUM over subsets of the database.
Question: when should the database allow or deny a query?
[Figure: a researcher sends a query; the database asks "Safe to publish?" and answers only if Yes.]

7 Why should we deny queries?
Q1: Ben's sensitive value? – DENY
Q2: Max sensitive value of males? – ANSWER: 2
Q3: Max sensitive value of 1st-year PhD students? – ANSWER: 3
But Q2 + Q3 => Xi = 3 (the sketch below replays this inference).

Name  1st-year PhD  Gender  Sensitive value
Ben   Y             M       1
Bha   N             M       1
Ios   Y             M       1
Jan   N             M       2
Jian  Y             M       2
Jie   N             M       1
Joe   N             M       2
Moh   N             M       1
Son   N             F       1
Xi    Y             F       3
Yao   N             M       2
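A minimal sketch of how the adversary combines Q2 and Q3 (toy Python; the table is the one above and the answers are from the slide):

```python
# Replay of the Q2 + Q3 inference against the slide's table.
people = {  # name -> (1st-year PhD?, gender); sensitive values stay hidden
    "Ben": ("Y", "M"), "Bha": ("N", "M"), "Ios": ("Y", "M"),
    "Jan": ("N", "M"), "Jian": ("Y", "M"), "Jie": ("N", "M"),
    "Joe": ("N", "M"), "Moh": ("N", "M"), "Son": ("N", "F"),
    "Xi":  ("Y", "F"), "Yao": ("N", "M"),
}
a2 = 2  # answered: max sensitive value over males
a3 = 3  # answered: max sensitive value over 1st-year PhD students

phd1  = {n for n, (p, _) in people.items() if p == "Y"}
males = {n for n, (_, g) in people.items() if g == "M"}
# Every male's value is at most a2 = 2 < a3 = 3, so the maximum over the
# 1st-year students must be attained by a non-male member of that set.
suspects = phd1 - males
if a3 > a2 and len(suspects) == 1:
    print(f"{suspects.pop()} has sensitive value {a3}")  # Xi has sensitive value 3
```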

8 Value-Based Auditing
Let a_1, a_2, …, a_k be the answers to previous queries Q_1, Q_2, …, Q_k, and let a_{k+1} be the answer to Q_{k+1}.
a_i = f(c_{i1} x_1, c_{i2} x_2, …, c_{in} x_n), for i = 1, …, k+1, where c_{im} = 1 if Q_i depends on x_m and 0 otherwise.
Deny Q_{k+1} if any x_j now has a unique solution (a sketch for SUM queries follows).
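For SUM queries, the uniqueness check is plain linear algebra: x_j is uniquely determined exactly when the unit vector e_j lies in the row space of the query matrix C. A minimal sketch, assuming SUM queries and numpy (the example matrix is illustrative, not from the slides):

```python
import numpy as np

def uniquely_determined(C, j):
    """Under SUM queries C @ x = a, x_j has a unique solution exactly when
    e_j is in the row space of C (then x_j = w @ a for w solving C.T @ w = e_j)."""
    e = np.zeros(C.shape[1])
    e[j] = 1.0
    w, *_ = np.linalg.lstsq(C.T, e, rcond=None)  # least-squares attempt
    return np.allclose(C.T @ w, e)               # exact solution => determined

# sum(x1, x2) and sum(x2) together pin down x1 (= a1 - a2);
# sum(x1, x2) alone pins down neither value.
C = np.array([[1.0, 1.0],
              [0.0, 1.0]])
print(uniquely_determined(C, 0))        # True
print(uniquely_determined(C[:1], 0))    # False
```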

9 Value-Based Auditing: Example
Data values: {x_1, x_2, x_3, x_4, x_5}. Queries: MAX. Allow a query only if no x_i's value can be inferred.
[Figure: the five unknown values x_1, …, x_5 on a number line.]

10 Value-Based Auditing: max(x_1, x_2, x_3, x_4, x_5)? Answer: 10. The attacker now knows -∞ ≤ x_1, …, x_5 ≤ 10.

11 Value-Based Auditing: next, max(x_1, x_2, x_3, x_4)? The true answer is 8, which would reveal -∞ ≤ x_1, …, x_4 ≤ 8 and hence x_5 = 10, so the auditor DENIES.

12 Value-Based Auditing: but a denial means some value can be compromised!

13 Value-Based Auditing: what could max(x_1, x_2, x_3, x_4) be?

14 Value-Based Auditing: from the first answer, max(x_1, x_2, x_3, x_4) ≤ 10.

15 Value-Based Auditing: if max(x_1, x_2, x_3, x_4) = 10, answering reveals nothing new, so there is no privacy breach and no reason to deny.

16 Value-Based Auditing: hence the denial implies max(x_1, x_2, x_3, x_4) < 10, and therefore x_5 = 10!

17 Value-Based Auditing: denials leak information. The attack succeeded because the privacy analysis did not assume that the attacker knows the auditing algorithm. (The sketch below replays the adversary's reasoning.)
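A minimal sketch of the adversary's post-denial reasoning, assuming (as the slides do) that the auditor denies exactly when answering would pin some x_i to a unique value:

```python
# The adversary never sees the denied answer, only the denial itself.
a1 = 10  # answered: max(x1, x2, x3, x4, x5) = 10

# Candidate answers v for the denied query max(x1, x2, x3, x4): any v <= a1.
#   v = a1: consistent and reveals nothing new  -> the auditor would ANSWER.
#   v < a1: forces x5 = a1 (since x1..x4 < a1)  -> the auditor would DENY.
# The query WAS denied, so the hidden answer must be < a1, and therefore:
x5 = a1
print("inferred from the denial alone: x5 =", x5)  # x5 = 10
```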

18 Simulatable Auditing [Kenthapadi et al., PODS '05]
An auditor is simulatable if the decision to deny a query Q_k is made based only on information already available to the attacker:
– It can use the queries Q_1, Q_2, …, Q_k and the answers a_1, a_2, …, a_{k-1}.
– It cannot use a_k or the actual data to make the decision.
Denials provably do not leak information, because the attacker could equivalently determine whether the query would be denied: the attacker can mimic, or simulate, the auditor.

19 Simulatable Auditing Algorithm
Same setup: max(x_1, …, x_5) was answered as 10. Before computing the answer to max(x_1, x_2, x_3, x_4), enumerate its possible answers:
– Ans > 10: not possible.
– Ans = 10: the attacker learns only -∞ ≤ x_1, …, x_4 ≤ 10. SAFE.
– Ans < 10: x_5 = 10 is revealed. UNSAFE.
Since some possible answer is unsafe, DENY, without ever consulting the data (a sketch follows).
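A sketch of this decision procedure for MAX queries. Assumptions beyond the slide: the breach condition is "some x_i is pinned to a unique value", and since every answer strictly below the current bound has the same effect, probing one such value suffices:

```python
def forces_value(constraints):
    """constraints: list of (indices, answer) for answered MAX queries.
    True if some x_j is pinned down: it is the only index in some query
    that can still attain that query's answer."""
    ub = {}                                   # tightest upper bound per index
    for idx, a in constraints:
        for j in idx:
            ub[j] = min(ub.get(j, float("inf")), a)
    for idx, a in constraints:
        attainers = [j for j in idx if ub[j] >= a]
        if len(attainers) == 1:
            return True                       # that lone x_j must equal a
    return False

past = [((1, 2, 3, 4, 5), 10)]                # answered: max(x1..x5) = 10
new_q = (1, 2, 3, 4)
# Probe the representative answers (exactly 10, or anything below 10).
# The data itself is never consulted, so the attacker can run this too.
deny = any(forces_value(past + [(new_q, ans)]) for ans in (10, 9))
print("DENY" if deny else "ANSWER")           # DENY
```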

20 Summary of Simulatable Auditing
Denials can leak information if the adversary does not know all the information that is used to decide whether to deny the query. A simulatable auditor therefore bases the decision only on past queries and answers, which forces it to deny in some (many!) cases where the true answer would actually have been safe.

21 Outline
– Simulatable Auditing
– Minimality Attack in anonymization
– Simulatable algorithms for anonymization

22 Minimality Attack on Generalization Algorithms
Algorithms for K-anonymity, L-diversity, T-closeness, etc. try to maximize utility:
– Find a minimally generalized table in the lattice that satisfies privacy and maximizes utility.
But the attacker also knows this algorithm!

23 Example: Minimality Attack [Wong et al., VLDB '07]
Dataset with one quasi-identifier taking two values, q1 and q2, which generalize to Q. Sensitive attribute: Cancer (yes/no).
We want to ensure P[Cancer = yes] ≤ 1/2; it is OK to learn that an individual does not have Cancer.
Published table:
QID  Cancer
Q    Yes
Q    Yes
Q    No
Q    No
q2   No
q2   No

24 Which input datasets could have led to the published table?
Output: {q1, q2} generalized to Q ("2-diverse"), as on the previous slide.
Possible inputs with 3 occurrences of q1:
Input A: q1/Yes, q1/Yes, q1/No, q2/No, q2/No, q2/No
Input B: q1/Yes, q1/No, q1/No, q2/Yes, q2/No, q2/No

25 Which input datasets could have led to the published table?
For an input with 3 occurrences of q1, the algorithm would have found a better (less generalized) table, e.g.:
q1/Yes, Q/No, Q/No, q2/Yes, q2/No, q2/No
This is a better generalization, so a 3-q1 input would not have produced the published table.

26 Which input datasets could have led to the published table?
Possible inputs with 1 occurrence of q1:
Input C: q2/Yes, q1/Yes, q2/No, q2/No, q2/No, q2/No
Input D: q2/Yes, q2/Yes, q1/No, q2/No, q2/No, q2/No

27 Which input datasets could have led to the published table?
For an input with 1 occurrence of q1, the algorithm would likewise have found a better generalization, e.g.:
q2/Yes, Q/No, Q/No, q2/Yes, q2/No, q2/No
This is a better generalization, so a 1-q1 input would not have produced the published table either.

28 Which input datasets could have led to the published table?
Since inputs with 1 or 3 occurrences of q1 lead to different outputs, there must be exactly two tuples with q1 in the input.

29 Which input datasets could have led to the published table?
Possible inputs with 2 occurrences of q1:
Input E: q1/Yes, q1/Yes, q2/No, q2/No, q2/No, q2/No
Input F: q2/Yes, q2/Yes, q1/No, q1/No, q2/No, q2/No
Input G: q1/Yes, q2/Yes, q1/No, q2/No, q2/No, q2/No
Input G already satisfies privacy, so it would have been published unchanged.

30 Which input datasets could have led to the published table?
Input F: q2/Yes, q2/Yes, q1/No, q1/No, q2/No, q2/No
Here both q1 tuples have Cancer = No, and learning Cancer = No is OK; hence Input F is private as-is and would have been published unchanged.

31 Which input datasets could have led to the published table?
Input E: q1/Yes, q1/Yes, q2/No, q2/No, q2/No, q2/No
This is the ONLY input that results in the published output, so the attacker concludes P[Cancer = yes | q1] = 1. (The sketch below replays this enumeration.)
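A toy replay of this enumeration. Assumptions: the adversary already knows the input has two q1 and four q2 tuples with exactly two "Yes" values, and the publisher is a simplified minimality-seeking algorithm that merges q1 with the fewest q2 tuples needed to bring every group's "Yes" fraction down to at most 1/2 (a stand-in for, not a copy of, Wong et al.'s algorithm):

```python
from itertools import permutations

def publish(table):                      # table: list of (qid, sensitive)
    q1 = [s for q, s in table if q == "q1"]
    q2 = [s for q, s in table if q == "q2"]
    # Only the q1 group can violate here: at most 2 of 4 q2 values are "Yes".
    if q1 and q1.count("Yes") / len(q1) > 0.5:
        # Minimality: merge q1 with the fewest q2 tuples that repair it.
        for k in range(len(q2) + 1):
            merged = q1 + q2[:k]
            if merged.count("Yes") / len(merged) <= 0.5:
                return sorted([("Q", s) for s in merged] +
                              [("q2", s) for s in q2[k:]])
    return sorted(table)                 # no violation: publish unchanged

published = sorted([("Q", "Yes"), ("Q", "Yes"), ("Q", "No"), ("Q", "No"),
                    ("q2", "No"), ("q2", "No")])
qids = ["q1", "q1", "q2", "q2", "q2", "q2"]
pre_images = [c for c in set(permutations(["Yes", "Yes"] + ["No"] * 4))
              if publish(list(zip(qids, c))) == published]
print(pre_images)  # only the assignment that gives BOTH q1 tuples "Yes"
```

The enumeration returns a single pre-image, so an adversary who knows the algorithm concludes P[Cancer = yes | q1] = 1 even though the published table looks 2-diverse.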

32 Outline
– Simulatable Auditing
– Minimality Attack in anonymization
– Transparent Anonymization: simulatable algorithms for anonymization

33 Transparent Anonymization
Assume that the adversary knows the algorithm A that is being used.
[Figure: I is the set of all possible input tables; O is the output table; I(O, A) ⊆ I is the set of input tables that result in O under algorithm A.]

34 Transparent Anonymization
Privacy must be guaranteed with respect to I(O, A): probabilities must be computed assuming I(O, A) is the actual set of all possible input tables.
What is an efficient algorithm for transparent anonymization, e.g., for L-diversity?

35 Ace Algorithm [Xiao et al., TODS '10]
Step 1, Assign: based only on the sensitive values, construct (in a randomized fashion) an intermediate L-diverse generalization.
Step 2, Split: based only on the quasi-identifier values (without looking at sensitive values), deterministically refine the intermediate solution to maximize utility.

36 Step 1: Assign
[Figure: the input table, with tuples grouped by sensitive value.]

37 Step 1: Assign
S_t is the set of all tuples, grouped by sensitive value. Iteratively: remove α tuples each from the β (≥ L) most frequent sensitive values (the removed tuples form one bucket of the intermediate generalization).

38 Step 1: Assign (cont.)
1st iteration: β = 2, α = 2.

39 Step 1: Assign (cont.)
2nd iteration: β = 2, α = 1.

40 Step 1: Assign (cont.)
3rd iteration: β = 2, α = 1. (A sketch of the Assign step follows.)
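A sketch of the Assign step as described above, simplified to α = 1 and β = L per round; the real Ace algorithm picks α and β per iteration, and the shuffle below stands in for its randomized choice of which tuples to take:

```python
import random

def assign(tuples, L):
    """tuples: list of (record_id, sensitive_value). Greedily builds buckets,
    each taking one tuple from each of the L currently most frequent
    sensitive values, so every bucket holds L distinct values."""
    pool = {}
    for rid, s in tuples:
        pool.setdefault(s, []).append(rid)
    for ids in pool.values():
        random.shuffle(ids)       # randomized: WHICH tuple is taken is random
    buckets = []
    while any(pool.values()):
        # the L most frequent remaining sensitive values
        top = sorted(pool, key=lambda s: len(pool[s]), reverse=True)[:L]
        if len(top) < L or any(not pool[s] for s in top):
            raise ValueError("input is not eligible for L-diversity")
        buckets.append([(pool[s].pop(), s) for s in top])
    return buckets

# e.g. assign([(1, "flu"), (2, "flu"), (3, "gastritis"), (4, "dyspepsia")], 2)
# -> two buckets, each covering two distinct diseases
```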

41 Intermediate Generalization
Name  Age  Zip
Ann   21   10000
Bob   27   18000
Gill  60   63000
Ed    54   60000
Don   32   35000
Fred  60   63000
Hera  60   63000
Cate  32   35000
Disease column (published per bucket, not per row): Dyspepsia, Flu, Bronchitis, Gastritis, Diabetes, Gastritis

42 Step 2: Split
If a bucket contains α > 1 tuples of each sensitive value, split it into two buckets B_a and B_b such that:
– 1 ≤ α_a < α tuples of each sensitive value in the bucket go to B_a, and the remaining tuples go to B_b;
– the division (B_a, B_b) is optimal in terms of utility.
(A sketch of this step follows.)
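A sketch of the Split step on one bucket. Assumptions: each sensitive value's tuples are pre-sorted by quasi-identifier so the candidate divisions never depend on which disease is which, and `utility` is whatever score the deployment uses (e.g., negative QID spread):

```python
def split(bucket, utility):
    """bucket: dict sensitive_value -> list of tuples, every list of the same
    length alpha, pre-sorted by quasi-identifier. Tries each 1 <= alpha_a <
    alpha and returns the division (B_a, B_b) with the highest utility."""
    alpha = min(len(rows) for rows in bucket.values())
    if alpha < 2:
        return bucket, {}                 # nothing to split
    best = None
    for alpha_a in range(1, alpha):
        B_a = {s: rows[:alpha_a] for s, rows in bucket.items()}
        B_b = {s: rows[alpha_a:] for s, rows in bucket.items()}
        score = utility(B_a) + utility(B_b)
        if best is None or score > best[0]:
            best = (score, B_a, B_b)
    return best[1], best[2]
```

Because both halves keep at least one tuple of every sensitive value, each resulting bucket stays L-diverse, and the choice used only quasi-identifiers.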

43 Why does the Ace algorithm satisfy transparent L-diversity?
Recall: privacy must be guaranteed with respect to I(O, A), i.e., probabilities are computed assuming I(O, A) is the actual set of all possible input tables.
[Figure: I, O, and I(O, A), as on slide 33.]

44 Ace Algorithm Analysis
Lemma 1: The Assign step satisfies transparent L-diversity.
Proof (sketch): Consider an intermediate output Int, and suppose there is some input table T such that Assign(T) = Int. Then any other table T' in which the sensitive values of two individuals in the same group are swapped also leads to the same intermediate output Int.

45 Ace Algorithm Analysis
[Figure: two input tables that differ by swapping two individuals' diseases within a group; both result in the same intermediate output.]

46 Ace Algorithm Analysis
Lemma 1 (cont.): Since any within-group swap of sensitive values leads to the same intermediate output, the set of input tables I(Int, A) contains all possible assignments of diseases to individuals within each group of Int.

47 Ace Algorithm Analysis
Lemma 1 (cont.): I(Int, A) contains all possible assignments of diseases to individuals within each group. For the bucket {Ann, Bob, Gill, Ed} with diseases {dyspepsia, flu} in equal proportion:
P[Ann has dyspepsia | Int, I(Int, A)] = 1/2

48 Ace Algorithm Analysis
Lemma 2: The Split phase also satisfies transparent L-diversity.
Proof (sketch): I(Int, Assign) contains all tables in which each individual is assigned an arbitrary sensitive value within its group in Int. Suppose some input table T ∈ I(Int, Assign) results in the final output O after Split.

49 Ace Algorithm Analysis
Split does not depend on the sensitive values.
[Figure: bucket {Ann, Gill, Bob, Ed} with diseases {dyspepsia, flu} splits into {Ann, Bob} and {Gill, Ed}, each with {dyspepsia, flu}; swapping the diseases first produces exactly the same split.]

50 Ace Algorithm Analysis
If T ∈ I(Int, Assign) and T results in O after Split, then T' ∈ I(Int, Assign) and T' also results in O after Split.
[Figure: Table T and the swapped Table T'.]

51 Ace Algorithm Analysis
Lemma 2 (cont.): Let T' be generated by "swapping diseases" within some bucket. If T ∈ I(Int, Assign) and T results in O after Split, then T' ∈ I(Int, Assign) and T' also results in O after Split. So for any individual, the sensitive value is equally likely to be any one of ≥ L choices, and therefore
P[individual has disease | I(O, Ace)] ≤ 1/L

52 Summary
Many systems assume privacy/security is guaranteed by assuming the adversary does not know the algorithm. This is bad.
Simulatable algorithms avoid this problem: ideally, the choices made by the algorithm can be simulated by the adversary.
Anonymization algorithms are also susceptible to adversaries who know the algorithm or its objective function. Transparent anonymization limits the inference an attacker who knows the algorithm can make about sensitive values.

53 Next Class
– Composition of privacy
– Differential Privacy

54 References
A. Machanavajjhala, J. Gehrke, D. Kifer, M. Venkitasubramaniam, "L-Diversity: Privacy beyond k-anonymity", ICDE 2006.
K. Kenthapadi, N. Mishra, K. Nissim, "Simulatable Auditing", PODS 2005.
R. Wong, A. Fu, K. Wang, J. Pei, "Minimality attack in privacy preserving data publishing", VLDB 2007.
X. Xiao, Y. Tao, N. Koudas, "Transparent Anonymization: Thwarting adversaries who know the algorithm", ACM TODS 2010.

