Published by Wyatt Burbridge. Modified over 2 years ago.

1
Simulatability
“The enemy knows the system”, Claude Shannon
CompSci 590.03
Instructor: Ashwin Machanavajjhala
Lecture 6 : 590.03 Fall 12

2
Announcements
Please meet with me at least 2 times before you finalize your project (deadline Sep 28).

3
Recap – L-Diversity
The link between identity and attribute value is the sensitive information.
“Does Bob have Cancer? Heart disease? Flu?”
“Does Umeko have Cancer? Heart disease? Flu?”
Adversary knows ≤ L-2 negation statements.
“Umeko does not have Heart Disease.”
– Data Publisher may not know exact adversarial knowledge
Privacy is breached when identity can be linked to attribute value with high probability:
Pr[ “Bob has Cancer” | published table, adv. knowledge ] > t

4
Recap – 3-Diverse Table

Zip    Age    Nat.  Disease
1306*  <=40   *     Heart
1306*  <=40   *     Flu
1306*  <=40   *     Cancer
1306*  <=40   *     Cancer
1485*  >40    *     Cancer
1485*  >40    *     Heart
1485*  >40    *     Flu
1485*  >40    *     Flu
1305*  <=40   *     Heart
1305*  <=40   *     Flu
1305*  <=40   *     Cancer
1305*  <=40   *     Cancer

L-Diversity Principle: Every group of tuples with the same Q-ID values has ≥ L distinct sensitive values of roughly equal proportions.
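The principle above can be checked mechanically. The following is a minimal sketch, assuming the simplest reading of the principle (at least L distinct sensitive values per Q-ID group); it does not test the "roughly equal proportions" part. The function name `is_distinct_l_diverse` is made up for this illustration.

```python
from collections import defaultdict

# Rows of the 3-diverse recap table: (zip, age, nationality, disease).
TABLE = [
    ("1306*", "<=40", "*", "Heart"),
    ("1306*", "<=40", "*", "Flu"),
    ("1306*", "<=40", "*", "Cancer"),
    ("1306*", "<=40", "*", "Cancer"),
    ("1485*", ">40", "*", "Cancer"),
    ("1485*", ">40", "*", "Heart"),
    ("1485*", ">40", "*", "Flu"),
    ("1485*", ">40", "*", "Flu"),
    ("1305*", "<=40", "*", "Heart"),
    ("1305*", "<=40", "*", "Flu"),
    ("1305*", "<=40", "*", "Cancer"),
    ("1305*", "<=40", "*", "Cancer"),
]


def is_distinct_l_diverse(rows, l):
    """True if every Q-ID group has at least l distinct sensitive values.

    The last field of each row is treated as the sensitive attribute;
    all earlier fields together form the Q-ID.
    """
    groups = defaultdict(set)
    for *qid, sensitive in rows:
        groups[tuple(qid)].add(sensitive)
    return all(len(values) >= l for values in groups.values())
```

On the recap table this check passes for L = 3 and fails for L = 4, since each group holds exactly three distinct diseases.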

5
Outline
Simulatable Auditing
Minimality Attack in anonymization
Simulatable algorithms for anonymization

6
Query Auditing
Database has numeric values (say, salaries of employees).
Database either truthfully answers a question or denies answering.
MIN, MAX, SUM queries over subsets of the database.
Question: When to allow/deny queries?
[Figure: Researcher sends a Query to the Database. Safe to publish? Yes / No]

7
Why should we deny queries?
Q1: Ben’s sensitive value? – DENY
Q2: Max sensitive value of males? – ANSWER: 2
Q3: Max sensitive value of 1st year PhD students? – ANSWER: 3
But Q3 + Q2 => Xi = 3

Name  1st year PhD  Gender  Sensitive value
Ben   Y             M       1
Bha   N             M       1
Ios   Y             M       1
Jan   N             M       2
Jian  Y             M       2
Jie   N             M       1
Joe   N             M       2
Moh   N             M       1
Son   N             F       1
Xi    Y             F       3
Yao   N             M       2
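The inference "Q3 + Q2 => Xi = 3" can be replayed in a few lines. This is a sketch under the assumption that the attacker knows who is male and who is a 1st year PhD student (the query predicates) but not the sensitive values; the variable names are illustrative.

```python
# Table from the slide: name -> (first_year_phd, gender, sensitive_value).
PEOPLE = {
    "Ben": ("Y", "M", 1), "Bha": ("N", "M", 1), "Ios": ("Y", "M", 1),
    "Jan": ("N", "M", 2), "Jian": ("Y", "M", 2), "Jie": ("N", "M", 1),
    "Joe": ("N", "M", 2), "Moh": ("N", "M", 1), "Son": ("N", "F", 1),
    "Xi": ("Y", "F", 3), "Yao": ("N", "M", 2),
}


def max_over(names):
    """The database's truthful answer to a MAX query over a set of people."""
    return max(PEOPLE[n][2] for n in names)


males = {n for n, (_, gender, _) in PEOPLE.items() if gender == "M"}
first_years = {n for n, (first_year, _, _) in PEOPLE.items() if first_year == "Y"}

q2 = max_over(males)        # Q2, answered
q3 = max_over(first_years)  # Q3, answered

# Attacker's reasoning uses only the two query sets and the two answers:
# every male is at most q2, so if q3 > q2 the maximum of Q3 must be
# attained by a first-year student who is not male.
candidates = first_years - males if q3 > q2 else first_years
inferred = {name: q3 for name in candidates} if len(candidates) == 1 else {}
```

Here only one first-year student is not male, so the two harmless-looking answers together pin down Xi's value.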

8
Value-Based Auditing
Let a_1, a_2, …, a_k be the answers to previous queries Q_1, Q_2, …, Q_k.
Let a_{k+1} be the answer to Q_{k+1}.
a_i = f(c_{i1} x_1, c_{i2} x_2, …, c_{in} x_n), i = 1 … k+1
c_{im} = 1 if Q_i depends on x_m, and 0 otherwise
Check if any x_j has a unique solution.
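For MAX queries, the "unique solution" check has a simple form. The sketch below is an illustration, not the lecture's algorithm: it assumes real-valued data, assumes the answers are mutually consistent, and uses a made-up function name. Each answer is an upper bound on the elements of its query, and an element is pinned down when it is the only one in some query that can still reach that query's answer.

```python
def compromised_by_max(queries, answers):
    """Return {index: value} for every x_j fixed by the MAX answers.

    queries: list of sets of indices.
    answers: list of numbers with answers[i] = max(x_j for j in queries[i]).
    """
    # Each answer is an upper bound on every element its query touches;
    # keep the tightest bound per element.
    upper = {}
    for query, answer in zip(queries, answers):
        for j in query:
            upper[j] = min(upper.get(j, float("inf")), answer)

    pinned = {}
    for query, answer in zip(queries, answers):
        # Only elements whose tightest bound equals the answer can attain it.
        extreme = [j for j in query if upper[j] == answer]
        if len(extreme) == 1:
            pinned[extreme[0]] = answer
    return pinned
```

With the single answer max(x_1, …, x_5) = 10 nothing is pinned; adding max(x_1, …, x_4) = 8 pins x_5 = 10.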

9
Value-based Auditing
Data Values: {x_1, x_2, x_3, x_4, x_5}, Queries: MAX.
Allow query if value of x_i can’t be inferred.
[Figure: data values x_1, x_2, x_3, x_4, x_5]

10
Value-based Auditing
Data Values: {x_1, x_2, x_3, x_4, x_5}, Queries: MAX.
Allow query if value of x_i can’t be inferred.
max(x_1, x_2, x_3, x_4, x_5)   Ans: 10
-∞ ≤ x_1 … x_5 ≤ 10

11
Value-based Auditing
Data Values: {x_1, x_2, x_3, x_4, x_5}, Queries: MAX.
Allow query if value of x_i can’t be inferred.
max(x_1, x_2, x_3, x_4, x_5)   Ans: 10
max(x_1, x_2, x_3, x_4)   Ans: 8   DENY
-∞ ≤ x_1 … x_4 ≤ 8 => x_5 = 10

12
Value-based Auditing
Data Values: {x_1, x_2, x_3, x_4, x_5}, Queries: MAX.
Allow query if value of x_i can’t be inferred.
max(x_1, x_2, x_3, x_4, x_5)   Ans: 10
max(x_1, x_2, x_3, x_4)   Ans: 8   DENY
Denial means some value can be compromised!

13
Value-based Auditing
Data Values: {x_1, x_2, x_3, x_4, x_5}, Queries: MAX.
Allow query if value of x_i can’t be inferred.
max(x_1, x_2, x_3, x_4, x_5)   Ans: 10
max(x_1, x_2, x_3, x_4)   Ans: 8   DENY
What could max(x_1, x_2, x_3, x_4) be?

14
Value-based Auditing
Data Values: {x_1, x_2, x_3, x_4, x_5}, Queries: MAX.
Allow query if value of x_i can’t be inferred.
max(x_1, x_2, x_3, x_4, x_5)   Ans: 10
max(x_1, x_2, x_3, x_4)   Ans: 8   DENY
From the first answer, max(x_1, x_2, x_3, x_4) ≤ 10

15
Value-based Auditing
Data Values: {x_1, x_2, x_3, x_4, x_5}, Queries: MAX.
Allow query if value of x_i can’t be inferred.
max(x_1, x_2, x_3, x_4, x_5)   Ans: 10
max(x_1, x_2, x_3, x_4)   Ans: 8   DENY
If max(x_1, x_2, x_3, x_4) = 10, then no privacy breach

16
Value-based Auditing
Data Values: {x_1, x_2, x_3, x_4, x_5}, Queries: MAX.
Allow query if value of x_i can’t be inferred.
max(x_1, x_2, x_3, x_4, x_5)   Ans: 10
max(x_1, x_2, x_3, x_4)   Ans: 8   DENY
Hence, max(x_1, x_2, x_3, x_4) < 10 => x_5 = 10!

17
Value-based Auditing
Data Values: {x_1, x_2, x_3, x_4, x_5}, Queries: MAX.
Allow query if value of x_i can’t be inferred.
max(x_1, x_2, x_3, x_4, x_5)   Ans: 10
max(x_1, x_2, x_3, x_4)   Ans: 8   DENY
Hence, max(x_1, x_2, x_3, x_4) < 10 => x_5 = 10!
Denials leak information.
The attack occurred because the privacy analysis did not assume that the attacker knows the algorithm.
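The leak through a denial can be reproduced with a toy auditor. This is a sketch under stated assumptions: the class `NaiveMaxAuditor`, the helper `pinned`, and the concrete data values are all hypothetical (the slides only fix max(x_1..x_5) = 10 and max(x_1..x_4) = 8). The auditor looks at the true answer before deciding, and the attacker, who knows that rule, replays it over the possible answers.

```python
def pinned(queries, answers):
    """Values x_j fixed by a list of MAX answers, as {index: value}."""
    upper = {}
    for query, answer in zip(queries, answers):
        for j in query:
            upper[j] = min(upper.get(j, float("inf")), answer)
    fixed = {}
    for query, answer in zip(queries, answers):
        extreme = [j for j in query if upper[j] == answer]
        if len(extreme) == 1:
            fixed[extreme[0]] = answer
    return fixed


class NaiveMaxAuditor:
    """Value-based auditor: peeks at the true answer before deciding."""

    def __init__(self, data):
        self.data = data
        self.queries, self.answers = [], []

    def ask(self, query):
        true_answer = max(self.data[j] for j in query)
        if pinned(self.queries + [query], self.answers + [true_answer]):
            return "DENY"
        self.queries.append(query)
        self.answers.append(true_answer)
        return true_answer


# Hypothetical data with max(x_1..x_4) = 8 and x_5 = 10.
DATA = {1: 3, 2: 8, 3: 5, 4: 1, 5: 10}
auditor = NaiveMaxAuditor(DATA)
first = auditor.ask({1, 2, 3, 4, 5})
second = auditor.ask({1, 2, 3, 4})

# Attacker: the second answer is either equal to the first or smaller.
# Replay the auditor's rule for one representative of each case.
history_q, history_a = [{1, 2, 3, 4, 5}], [first]
leaks = []
for candidate in (first, first - 1):
    fixed = pinned(history_q + [{1, 2, 3, 4}], history_a + [candidate])
    if fixed:  # this case would have been denied
        leaks.append(fixed)
inferred = leaks[0] if second == "DENY" and len(leaks) == 1 else {}
```

Only the "smaller than 10" case triggers a denial, so observing DENY tells the attacker that x_5 = 10 even though no answer was returned.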

18
Simulatable Auditing [Kenthapadi et al PODS ‘05]
An auditor is simulatable if the decision to deny a query Q_k is made based on information already available to the attacker.
– Can use queries Q_1, Q_2, …, Q_k and answers a_1, a_2, …, a_{k-1}
– Cannot use a_k or the actual data to make the decision.
Denials provably do not leak information
– Because the attacker could equivalently determine whether the query would be denied.
– Attacker can mimic or simulate the auditor.

19
Simulatable Auditing Algorithm
Data Values: {x_1, x_2, x_3, x_4, x_5}, Queries: MAX.
Allow query if value of x_i can’t be inferred.
max(x_1, x_2, x_3, x_4, x_5)   Ans: 10
max(x_1, x_2, x_3, x_4)   DENY
Before computing the answer, consider every possible answer:
– Ans > 10 => not possible
– Ans = 10 => -∞ ≤ x_1 … x_4 ≤ 10   SAFE
– Ans < 10 => x_5 = 10   UNSAFE
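The case analysis above can be sketched as a decision procedure that never touches the data. This is an illustrative sketch rather than the algorithm from the cited paper: it assumes MAX queries over unbounded real values, and it probes one representative answer per case (each past answer, the midpoints between them, and one value below and above). All function names are hypothetical.

```python
def upper_bounds(queries, answers):
    """Tightest upper bound on each element implied by MAX answers."""
    upper = {}
    for query, answer in zip(queries, answers):
        for j in query:
            upper[j] = min(upper.get(j, float("inf")), answer)
    return upper


def extreme_elements(queries, answers):
    """For each query, the elements that can still attain its answer."""
    upper = upper_bounds(queries, answers)
    return [[j for j in q if upper[j] == a] for q, a in zip(queries, answers)]


def consistent(queries, answers):
    """Some dataset matches the answers iff every query has an extreme element."""
    return all(extreme_elements(queries, answers))


def compromised(queries, answers):
    """Some x_j is fixed iff a query has exactly one extreme element."""
    return any(len(e) == 1 for e in extreme_elements(queries, answers))


def simulatable_decision(past_queries, past_answers, new_query):
    """Decide from past queries and answers only; the data is never read."""
    values = sorted(set(past_answers))
    if values:
        midpoints = [(lo + hi) / 2 for lo, hi in zip(values, values[1:])]
        candidates = [values[0] - 1] + values + midpoints + [values[-1] + 1]
    else:
        candidates = [0.0]  # no history: every answer behaves the same way
    for candidate in candidates:
        queries = past_queries + [new_query]
        answers = past_answers + [candidate]
        if consistent(queries, answers) and compromised(queries, answers):
            return "DENY"
    return "ANSWER"
```

Because the attacker can run the same function on the same inputs, the outcome of the decision carries no information beyond what the attacker already has.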

20
Summary of Simulatable Auditing
In some (many!) cases, the decision to deny a query must be based only on the queries asked so far and the answers already given.
Denials can leak information if the adversary does not know all the information that is used to decide whether to deny the query.

21
Outline
Simulatable Auditing
Minimality Attack in anonymization
Simulatable algorithms for anonymization

22
Minimality attack on Generalization algorithms
Algorithms for K-anonymity, L-diversity, T-closeness, etc. try to maximize utility.
– Find a minimally generalized table in the lattice that satisfies privacy and maximizes utility.
But … the attacker also knows this algorithm!

23
Example Minimality attack [Wong et al VLDB07]
Dataset with one quasi-identifier and 2 values q1, q2.
q1, q2 generalize to Q.
Sensitive attribute: Cancer – yes/no
We want to ensure P[Cancer = yes] ≤ ½.
– OK to know if an individual does not have Cancer.
Published Table:

QID  Cancer
Q    Yes
Q    Yes
Q    No
Q    No
q2   No
q2   No

24
Which input datasets could have led to the published table?

Output dataset, {q1, q2} → Q (“2-diverse”):
QID  Cancer
Q    Yes
Q    Yes
Q    No
Q    No
q2   No
q2   No

Possible input datasets, 3 occurrences of q1:
QID  Cancer      QID  Cancer
q1   Yes         q1   Yes
q1   Yes         q1   No
q1   No          q1   No
q2   No          q2   Yes
q2   No          q2   No
q2   No          q2   No

25
Which input datasets could have led to the published table?

Output dataset, {q1, q2} → Q (“2-diverse”):
QID  Cancer
Q    Yes
Q    Yes
Q    No
Q    No
q2   No
q2   No

Possible input dataset, 3 occurrences of q1:
QID  Cancer
q1   Yes
Q    No
Q
q2   Yes
q2   No
q2   No
This is a better generalization!

26
Which input datasets could have led to the published table?

Output dataset, {q1, q2} → Q (“2-diverse”):
QID  Cancer
Q    Yes
Q    Yes
Q    No
Q    No
q2   No
q2   No

Possible input datasets, 1 occurrence of q1:
QID  Cancer      QID  Cancer
q2   Yes         q2   Yes
q1   Yes         q2   Yes
q2   No          q1   No
q2   No          q2   No
q2   No          q2   No
q2   No          q2   No

27
Which input datasets could have led to the published table?

Output dataset, {q1, q2} → Q (“2-diverse”):
QID  Cancer
Q    Yes
Q    Yes
Q    No
Q    No
q2   No
q2   No

Possible input dataset, 3 occurrences of q1:
QID  Cancer
q2   Yes
Q    No
Q
q2   Yes
q2   No
q2   No
This is a better generalization!

28
Which input datasets could have led to the published table?

Output dataset, {q1, q2} → Q (“2-diverse”):
QID  Cancer
Q    Yes
Q    Yes
Q    No
Q    No
q2   No
q2   No

Possible input dataset, 3 occurrences of q1:
QID  Cancer
q2   Yes
Q    No
Q
q2   Yes
q2   No
q2   No
There must be exactly two tuples with q1

29
Which input datasets could have led to the published table?

Output dataset, {q1, q2} → Q (“2-diverse”):
QID  Cancer
Q    Yes
Q    Yes
Q    No
Q    No
q2   No
q2   No

Possible input datasets, 2 occurrences of q1:
QID  Cancer      QID  Cancer      QID  Cancer
q1   Yes         q2   Yes         q1   Yes
q1   Yes         q2   Yes         q2   Yes
q2   No          q1   No          q1   No
q2   No          q1   No          q2   No
q2   No          q2   No          q2   No
q2   No          q2   No          q2   No
Third input: Already satisfies privacy

30
Which input datasets could have led to the published table?

Output dataset, {q1, q2} → Q (“2-diverse”):
QID  Cancer
Q    Yes
Q    Yes
Q    No
Q    No
q2   No
q2   No

Possible input datasets, 2 occurrences of q1:
QID  Cancer      QID  Cancer
q1   Yes         q2   Yes
q1   Yes         q2   Yes
q2   No          q1   No
q2   No          q1   No
q2   No          q2   No
q2   No          q2   No
Second input: Learning Cancer = No is OK. Hence, this is private.

31
Which input datasets could have led to the published table?

Output dataset, {q1, q2} → Q (“2-diverse”):
QID  Cancer
Q    Yes
Q    Yes
Q    No
Q    No
q2   No
q2   No

Possible input dataset, 2 occurrences of q1:
QID  Cancer
q1   Yes
q1   Yes
q2   No
q2   No
q2   No
q2   No
This is the ONLY input that results in the output!
P[Cancer = yes | q1] = 1
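The enumeration on these slides can be replayed by brute force. The sketch below rests on several assumptions that are not spelled out in the slides: the anonymizer is modelled as a local recoding that moves as few tuples as possible into the generalized group Q, a group is acceptable when at most half of its tuples have Cancer = yes (the threshold under which the published Q group, with 2 of 4 tuples, is itself acceptable), and the published table is read as Q: 2 Yes and 2 No, plus q2: 2 No. The function names are hypothetical.

```python
from itertools import product


def ok(yes, no):
    """Privacy test for one group: empty, or P[Cancer = yes] <= 1/2."""
    return yes <= no


def minimal_outputs(a, b, c, d):
    """All cheapest valid recodings of one input.

    Input counts: a = (q1, Yes), b = (q1, No), c = (q2, Yes), d = (q2, No).
    A recoding moves (ga, gb, gc, gd) tuples into the generalized group Q.
    Each output is ((Q yes, Q no), (q1 yes, q1 no), (q2 yes, q2 no)).
    """
    best, outputs = None, set()
    ranges = (range(a + 1), range(b + 1), range(c + 1), range(d + 1))
    for ga, gb, gc, gd in product(*ranges):
        q = (ga + gc, gb + gd)
        q1 = (a - ga, b - gb)
        q2 = (c - gc, d - gd)
        if not (ok(*q) and ok(*q1) and ok(*q2)):
            continue
        cost = ga + gb + gc + gd  # number of generalized tuples
        if best is None or cost < best:
            best, outputs = cost, {(q, q1, q2)}
        elif cost == best:
            outputs.add((q, q1, q2))
    return outputs


# Published table: Q has 2 Yes and 2 No, no q1 tuples remain, q2 has 2 No.
PUBLISHED = ((2, 2), (0, 0), (0, 2))

# I(O, A): every 6-tuple input whose minimal recodings include the output.
possible_inputs = [
    (a, b, c, d)
    for a, b, c, d in product(range(7), repeat=4)
    if a + b + c + d == 6 and PUBLISHED in minimal_outputs(a, b, c, d)
]
```

Under this model a single input survives: two q1 tuples that both have Cancer = yes and four q2 tuples with Cancer = no, which is the breach the slide points out.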

32
Outline
Simulatable Auditing
Minimality Attack in anonymization
Transparent Anonymization: Simulatable algorithms for anonymization

33
Transparent Anonymization
Assume that the adversary knows the algorithm that is being used.
O: Output table
I(O, A): Input tables that result in O due to algorithm A
I: All possible input tables

34
Transparent Anonymization
Privacy must be guaranteed with respect to I(O, A).
– Probability must be computed assuming I(O, A) is the actual set of all possible input tables.
What is an efficient algorithm for Transparent Anonymization?
– For L-diversity?

35
Ace Algorithm [Xiao et al TODS’10]
Step 1: Assign
Based only on the sensitive values, construct (in a randomized fashion) an intermediate L-diverse generalization.
Step 2: Split
Based only on the quasi-identifier values (and without looking at sensitive values), deterministically refine the intermediate solution to maximize utility.

36
Step 1: Assign
Input Table

37
Step 1: Assign
S_t is the set of all tuples (grouped by sensitive value)
Iteratively,
– Remove α tuples each from the β (≥ L) most frequent sensitive values

38
Step 1: Assign
S_t is the set of all tuples (grouped by sensitive value)
Iteratively,
– Remove α tuples each from the β (≥ L) most frequent sensitive values
– 1st iteration: β = 2, α = 2

39
Step 1: Assign
S_t is the set of all tuples (grouped by sensitive value)
Iteratively,
– Remove α tuples each from the β (≥ L) most frequent sensitive values
– 2nd iteration: β = 2, α = 1

40
Step 1: Assign
S_t is the set of all tuples (grouped by sensitive value)
Iteratively,
– Remove α tuples each from the β (≥ L) most frequent sensitive values
– 3rd iteration: β = 2, α = 1
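The iteration above can be sketched in a simplified form. This is not the Ace algorithm as published: the sketch fixes β = L and α = 1 in every round (the slides' example uses α = 2 in the first round, and how α and β are chosen is not stated here), and it places leftover tuples in any bucket that lacks their value. The function name `assign` and the name-to-disease mapping in `TUPLES` are hypothetical. What the sketch does preserve is that only sensitive values drive the grouping, and that the choice of tuple within a sensitive value is random.

```python
import random
from collections import defaultdict


def assign(tuples, l, seed=0):
    """Group (name, sensitive_value) tuples into L-diverse buckets."""
    rng = random.Random(seed)
    by_value = defaultdict(list)
    for t in tuples:
        by_value[t[1]].append(t)
    for group in by_value.values():
        rng.shuffle(group)  # which tuple is drawn never depends on Q-ID values

    buckets = []
    while sum(1 for g in by_value.values() if g) >= l:
        # beta = l most frequent remaining values, alpha = 1 tuple from each.
        remaining = [v for v in by_value if by_value[v]]
        top = sorted(remaining, key=lambda v: (-len(by_value[v]), v))[:l]
        buckets.append([by_value[v].pop() for v in top])

    # Fewer than l distinct values remain: each leftover tuple joins a
    # bucket that does not yet contain its sensitive value.
    for value, group in by_value.items():
        for t in group:
            home = next(b for b in buckets if all(s != value for _, s in b))
            home.append(t)
    return buckets


# Hypothetical assignment of diseases to the names used in the slides.
TUPLES = [
    ("Ann", "dyspepsia"), ("Bob", "flu"), ("Gill", "dyspepsia"),
    ("Ed", "flu"), ("Don", "bronchitis"), ("Fred", "gastritis"),
    ("Hera", "diabetes"), ("Cate", "gastritis"),
]
BUCKETS = assign(TUPLES, 2)
```

Every bucket produced this way contains at least L distinct sensitive values, and no value makes up more than a 1/L fraction of its bucket.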

41
Intermediate Generalization

Name  Age  Zip
Ann   21   10000
Bob   27   18000
Gill  60   63000
Ed    54   60000
Don   32   35000
Fred  60   63000
Hera  60   63000
Cate  32   35000

Disease column (as listed on the slide): Dyspepsia, Flu, Bronchitis, Gastritis, Diabetes, Gastritis

42
Step 2: Split
If a bucket B contains α > 1 tuples of each sensitive value, split it into two buckets, B_a and B_b, s.t.:
– Pick 1 ≤ α_a < α tuples from each sensitive value in bucket B, and put them in bucket B_a. The remaining tuples go to B_b.
– The division (B_a, B_b) is optimal in terms of utility.

Name  Age  Zip
Ann   21   10000
Bob   27   18000
Gill  60   63000
Ed    54   60000
Don   32   35000
Fred  60   63000
Hera  60   63000
Cate  32   35000

43
Why does the Ace algorithm satisfy Transparent L-Diversity?
Privacy must be guaranteed with respect to I(O, A).
– Probability must be computed assuming I(O, A) is the actual set of all possible input tables.
O: Output table
I(O, A): Input tables that result in O due to algorithm A
I: All possible input tables

44
Ace algorithm analysis
Lemma 1: The assign step satisfies transparent L-diversity.
Proof (sketch):
Consider an intermediate output Int.
Suppose there is some input table T such that Assign(T) = Int.
Any other table T’, where the sensitive values of 2 individuals in the same group are swapped, also leads to the same intermediate output Int.

45
Ace algorithm analysis
Both tables result in the same intermediate output.

46
Ace algorithm analysis
Lemma 1: The assign step satisfies transparent L-diversity.
Proof (sketch):
Consider an intermediate output Int.
Suppose there is some input table T such that Assign(T) = Int.
Any other table T’, where the sensitive values of 2 individuals in the same group are swapped, also leads to the same intermediate output.
The set of input tables I(Int, A) contains all possible assignments of diseases to individuals within each group of Int.

47
Ace algorithm analysis
Lemma 1: The assign step satisfies transparent L-diversity.
Proof (sketch):
The set of tables I(Int, A) contains all possible assignments of diseases to individuals in each group of Int.
P[Ann has dyspepsia | I(Int, A) and Int] = 1/2

Name  Age  Zip
Ann   21   10000
Bob   27   18000
Gill  60   63000
Ed    54   60000

Disease: Dyspepsia, Flu
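The probability on this slide can be checked by enumerating I(Int, A) for the one group shown. This is a small sketch that assumes the group {Ann, Bob, Gill, Ed} holds two dyspepsia tuples and two flu tuples (consistent with the first Assign iteration, β = 2 and α = 2, but not stated on this slide), and that every assignment of that multiset to the four people is equally likely.

```python
from fractions import Fraction
from itertools import permutations

group = ["Ann", "Bob", "Gill", "Ed"]
# Assumed multiset of sensitive values in the group.
diseases = ["dyspepsia", "dyspepsia", "flu", "flu"]

# Distinct ways to hand the multiset of diseases to the four people;
# position i of each assignment belongs to group[i].
assignments = set(permutations(diseases))

ann = group.index("Ann")
favourable = sum(1 for a in assignments if a[ann] == "dyspepsia")
p_ann_dyspepsia = Fraction(favourable, len(assignments))
```

Three of the six distinct assignments give Ann dyspepsia, which matches the 1/2 stated above.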

48
Ace algorithm analysis
Lemma 2: The split phase also satisfies transparent L-diversity.
Proof (sketch):
I(Int, Assign) contains all tables where an individual is assigned to an arbitrary sensitive value within the same group in Int.
Suppose some input table T ∈ I(Int, Assign) results in the final output O after Split.

49
Ace algorithm analysis
Split does not depend on the sensitive values.
[Figure: the bucket with Ann, Gill under dyspepsia and Bob, Ed under flu results in the buckets {Ann, Bob} and {Gill, Ed}, each with dyspepsia and flu. The bucket with Bob, Ed under dyspepsia and Ann, Gill under flu results in the buckets {Bob, Ann} and {Ed, Gill}, each with dyspepsia and flu.]

50
Ace algorithm analysis
If T ∈ I(Int, Assign), and it results in O after split,
then T’ ∈ I(Int, Assign), and it results in O after split.
[Figure: Table T and Table T’]

51
Ace algorithm analysis
Lemma 2: The split phase also satisfies transparent L-diversity.
Proof (sketch):
Let T’ be generated by “swapping diseases” in some bucket.
If T ∈ I(Int, Assign), and it results in O after split, then T’ ∈ I(Int, Assign), and it results in O after split.
For any individual, the sensitive value is equally likely to be any one of ≥ L choices.
Therefore, P[individual has disease | I(O, Ace)] ≤ 1/L

52
Summary
Many systems guarantee privacy/security only under the assumption that the adversary does not know the algorithm.
– This is bad …
Simulatable algorithms avoid this problem
– Ideally, choices made by the algorithm should be simulatable by the adversary.
Anonymization algorithms are also susceptible to adversaries who know the algorithm or the objective function.
Transparent anonymization limits the inference an attacker (who knows the algorithm) can make about sensitive values.

53
Next Class
Composition of privacy
Differential Privacy

54
References
A. Machanavajjhala, J. Gehrke, D. Kifer, M. Venkitasubramaniam, “L-Diversity: Privacy beyond k-anonymity”, ICDE 2006
K. Kenthapadi, N. Mishra, K. Nissim, “Simulatable Auditing”, PODS 2005
R. Wong, A. Fu, K. Wang, J. Pei, “Minimality attack in privacy preserving data publishing”, PVLDB 2007
X. Xiao, Y. Tao, N. Koudas, “Transparent Anonymization: Thwarting adversaries who know the algorithm”, TODS 2010
