Center for Secure Information Systems Concordia Institute for Information Systems Engineering k-Jump Strategy for Preserving Privacy in Micro-Data Disclosure.

1 k-Jump Strategy for Preserving Privacy in Micro-Data Disclosure
Wen Ming Liu (1), Lingyu Wang (1), and Lei Zhang (2)
(1) Concordia University, (2) George Mason University
Center for Secure Information Systems; Concordia Institute for Information Systems Engineering (CIISE / CSIS)
ICDT 2010, March 23, 2010

2 Agenda
  Background
  K-Jump Strategy
  Data Utility Comparison
  Conclusion

3 Agenda
  Background
    Example
    Algorithms a_naive and a_safe
  K-Jump Strategy
  Data Utility Comparison
  Conclusion

4 Example: Data Holder's View

5 Example: Data Holder's View
Attributes: Name is the identifier, DoB the quasi-identifier, Condition the sensitive attribute.
Goal: release a generalization of the table that satisfies 2-diversity.
Generalization algorithm: consider generalization functions g1 and then g2, in order.
Micro-data table t0:
  Name     DoB   Condition
  Alice    1990  flu
  Bob      1985  cold
  Charlie  1974  cancer
  David    1962  cancer
  Eve      1953  headache
  Fen      1941  toothache
Generalization function g1() groups DoB into 1980~1999, 1960~1979, 1940~1959.
Generalization function g2() groups DoB into 1970~1999, 1940~1969.
2-diversity? g1(t0) is checked first and fails, because its 1960~1979 group (Charlie, David) contains only cancer. g2(t0) satisfies 2-diversity and is released:
  DoB        Condition
  1970~1999  flu, cold, cancer
  1940~1969  cancer, headache, toothache
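The data holder's procedure on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `generalize`, `is_l_diverse` and `a_naive` are hypothetical, and 2-diversity is read in its simplest form (each group holds at least two distinct conditions), which the slides do not spell out.

```python
# Toy micro-data table t0 from the slide: (name, year of birth, condition).
T0 = [
    ("Alice", 1990, "flu"), ("Bob", 1985, "cold"), ("Charlie", 1974, "cancer"),
    ("David", 1962, "cancer"), ("Eve", 1953, "headache"), ("Fen", 1941, "toothache"),
]

# Generalization functions, written as lists of inclusive DoB ranges.
G1 = [(1980, 1999), (1960, 1979), (1940, 1959)]
G2 = [(1970, 1999), (1940, 1969)]


def generalize(table, ranges):
    """Group the conditions by DoB range; names are dropped from the output."""
    groups = {r: [] for r in ranges}
    for _name, dob, condition in table:
        for low, high in ranges:
            if low <= dob <= high:
                groups[(low, high)].append(condition)
                break
    return groups


def is_l_diverse(groups, l=2):
    """Assumed reading of l-diversity: every group has at least l distinct conditions."""
    return all(len(set(conditions)) >= l for conditions in groups.values())


def a_naive(table, functions):
    """Try each generalization in order and release the first one that passes."""
    for index, ranges in enumerate(functions, start=1):
        groups = generalize(table, ranges)
        if is_l_diverse(groups):
            return index, groups
    return None  # nothing can be released


released = a_naive(T0, [G1, G2])
print(released)
```

On the toy table, g1 is rejected because Charlie and David share a group and a condition, so the sketch releases g2.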

6 Example (cont.): Adversary's View

7 Example (cont.): Adversary's View
Attributes: Name is the identifier, DoB the quasi-identifier, Condition the sensitive attribute.
Attacker knows: the generalization, public knowledge, the privacy property.
Goal: guess what the micro-data is. What can the adversary infer?
Public knowledge (the Condition column of the unknown micro-data table t0 is ??? for every person):
  Name     DoB
  Alice    1990
  Bob      1985
  Charlie  1974
  David    1962
  Eve      1953
  Fen      1941
Released generalization g2(t0):
  DoB        Condition
  1970~1999  flu, cold, cancer
  1940~1969  cancer, headache, toothache
The three persons in each group may have the three conditions in any given order.
Permutation set (A to F are Alice to Fen; col = cold, can = cancer, hac = headache, tac = toothache):
        A    B    C    D    E    F
  t1    flu  col  can  can  hac  tac
  t2    flu  can  col  can  hac  tac
  t3    col  flu  can  can  hac  tac
  t4    col  can  flu  can  hac  tac
  …     …    …    …    …    …    …
  t35   can  flu  col  tac  hac  can
  t36   can  col  flu  tac  hac  can

8 Example (cont.)
Permutation set: the 36 tables t1, t2, …, t36 from slide 7.
This would be the adversary's best guess of the micro-data table if the released generalization were his/her only knowledge. However …
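The permutation set on these slides can be enumerated directly. The sketch below assumes a table is represented as a dict from name to condition and uses a hypothetical helper `permutation_set`; the group contents are taken from the released generalization g2(t0).

```python
from itertools import permutations, product

# Released generalization g2(t0): who is in each group, and which conditions it holds.
RELEASED = [
    (("Alice", "Bob", "Charlie"), ("flu", "cold", "cancer")),
    (("David", "Eve", "Fen"), ("cancer", "headache", "toothache")),
]


def permutation_set(released):
    """All tables that assign each group's conditions to its members in some order."""
    per_group = [sorted(set(permutations(conditions))) for _members, conditions in released]
    tables = []
    for combo in product(*per_group):
        table = {}
        for (members, _conditions), assignment in zip(released, combo):
            table.update(zip(members, assignment))
        tables.append(table)
    return tables


tables = permutation_set(RELEASED)
print(len(tables))  # 3! * 3! candidate tables

# Without knowledge of the algorithm, every person has three possible conditions.
print({name: sorted({t[name] for t in tables}) for name in tables[0]})
```

Each group of three contributes 3! orderings, which gives the 36 tables t1 to t36 listed on the slide.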

9 Example (cont.): Adversary Simulating the Algorithm
However, the adversary also knows the generalization algorithm, and can simulate it to further exclude some invalid guesses.

10 Example (cont.): Adversary Simulating the Algorithm
Known to the adversary: names and DoB of Alice (1990), Bob (1985), Charlie (1974), David (1962), Eve (1953), Fen (1941); the conditions in the unknown micro-data table t0 are ???.
Released generalization g2(t0):
  DoB        Condition
  1970~1999  flu, cold, cancer
  1940~1969  cancer, headache, toothache
Checked but unused generalization g1(t0): groups 1980~1999, 1960~1979, 1940~1959, conditions ???.
Simulating the algorithm: for each possible table t_i in the permutation set, is this a valid guess of the micro-data table? Let's try to check it using the algorithm!
  If g1(t_i) satisfies privacy, the algorithm would have released g1(t_i) instead of g2, so t_i is not a valid guess.
  If g1(t_i) violates privacy, t_i is consistent with the release and stays in the adversary's mental image.
Permutation set: t1, t2, t3, t4, …, t35, t36 (36 tables).
Disclosure set:
        Alice  Bob   Charlie  David   Eve        Fen
  t1    flu    cold  cancer   cancer  headache   toothache
  t3    cold   flu   cancer   cancer  headache   toothache
  t7    flu    cold  cancer   cancer  toothache  headache
  t9    cold   flu   cancer   cancer  toothache  headache
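The adversary's simulation can be sketched as a filter over the permutation set. The code below is an illustration only: the helper names are hypothetical, the groups are written out by name rather than derived from DoB, and 2-diversity is again assumed to mean at least two distinct conditions per group.

```python
from itertools import permutations, product

# Groups induced by g1 and g2 on the six people (derived from their DoB).
G1_GROUPS = [("Alice", "Bob"), ("Charlie", "David"), ("Eve", "Fen")]
G2_GROUPS = [("Alice", "Bob", "Charlie"), ("David", "Eve", "Fen")]
# Conditions published for each g2 group.
RELEASED_G2 = [("flu", "cold", "cancer"), ("cancer", "headache", "toothache")]


def is_2_diverse(table, groups):
    """Assumed reading: every group contains at least two distinct conditions."""
    return all(len({table[name] for name in group}) >= 2 for group in groups)


def permutation_set():
    """Every table consistent with the released g2(t0)."""
    tables = []
    for left, right in product(permutations(RELEASED_G2[0]), permutations(RELEASED_G2[1])):
        table = dict(zip(G2_GROUPS[0], left))
        table.update(zip(G2_GROUPS[1], right))
        tables.append(table)
    return tables


def disclosure_set():
    """Keep only tables for which the algorithm would really have reached g2.

    The algorithm tries g1 first, so g2 is released only if g1 violated 2-diversity.
    """
    return [t for t in permutation_set() if not is_2_diverse(t, G1_GROUPS)]


ds = disclosure_set()
for table in ds:
    print(table)
```

Only four tables survive, and in all of them Charlie and David have cancer, so the released g2(t0) no longer protects these two people once the algorithm is known.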

11 Decision Process of Safe and Unsafe Algorithms
[Figure: evaluation-path flowcharts for a_naive and a_safe. Starting from t0, iterations g1, g2, …, g_i, …, g_n each evaluate the privacy property with a Y/N outcome leading to g_i(t0); a_naive evaluates per_i, a_safe evaluates ds_i.]
Legend: box: the i-th iteration; diamond: an evaluation of the privacy property; per: permutation set; ds: disclosure set.
Most existing generalization algorithms (without considering this problem): evaluate the permutation set (the adversary's mental image of the micro-data table without knowledge about the algorithm).
Safe generalization algorithms (Zhang'07ccs, …): evaluate the disclosure set instead (the adversary's mental image of the micro-data table after simulating the algorithm).

12 Agenda
  Background
  K-Jump Strategy
    The Algorithm Family a_jump(k)
    Properties of a_jump(k)
  Data Utility Comparison
  Conclusion

13 The Algorithm Family a_jump(k)
[Figure: evaluation-path flowchart for a_jump(k). Starting from t0, iterations g1, g2, g_{2+k}, …, g_n evaluate per_i and ds_i with Y/N outcomes leading to g_i(t0).]
naive strategy: evaluate the privacy property on the permutation set only; efficient but unsafe.
safe strategy: evaluate the privacy property on the disclosure set directly; safe but costly.
k-jump strategy: penalize by jumping over the next k-1 iterations.
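One possible reading of the k-jump strategy is sketched below; it is not the authors' implementation. Assumptions: iteration i first evaluates the privacy property on per_i (on failure, move to i+1), then on ds_i (on success, release g_i; on failure, jump to iteration i+k); ds_i is taken to be the tables in per_i whose own evaluation path reaches iteration i; the privacy property requires every person to have at least two candidate conditions; and g3, a single group of all six people, is a hypothetical third function added so that there is somewhere to jump to.

```python
from itertools import permutations, product

NAMES = ("Alice", "Bob", "Charlie", "David", "Eve", "Fen")
# Generalization functions given as groupings of people; g3 is hypothetical.
FUNCTIONS = [
    [("Alice", "Bob"), ("Charlie", "David"), ("Eve", "Fen")],  # g1
    [("Alice", "Bob", "Charlie"), ("David", "Eve", "Fen")],    # g2
    [NAMES],                                                   # g3 (assumed)
]


def permutation_set(table, groups):
    """All tables that generalize to the same groups as `table`."""
    per_group = [set(permutations(table[n] for n in group)) for group in groups]
    tables = []
    for combo in product(*per_group):
        candidate = {}
        for group, values in zip(groups, combo):
            candidate.update(zip(group, values))
        tables.append(candidate)
    return tables


def p(tables):
    """Assumed privacy property: each person has at least 2 candidate conditions."""
    return all(len({t[n] for t in tables}) >= 2 for n in NAMES)


def step(table, k, i):
    """Outcome of iteration i for `table`: 'next', 'jump' or 'release'."""
    per = permutation_set(table, FUNCTIONS[i - 1])
    if not p(per):
        return "next"
    ds = [t for t in per if reaches(t, k, i)]  # simulate the algorithm on each guess
    return "release" if p(ds) else "jump"


def reaches(table, k, target):
    """True if the evaluation path of a_jump(k) on `table` visits iteration `target`."""
    i = 1
    while i < target:
        outcome = step(table, k, i)
        if outcome == "release":
            return False
        i += 1 if outcome == "next" else k
    return i == target


def a_jump(table, k):
    """Index of the released generalization function, or None if nothing is released."""
    i = 1
    while i <= len(FUNCTIONS):
        outcome = step(table, k, i)
        if outcome == "release":
            return i
        i += 1 if outcome == "next" else k
    return None


T0 = dict(zip(NAMES, ("flu", "cold", "cancer", "cancer", "headache", "toothache")))
print(a_jump(T0, 1), a_jump(T0, 2))
```

Under these assumptions, a_jump(1) falls through to the hypothetical g3 on the toy table, while a_jump(2) jumps past it and releases nothing, which gives a small instance of how the jump distance changes what is disclosed.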

14 Properties of a_jump(k)
Computation of the disclosure set: ds(g1(t0)) and ds(g2(t0))
Size of the family
[Figure: evaluation-path flowchart for a_jump(k), as on slide 13.]
a_safe: to compute ds(g_i(t0)), must first compute ds(g_j(t)) for all t in per(g_i(t0)) and j = 1, 2, …, i-1
a_jump: to compute ds(g_i(t0)) (2

15 Agenda: Background; Conclusion; Construction for Theorem 1: 1-jump and i-jump (1

16 Construction for Theorem 1: 1-jump and i-jump (1

17 Construction for Theorem 1 (cont.): 1-jump and i-jump (1

18 Construction for Theorem 1 (cont.): 1-jump and i-jump (1

19 Construction for Theorem 1 (cont.): 1-jump and i-jump (1
Privacy property: the highest ratio of a sensitive value in a group must be no greater than 1/2.
  QID  g1  g2  g3  …
  A    C0  C0  C0  …
  B    C1  C1  C1  …
  C    C2  C2  C2  …
  D    C3  C3  C3  …
  E    C4  C4  C4  …
  F    C5  C5  C5  …
  G    C6  C6  C6  …
  H    C6  C6  C6  …
  I    C6  C6  C6  …
  J    C7  C7  C7  …
  K    C7  C7  C7  …
  L    C8  C8  C8  …
  M    C8  C8  C8  …
  N    C9  C9  C9  …
  O    C9  C9  C9  …
To compute ds_3^k(t0):
  QID  S1  S111  S112  S113
  A    C0  C0    C0    C0
  B    C1  C1    C1    C1
  C    C2  C2    C2    C2
  D    C3  C3    C3    C3
  E    C4  C4    C4    C4
  F    C5  C5    C5    C5
  G    C6  C6    C6    C6
  H    C6  C6    C6    C6
  I    C6  C6    C6    C6
  J    C7  C7    C7    C7
  K    C7  C8    C8    C7
  L    C8  C9    C8    C8
  M    C8  C9    C9    C9
  N    C9  C7    C7    C8
  O    C9  C8    C9    C9
  #    |S1\S1'| = 3456
Excluding any table t for which p(per_1(t)) = true.
Considering generalizing these tables using g
b. For a_jump(i), all tables in S1\S1' will be excluded from ds_3^i(t0). Satisfied!

20 Construction for Theorem 1 (cont.): 1-jump and i-jump (1
Privacy property: the highest ratio of a sensitive value in a group must be no greater than 1/2.
Table of QIDs A to O under g1, g2, g3, …: the same sensitive values as on slide 19 (A: C0, B: C1, C: C2, D: C3, E: C4, F: C5, G, H, I: C6, J, K: C7, L, M: C8, N, O: C9).
To compute ds_3^k(t0), sets S1, S111, S1111, S1112 (values per QID as listed on the slide; some cells are merged):
  A  C0  C0  C0  C0
  B  C1  C1  C1  C1
  C  C2  C2  C2  C2
  D  C3  C3  C3  C3
  E  C4  C4  C4  C4/C5
  F  C5  C5  C5  C6
  G  C6  C6  C6  C6
  H  C6  C6  C6
  I  C6  C6  C6  C6
  J  C7  C7  C7  C7
  K  C7  C8  C8  C8
  L  C8  C9  C9  C9
  M  C8  C9  C9  C9
  N  C9  C7  C7  C7
  O  C9  C8  C8  C8
Excluding any table t for which p(per_1(t)) = true.
Considering generalizing these tables using g
c. For a_jump(1), the disclosure set of all tables in S1\S1' under g2 does not satisfy the privacy property. The ratio of I being associated with C6 is 5/9. Violated!

21 Show the evaluation paths by figures. Construction for Theorem 2: i-jump and j-jump (1

22 Construction for Theorem 2 (cont.): i-jump and j-jump (1
Sensitive values, identical in every column g1, g2, g3, …, g_j, g_{j+1}, g_{j+2}, …: C0, C1, C2, C3, C4, S, S, C5, C6, C7, C8, C9, …
The case where i-jump has better utility than j-jump is relatively easier to construct. We only show the construction for the other case.
For this construction, generalization g_{j+2} will be released for j-jump, while g_{j+i+1} or after will be released for i-jump.

23 Construction for Theorem 3: K_1-jump and K_2-jump (K_1, K_2: vectors) incomparable

24 Construction for Proposition 2: Reusing generalization functions
The jump distance is 1. Privacy property: the highest ratio of a sensitive value in a group must be no greater than 1/2.
  QID  g1  g2  g3  g2'
  A    C1  C1  C1  C1
  B    C2  C2  C2  C2
  C    C3  C3  C3  C3
  D    C4  C4  C4  C4
  E    C5  C5  C5  C5
  F    C3  C3  C3  C3
  G    C3  C3  C3  C3
Without reusing g2:
1. Cannot be disclosed under g1(.) or g3(.).
2. To compute ds_2: the table belongs to one of the three disjoint sets. Violated!
Sets under g2, S1, S2, S3 (values per QID as listed on the slide; some cells are merged):
  A  C1  C1  C1/C2
  B  C2  C2  C3  C3
  C  C3  C3
  D  C4  C4  C3  C4
  E  C5  C5  C3  C5
  F  C3  C3  C4  C3
  G  C3  C3  C5  C3
The table will lead to disclosing nothing!

25 Construction for Proposition 2 (cont.): Reusing generalization functions
The jump distance is 1. Privacy property: the highest ratio of a sensitive value in a group must be no greater than 1/2.
Table of QIDs A to G under g1, g2, g3, g2': the same sensitive values as on slide 24 (A: C1, B: C2, C: C3, D: C4, E: C5, F: C3, G: C3).
g2 is reused as g2': to calculate ds_2', the tables that can be disclosed under g1, g2, and g3 must be excluded from per_2'.
1. S1, S2, and S3 cannot be disclosed under g2, as mentioned above.
2. S2 and S3 cannot be disclosed under g3.
Sets under g3, S1, S2, S3 (values per QID as listed on the slide; some cells are merged):
  A  C1  C1  C1/C2
  B  C2  C2  C3  C3
  C  C3  C3
  D  C4  C4  C3  C4
  E  C5  C5  C3  C5
  F  C3  C3  C4  C3
  G  C3  C3  C5  C3

26 Construction for Proposition 2 (cont.): Reusing generalization functions
The jump distance is 1. Privacy property: the highest ratio of a sensitive value in a group must be no greater than 1/2.
Table of QIDs A to G under g1, g2, g3, g2': the same sensitive values as on slide 24 (A: C1, B: C2, C: C3, D: C4, E: C5, F: C3, G: C3).
g2 is reused as g2': to calculate ds_2', the tables that can be disclosed under g1, g2, and g3 must be excluded from per_2'.
1. S1, S2, and S3 cannot be disclosed under g2, as mentioned above.
2. S2 and S3 cannot be disclosed under g3.
3. S1 can be further divided into three disjoint subsets.
   a. S12 and S13 cannot be disclosed under g3.
Sets S1, S11, S12, S13 (values per QID as listed on the slide; some cells are merged):
  A  C1  C1  C1  C1
  B  C2  C2  C2  C2
  C  C3  C3  C3  C3
  D  C4  C3  C3  C4
  E  C5  C4/C5  C3  C5
  F  C3  C3  C4  C3
  G  C3  C5  C3

27 Construction for Proposition 2 (cont.): Reusing generalization functions
Table of QIDs A to G under g1, g2, g3, g2': the same sensitive values as on slide 24 (A: C1, B: C2, C: C3, D: C4, E: C5, F: C3, G: C3).
g2 is reused as g2': to calculate ds_2', the tables that can be disclosed under g1, g2, and g3 must be excluded from per_2'.
1. S1, S2, and S3 cannot be disclosed under g2, as mentioned above.
2. S2 and S3 cannot be disclosed under g3.
3. S1 can be further divided into three disjoint subsets.
   b. The tables in subset S11 can be disclosed under g3.
To compute ds_3(t0 in S11), excluding any table t for which p(per_1(t)) = true:
   A. Belongs to one of the two disjoint sets (nor under g2).
   B. These subsets cannot be disclosed under g2.
Sets S1, S11, t_A (one instance), S_A1, S_A2 (values per QID as listed on the slide; some cells are merged):
  A  C1  C1  C1  C3  C1/C2/C4
  B  C2  C2  C2  C3
  C  C3  C3  C3  C1  C3
  D  C4  C3  C3  C2  C3
  E  C5  C4/C5  C4  C4  C1/C2/C4
  F  C3  C3  C3  C3  C3
  G  C3  C4/C5  C5  C5  C5

28 Construction for Proposition 2 (cont.): Reusing generalization functions
The jump distance is 1. Privacy property: the highest ratio of a sensitive value in a group must be no greater than 1/2.
Table of QIDs A to G under g1, g2, g3, g2': the same sensitive values as on slide 24 (A: C1, B: C2, C: C3, D: C4, E: C5, F: C3, G: C3).
g2 is reused as g2'. Sets S12, S13, S2, S3 (values per QID as listed on the slide; some cells are merged):
  A  C1  C1  C1/C2
  B  C2  C2  C3  C3
  C  C3  C3
  D  C3  C4  C3  C4
  E  C3  C5  C3  C5
  F  C4  C3  C4  C3
  G  C5  C3  C5  C3
  #  4   4   8   8
The ratios of D and E being associated with C3 are 0.5, which is the highest ratio. Satisfied!
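The ratio-based privacy property used in these constructions can be evaluated mechanically over any set of candidate tables. The sketch below does not use the tables from the constructions; it reuses the six-person toy example from the Background slides. The function names are hypothetical, and the property is assumed to mean that no (person, value) pair occurs in more than half of the candidate tables.

```python
from fractions import Fraction
from itertools import permutations, product

GROUPS = [("Alice", "Bob", "Charlie"), ("David", "Eve", "Fen")]
VALUES = [("flu", "cold", "cancer"), ("cancer", "headache", "toothache")]


def candidate_tables():
    """Permutation set of the released toy generalization g2(t0)."""
    tables = []
    for left, right in product(permutations(VALUES[0]), permutations(VALUES[1])):
        table = dict(zip(GROUPS[0], left))
        table.update(zip(GROUPS[1], right))
        tables.append(table)
    return tables


def highest_ratio(tables):
    """Largest fraction of tables that associate one person with one value."""
    best = Fraction(0)
    for person in tables[0]:
        counts = {}
        for table in tables:
            counts[table[person]] = counts.get(table[person], 0) + 1
        best = max(best, Fraction(max(counts.values()), len(tables)))
    return best


def satisfies(tables, bound=Fraction(1, 2)):
    """Assumed property: the highest ratio must be no greater than the bound."""
    return highest_ratio(tables) <= bound


per = candidate_tables()
# Disclosure set of the toy example: Charlie and David both have cancer.
ds = [t for t in per if t["Charlie"] == "cancer" and t["David"] == "cancer"]
print(highest_ratio(per), satisfies(per))
print(highest_ratio(ds), satisfies(ds))
```

On the toy data the permutation set passes (every association appears in one third of the tables), while the disclosure set fails because Charlie is associated with cancer in every remaining table.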

29 Results on a_safe and a_jump(1)
1. When the privacy property is either set-monotonic or based on the highest ratio of sensitive values:
   Lemma 3: p(per(t0)) = false implies p(any of its subsets) = false.
   Corollary 1: The algorithm a_safe has the same data utility as a_jump(1).
2. When the privacy property falls under other cases:
   Lemma 4: The ds_3 under a_safe is a subset of that under a_jump(1).
   Theorem 5: The data utility of a_safe and a_jump(1) is generally incomparable.

30 Agenda
  Background
  K-Jump Strategy
  Data Utility Comparison
  Conclusion

31 Conclusion
  We have proposed a novel k-jump strategy for micro-data disclosure.
  Transform a given generalization algorithm into a large number of safe algorithms.
  Show the data utility is generally incomparable by constructing counter-examples.
  Practical impact: make a secret choice.

32 Further Results and Future Work
Future studies:
  Study more efficient safe algorithms.
  Employ statistical methods to compare different k-jump algorithms.
  Further investigate the opportunity in reusing generalization functions.
Further results in the extended version of this paper:
  Computational complexity:
  Making a secret choice among unsafe algorithms does not yield a safe solution.

33 Thank you!

34 Example: Data Holder View
Attributes: Name is the identifier, DoB the quasi-identifier, Condition the sensitive attribute.
Goal: release a generalization of the table that satisfies 2-diversity.
Generalization algorithm: consider generalization functions g1 and then g2, in order.
Micro-data table t0:
  Name     DoB   Condition
  Alice    1990  flu
  Bob      1985  cold
  Charlie  1974  cancer
  David    1962  cancer
  Eve      1953  headache
  Fen      1941  toothache
Generalization function g1() groups DoB into 1980~1999, 1960~1979, 1940~1959.
Generalization function g2() groups DoB into 1970~1999, 1940~1969.
2-diversity? Each generalization g1(t0), g2(t0) is checked against the conditions flu, cold, cancer, cancer, headache, toothache.

35 Toy Example
Attributes: Name is the identifier, DoB the quasi-identifier, Condition the sensitive attribute.
Data holder: micro-data table t0:
  Name     DoB   Condition
  Alice    1990  flu
  Bob      1985  cold
  Charlie  1974  cancer
  David    1962  cancer
  Eve      1953  headache
  Fen      1941  toothache
2-diversity generalized table g2(t0):
  DoB        Condition
  1970~1999  flu, cold, cancer
  1940~1969  cancer, headache, toothache
Attacker knows: the generalization, external data (the Name and DoB of each person; Condition is ???), the privacy property.
What can the attacker infer? Permutation set (A to F are Alice to Fen; col = cold, can = cancer, hac = headache, tac = toothache):
        A    B    C    D    E    F
  t1    flu  col  can  can  hac  tac
  t2    flu  can  col  can  hac  tac
  t3    col  flu  can  can  hac  tac
  t4    col  can  flu  can  hac  tac
  …     …    …    …    …    …    …
  t35   can  flu  col  tac  hac  can
  t36   can  col  flu  tac  hac  can

