1 Global Privacy Guarantee in Serial Data Publishing Raymond Chi-Wing Wong 1, Ada Wai-Chee Fu 2, Jia Liu 2, Ke Wang 3, Yabo Xu 4 The Hong Kong University.


1 Global Privacy Guarantee in Serial Data Publishing
Raymond Chi-Wing Wong (1), Ada Wai-Chee Fu (2), Jia Liu (2), Ke Wang (3), Yabo Xu (4)
(1) The Hong Kong University of Science and Technology
(2) The Chinese University of Hong Kong
(3) Simon Fraser University
(4) Sun Yat-sen University
Prepared by Raymond Chi-Wing Wong
Presented by Raymond Chi-Wing Wong

2 Outline
1. Sequential Releases
2. Related Work
3. Our Proposed Privacy Model
   Local Guarantee
4. Conclusion

3 1. Sequential Releases
Time = 1: the hospital releases the data set to the public.

Medical Data (Hospital)
Name     PID  Disease
Raymond  p1   Flu
Peter    p2   HIV
Mary     p3   Fever
Alice    p4   HIV
Bob      p5   Flu
John     p6   Fever

Published Data (Public): the slide repeats the same table.
This table satisfies some privacy requirements (e.g., m-invariance).

4-6 1. Sequential Releases
The hospital's data change over time through insertions, deletions and updates. The hospital releases the data set to the public again at Time = 2 and at Time = 3. For every release the slides repeat the Medical Data and Published Data tables (Name, PID, Disease) shown on slide 3.
Each published table satisfies some privacy requirements (e.g., m-invariance).
Problem: At the current time t, we want to generate a table which satisfies some privacy requirements (e.g., m-invariance) with respect to all published tables at any time <= t.

7-8 1. Sequential Releases
(The slides show the hospital's Medical Data and the Published Data at Time = 1, Time = 2 and Time = 3.)
Problem: At the current time t, we want to generate a table which satisfies some privacy requirements (e.g., m-invariance) with respect to all published tables at any time <= t.

Medical Data
Name     Sex  Zipcode  Disease
Raymond  M    65001    flu
Peter    M    65002    chlamydia
Mary     F    65014    flu
Alice    F    65015    fever

Chlamydia is a sexually transmitted disease (STD).
Privacy Requirement: Peter would not want anyone to deduce with high confidence from these published data that he has ever contracted chlamydia in the past.
Privacy Requirement (Global Guarantee): The probability that Peter is linked to chlamydia in one or more published datasets is at most a given threshold (e.g., 1/2).

9 1. Sequential Releases
Problem: At the current time t, we want to generate a table which satisfies some privacy requirements (e.g., m-invariance) with respect to all published tables at any time <= t.
Privacy Requirement: Peter would not want anyone to deduce with high confidence from these released data that he has ever contracted chlamydia in the past.
Privacy Requirement (Global Guarantee): The probability that Peter is linked to chlamydia in one or more published datasets is at most a given threshold (e.g., 1/2).
This global guarantee requirement seems quite “obvious” and “natural”.
No existing work considers this global guarantee requirement.
Instead, existing works consider another requirement called local guarantee.

10 1. Sequential Releases
(The slide shows the hospital's Medical Data and the Published Data at Time = 1, Time = 2 and Time = 3.)

Medical Data
Name     Sex  Zipcode  Disease
Raymond  M    65001    flu
Peter    M    65002    chlamydia
Mary     F    65014    flu
Alice    F    65015    fever

Chlamydia is a sexually transmitted disease (STD).
Privacy Requirement (Local Guarantee): The probability that Peter is linked to chlamydia in each published dataset is at most a given threshold (e.g., 1/2). That is:
The probability that Peter is linked to chlamydia in the dataset at time = 1 is at most a given threshold (e.g., 1/2).
The probability that Peter is linked to chlamydia in the dataset at time = 2 is at most a given threshold (e.g., 1/2).
The probability that Peter is linked to chlamydia in the dataset at time = 3 is at most a given threshold (e.g., 1/2).

11 2. Related Work
Local Guarantee
m-invariance: Xiao et al., “m-invariance: Towards Privacy Preserving Re-publication of Dynamic Datasets”, SIGMOD, 2007
l-scarcity: Bu et al., “Privacy Preserving Serial Data Publishing by Role Composition”, VLDB, 2008

12 Contribution
We are the first to propose the global guarantee requirement.
We prove that global guarantee is a stronger requirement than local guarantee.

13 How can we calculate the probability?
Problem: At the current time t, we want to generate a table which satisfies some privacy requirements (e.g., m-invariance) with respect to all published tables at any time <= t.
Privacy Requirement: Peter would not want anyone to deduce with high confidence from these released data that he has ever contracted chlamydia in the past.
Privacy Requirement (Global Guarantee): The probability that Peter is linked to chlamydia in one or more published datasets is at most a given threshold (e.g., 1/2).
According to the published datasets, we derive a formula based on the possible world analysis.
We skip the details.

14 (Diagram: the hospital's Medical Data and the corresponding Published Data at Time = 1, Time = 2 and Time = 3, each shown as the six-row Name, PID, Disease table.)

15 Property
Theorem: Global guarantee is a stronger privacy requirement than local guarantee.
If the published tables satisfy global guarantee, then they satisfy local guarantee.
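The slides do not show the proof, but the direction stated in the theorem follows from monotonicity of probability. The sketch below uses notation that is assumed here and does not come from the slides: E_j is the event that a given individual is linked to the sensitive value in the table published at time j, t is the current time, and r is the threshold.

```latex
\[
P(E_j) \;\le\; P\Bigl(\,\bigcup_{i=1}^{t} E_i\Bigr) \;\le\; r
\qquad \text{for every } j \in \{1, \dots, t\}.
\]
```

The first inequality holds because E_j is contained in the union, and the second inequality is the global guarantee itself. The converse does not hold in general, since several events that each have probability at most r can have a union whose probability exceeds r.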

16 Our Algorithm
How can we generate tables such that they satisfy global guarantee?
Idea: Large group size

17 5. Conclusion
We are the first to propose global guarantee.
Global guarantee is a stronger privacy requirement than local guarantee.

18 Q&A

19 In the following, I will elaborate on two concepts.
Local Guarantee (e.g., m-invariance)
Global Guarantee

20-21 Time = 1: the hospital releases the data set to the public.

Medical Data (Hospital)
Name     Sex  Zipcode  Disease
Raymond  M    65001    flu
Peter    M    65002    chlamydia
Mary     F    65014    flu
Alice    F    65015    fever

Published Data (Public)
Sex  Zipcode  Disease
M    65001    flu
M    65002    chlamydia
F    65014    flu
F    65015    fever

Voter Registration List (Public)
Name     Sex  Zipcode
Raymond  M    65001
Peter    M    65002
Mary     F    65014
Alice    F    65015
Emily    F    65010

22 Time = 1: the hospital releases the data set to the public. (Medical Data and Voter Registration List as on slide 20.)

Published Data (Public), after Generalization
Sex  Zipcode  Disease
M    6500*    flu
M    6500*    chlamydia
F    6501*    flu
F    6501*    fever

Each individual is linked to “chlamydia” with probability at most 1/2 in THIS PUBLISHED TABLE.
2-diversity only focuses on ONE-TIME publishing.
2-invariance focuses on MULTIPLE-TIME publishing. It also makes use of the idea of 2-diversity.
Idea: Each individual is linked to “chlamydia” with probability at most 1/2 for each of the MULTIPLE PUBLISHED TABLES.
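The generalization step and the "at most 1/2" claim on this slide can be checked with a small sketch. The helper names `generalize` and `linkage_probabilities` are hypothetical, and the code assumes that an adversary treats every tuple in a generalized group as equally likely to belong to any member of that group.

```python
from collections import defaultdict
from fractions import Fraction

# Medical data from the slide at time = 1: (name, sex, zipcode, disease).
records = [
    ("Raymond", "M", "65001", "flu"),
    ("Peter", "M", "65002", "chlamydia"),
    ("Mary", "F", "65014", "flu"),
    ("Alice", "F", "65015", "fever"),
]


def generalize(sex, zipcode, keep=4):
    """Mask all but the first `keep` zipcode digits (hypothetical helper)."""
    return sex, zipcode[:keep] + "*" * (len(zipcode) - keep)


def linkage_probabilities(rows, disease):
    """Return, per generalized group, the fraction of tuples carrying `disease`."""
    groups = defaultdict(list)
    for _name, sex, zipcode, d in rows:
        groups[generalize(sex, zipcode)].append(d)
    return {g: Fraction(ds.count(disease), len(ds)) for g, ds in groups.items()}


probs = linkage_probabilities(records, "chlamydia")
for group, p in sorted(probs.items()):
    print(group, p)
```

With the slide's data, the group (M, 6500*) contains one chlamydia tuple out of two, so its members are linked to chlamydia with probability 1/2, and the group (F, 6501*) contains none.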

23-26 2-invariance
Time = 1: the hospital releases the data set to the public. (Medical Data, generalized Published Data and Voter Registration List as on slide 22.)

Name     Signature
Raymond  {flu, chlamydia}
Peter    {flu, chlamydia}
Mary     {flu, fever}
Alice    {flu, fever}

27-29 2-invariance
Time = 2: the hospital releases the data set to the public again. (The Time = 1 tables, signatures and Voter Registration List are as on the previous slides.)

Medical Data (Hospital), Time = 2
Name     Sex  Zipcode  Disease
Raymond  M    65001    chlamydia
Peter    M    65002    flu
Mary     F    65014    fever
Emily    F    65010    flu

Published Data (Public), Time = 2
Sex  Zipcode  Disease
M    6500*    chlamydia
M    6500*    flu
F    6501*    fever
F    6501*    flu

Name     Signature (Time = 2)
Raymond  {flu, chlamydia}
Peter    {flu, chlamydia}
Mary     {flu, fever}
Emily    {flu, fever}

This table satisfies 2-invariance. This is because each individual is linked to the SAME signature.
Idea of 2-invariance: Each individual is linked to the SAME signature in each published table.
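The signature idea can be illustrated with a simplified check. This is a sketch under assumptions, not the algorithm of Xiao et al.: it assumes the publisher knows which individuals fall in each published group, it represents a release as a list of (members, diseases) pairs, and the function names are hypothetical.

```python
def signatures(release):
    """Map each individual to the set of sensitive values in their group."""
    return {
        name: frozenset(diseases)
        for members, diseases in release
        for name in members
    }


def is_m_invariant(releases, m):
    """Simplified check: every group has at least m tuples with pairwise
    distinct sensitive values, and every individual keeps the same
    signature in all releases in which they appear."""
    seen = {}
    for release in releases:
        for _members, diseases in release:
            if len(diseases) < m or len(set(diseases)) != len(diseases):
                return False
        for name, sig in signatures(release).items():
            if seen.setdefault(name, sig) != sig:
                return False
    return True


# Groups taken from the slides (time = 1 and time = 2).
release_1 = [(["Raymond", "Peter"], ["flu", "chlamydia"]),
             (["Mary", "Alice"], ["flu", "fever"])]
release_2 = [(["Raymond", "Peter"], ["chlamydia", "flu"]),
             (["Mary", "Emily"], ["fever", "flu"])]

# A hypothetical alternative second release that regroups Peter with Mary.
regrouped = [(["Raymond", "Emily"], ["chlamydia", "flu"]),
             (["Peter", "Mary"], ["flu", "fever"])]

print(is_m_invariant([release_1, release_2], 2))
print(is_m_invariant([release_1, regrouped], 2))
```

The second call fails the check because Peter's signature would change from {flu, chlamydia} to {flu, fever} between the two releases.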

30-31 2-invariance

Published Data (Public), Time = 1
Sex  Zipcode  Disease
M    6500*    flu
M    6500*    chlamydia
F    6501*    flu
F    6501*    fever

Published Data (Public), Time = 2
Sex  Zipcode  Disease
M    6500*    chlamydia
M    6500*    flu
F    6501*    fever
F    6501*    flu

Voter Registration List (Public)
Name     Sex  Zipcode
Raymond  M    65001
Peter    M    65002
Mary     F    65014
Alice    F    65015
Emily    F    65010

2-invariance provides the local guarantee: the probability that an individual is linked to chlamydia in each of the published datasets is at most 1/2.
Why? Possible World Analysis.
Slide 31 keeps only the (M, 6500*) rows of the two published tables.

32-36 Possible World Analysis (2-invariance, local guarantee)
The analysis considers the (M, 6500*) group of the two published tables, whose candidates in the Voter Registration List are Raymond (M, 65001) and Peter (M, 65002).

Possible world analysis based on the published table at time = 1 only:
  Assignment A: 65001 flu,       65002 chlamydia
  Assignment B: 65001 chlamydia, 65002 flu
The possible world analysis based on the published table at time = 2 only gives the same two assignments.

Combining one assignment at time = 1 with one assignment at time = 2 gives four possible worlds (World 1 to World 4):
  Time = 1      Time = 2
  A             A
  A             B
  B             A
  B             B

In the published data at time = 1, Prob(the second individual (i.e., Peter) is linked to chlamydia) = 2/4 = 1/2.
In the published data at time = 2, Prob(the second individual (i.e., Peter) is linked to chlamydia) = 2/4 = 1/2.
2-invariance provides the local guarantee: the probability that an individual is linked to chlamydia in each of the published datasets is at most 1/2.

37-41 Possible World Analysis (global guarantee)
2-invariance provides the local guarantee: the probability that an individual is linked to chlamydia in each of the published datasets is at most 1/2.
Global Guarantee: The probability that an individual is linked to chlamydia in one or more published datasets is at most 1/2.
The slides go through World 1 to World 4 one by one. The second individual (i.e., Peter) is linked to chlamydia in one or more published datasets in three of the four worlds (Yes, Yes, Yes, No).
Prob(the second individual (i.e., Peter) is linked to chlamydia in one or more published datasets) = 3/4
This value is larger than 1/2, so the global guarantee with threshold 1/2 is not satisfied.
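The two-release example can be reproduced by brute-force enumeration. The sketch below assumes that all possible worlds are equally likely and that the assignments at the two times are independent; the slides note that the general formula is much more complicated, so this covers only the illustrated case. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations, product


def assignments(group, diseases):
    """All ways to assign the published disease values to the group members."""
    return [dict(zip(group, p)) for p in set(permutations(diseases))]


group = ["Raymond", "Peter"]                       # the (M, 6500*) group
time_1 = assignments(group, ["flu", "chlamydia"])  # published at time = 1
time_2 = assignments(group, ["chlamydia", "flu"])  # published at time = 2

# A possible world picks one assignment per release: 2 x 2 = 4 worlds.
worlds = list(product(time_1, time_2))


def prob(event):
    """Fraction of equally likely worlds in which `event` holds."""
    return Fraction(sum(1 for w in worlds if event(w)), len(worlds))


local_1 = prob(lambda w: w[0]["Peter"] == "chlamydia")
local_2 = prob(lambda w: w[1]["Peter"] == "chlamydia")
global_p = prob(lambda w: any(t["Peter"] == "chlamydia" for t in w))

print(local_1, local_2, global_p)
```

Each local probability is 2/4 = 1/2, while Peter is linked to chlamydia in at least one release in three of the four worlds, which gives 3/4.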

42 We illustrate with an example how to derive the probability that an individual is linked to chlamydia (for both local guarantee and global guarantee).
In fact, the general formula is much more complicated.

43 Theorem: Global guarantee is a stronger privacy requirement than local guarantee.
If the published tables satisfy global guarantee, then they satisfy local guarantee.

44 How can we generate tables such that they satisfy global guarantee?
Idea: Large group size

45 Global Guarantee
Time = 1: the hospital releases the data set to the public.

Medical Data (Hospital), Time = 1
Name     Sex  Zipcode  Disease
Raymond  M    65001    flu
Peter    M    65002    chlamydia
Mary     F    65014    flu
Alice    F    65015    fever

Published Data (Public), Time = 1
Sex  Zipcode  Disease
M/F  650**    flu
M/F  650**    chlamydia
M/F  650**    flu
M/F  650**    fever

Time = 2: the hospital releases the data set to the public.

Medical Data (Hospital), Time = 2
Name     Sex  Zipcode  Disease
Raymond  M    65001    flu
Peter    M    65002    chlamydia
Mary     F    65014    fever
Emily    F    65010    flu

Published Data (Public), Time = 2
Sex  Zipcode  Disease
M/F  650**    flu
M/F  650**    chlamydia
M/F  650**    fever
M/F  650**    flu

Prob(the second individual (i.e., Peter) is linked to chlamydia in one or more published datasets) = 7/16
This value is smaller than 1/2.
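The effect of group size on this probability can be sketched with a closed form. The code assumes a simplified setting that matches the slide's example but is not stated as a general formula in the slides: each release places the individual in one group that contains exactly one chlamydia tuple, every group member is equally likely to own it, and the releases are independent. The function names are hypothetical.

```python
from fractions import Fraction


def global_probability(group_sizes):
    """P(linked in at least one release) when the individual's group in
    release i has group_sizes[i] members and exactly one sensitive tuple."""
    miss = Fraction(1)
    for k in group_sizes:
        miss *= 1 - Fraction(1, k)  # probability of not being linked in this release
    return 1 - miss


def min_group_size(num_releases, threshold):
    """Smallest uniform group size whose global probability is <= threshold."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    k = 1
    while global_probability([k] * num_releases) > threshold:
        k += 1
    return k


print(global_probability([2, 2]))          # groups of size 2 in both releases
print(global_probability([4, 4]))          # groups of size 4 in both releases
print(min_group_size(2, Fraction(1, 2)))   # two releases, threshold 1/2
```

Under these assumptions, groups of size 2 give 1 - (1/2)^2 = 3/4 and groups of size 4 give 1 - (3/4)^2 = 7/16, so for two releases and a threshold of 1/2 a group size of 4 is the smallest that suffices.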

46 5. Conclusion
We are the first to propose global guarantee.
Global guarantee is a stronger privacy requirement than local guarantee.

47 Q&A

48 2-invariance (Local Guarantee)
Time = 1: the hospital releases the data set to the public.

Medical Data (Hospital), Time = 1
Name     Sex  Zipcode  Disease
Raymond  M    65001    flu
Peter    M    65002    chlamydia
Mary     F    65014    flu
Alice    F    65015    fever

Published Data (Public), Time = 1
Sex  Zipcode  Disease
M    6500*    flu
M    6500*    chlamydia
F    6501*    flu
F    6501*    fever

Time = 2: the hospital releases the data set to the public.

Medical Data (Hospital), Time = 2
Name     Sex  Zipcode  Disease
Raymond  M    65001    flu
Peter    M    65002    chlamydia
Mary     F    65014    fever
Emily    F    65010    flu

Published Data (Public), Time = 2
Sex  Zipcode  Disease
M    6500*    flu
M    6500*    chlamydia
F    6501*    fever
F    6501*    flu

Prob(the second individual (i.e., Peter) is linked to chlamydia in one or more published datasets) = 3/4
This value is larger than 1/2.

