1 Global Privacy Guarantee in Serial Data Publishing Raymond Chi-Wing Wong 1, Ada Wai-Chee Fu 2, Jia Liu 2, Ke Wang 3, Yabo Xu 4 The Hong Kong University.

Presentation transcript:

1 Global Privacy Guarantee in Serial Data Publishing
Raymond Chi-Wing Wong (1), Ada Wai-Chee Fu (2), Jia Liu (2), Ke Wang (3), Yabo Xu (4)
(1) The Hong Kong University of Science and Technology
(2) The Chinese University of Hong Kong
(3) Simon Fraser University
(4) Sun Yat-sen University
Prepared by Raymond Chi-Wing Wong
Presented by Raymond Chi-Wing Wong

2 Outline
1. Sequential Releases
2. Related Work
3. Our Proposed Privacy Model: Local Guarantee
4. Conclusion

3 1. Sequential Releases
Time = 1: Hospital (Medical Data) -> Public (Published Data). Release the data set to public.
Table shown in both the Medical Data and the Published Data boxes:
Name     PID  Disease
Raymond  p1   Flu
Peter    p2   HIV
Mary     p3   Fever
Alice    p4   HIV
Bob      p5   Flu
John     p6   Fever
This table satisfies some privacy requirements (e.g., m-invariance)

4 1. Sequential Releases
Time = 1: Hospital (Medical Data) -> Public (Published Data).
Insertions, deletions and updates
Time = 2: Hospital (Medical Data) -> Public (Published Data). Release the data set to public.
Each Medical Data and Published Data box shows the Name / PID / Disease table from slide 3.
This table satisfies some privacy requirements (e.g., m-invariance)

5 1. Sequential Releases
Time = 1: Hospital (Medical Data) -> Public (Published Data).
Time = 2: Hospital (Medical Data) -> Public (Published Data).
Insertions, deletions and updates
Time = 3: Hospital (Medical Data) -> Public (Published Data).
Each Medical Data and Published Data box shows the Name / PID / Disease table from slide 3.
This table satisfies some privacy requirements (e.g., m-invariance)

6 1. Sequential Releases
Time = 1, Time = 2, Time = 3: Hospital (Medical Data) -> Public (Published Data), each box showing the Name / PID / Disease table from slide 3.
Problem: At the current time t, we want to generate a table which satisfies some privacy requirements (e.g., m-invariance) with respect to all published tables at any time <= t

7 1. Sequential Releases
Time = 1, Time = 2, Time = 3: Hospital (Medical Data) -> Public (Published Data), each box showing the Name / PID / Disease table from slide 3.
Problem: At the current time t, we want to generate a table which satisfies some privacy requirements (e.g., m-invariance) with respect to all published tables at any time <= t
Medical Data:
Name     Sex  Zipcode  Disease
Raymond  M    65001    flu
Peter    M    65002    chlamydia
Mary     F    65014    flu
Alice    F    65015    fever
Privacy Requirement: Peter would not want anyone to deduce with high confidence from these published data (one or more published datasets) that he has ever contracted chlamydia (a sexually transmitted disease, STD) in the past.

8 1. Sequential Releases
Time = 1, Time = 2, Time = 3: Hospital (Medical Data) -> Public (Published Data), each box showing the Name / PID / Disease table from slide 3.
Problem: At the current time t, we want to generate a table which satisfies some privacy requirements (e.g., m-invariance) with respect to all published tables at any time <= t
Medical Data:
Name     Sex  Zipcode  Disease
Raymond  M    65001    flu
Peter    M    65002    chlamydia
Mary     F    65014    flu
Alice    F    65015    fever
Privacy Requirement: Peter would not want anyone to deduce with high confidence from these published data that he has ever contracted chlamydia (a sexually transmitted disease, STD) in the past.
Global Guarantee
Privacy Requirement: The probability that Peter is linked to chlamydia in one or more published datasets is at most a given threshold (e.g., 1/2).

9 1. Sequential Releases
Problem: At the current time t, we want to generate a table which satisfies some privacy requirements (e.g., m-invariance) with respect to all published tables at any time <= t
Privacy Requirement: Peter would not want anyone to deduce with high confidence from these released data that he has ever contracted chlamydia in the past.
Global Guarantee
Privacy Requirement: The probability that Peter is linked to chlamydia in one or more published datasets is at most a given threshold (e.g., 1/2).
This global guarantee requirement seems quite "obvious" and "natural".
No existing work considers this global guarantee requirement.
Instead, existing work considers another requirement called local guarantee.

10 1. Sequential Releases
Time = 1, Time = 2, Time = 3: Hospital (Medical Data) -> Public (Published Data), each box showing the Name / PID / Disease table from slide 3.
Medical Data:
Name     Sex  Zipcode  Disease
Raymond  M    65001    flu
Peter    M    65002    chlamydia
Mary     F    65014    flu
Alice    F    65015    fever
Chlamydia: a sexually transmitted disease (STD)
Local Guarantee
Privacy Requirement: Probability that Peter is linked to chlamydia in each published dataset is at most a given threshold (e.g., 1/2).
Probability that Peter is linked to chlamydia in the dataset at time = 1 is at most a given threshold (e.g., 1/2).
Probability that Peter is linked to chlamydia in the dataset at time = 2 is at most a given threshold (e.g., 1/2).
Probability that Peter is linked to chlamydia in the dataset at time = 3 is at most a given threshold (e.g., 1/2).

11 2. Related Work
Local Guarantee
m-invariance: Xiao et al., "m-invariance: Towards Privacy Preserving Re-publication of Dynamic Datasets", SIGMOD, 2007
l-scarcity: Bu et al., "Privacy Preserving Serial Data Publishing by Role Composition", VLDB, 2008

12 Contribution We are the first to propose the global guarantee requirement We prove that global guarantee is a stronger requirement than local guarantee

13 How can we calculate the probability?
From the published datasets, we derive a formula based on possible world analysis. We skip the details.
Problem: At the current time t, we want to generate a table which satisfies some privacy requirements (e.g., m-invariance) with respect to all published tables at any time <= t
Privacy Requirement: Peter would not want anyone to deduce with high confidence from these released data that he has ever contracted chlamydia in the past.
Global Guarantee
Privacy Requirement: The probability that Peter is linked to chlamydia in one or more published datasets is at most a given threshold (e.g., 1/2).

14 Time = 1, Time = 2, Time = 3: Hospital (Medical Data) -> Public (Published Data), each box showing the Name / PID / Disease table from slide 3.

15 Property Theorem: Global guarantee is a stronger privacy requirement than local guarantee. If the published tables satisfy global guarantee, then they satisfy local guarantee.
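A brief sketch of why the theorem holds. The notation is an assumption introduced here, not taken from the slides: E_i denotes the event that a given individual is linked to the sensitive value in the table published at time i, t is the number of published tables, and tau is the threshold. Being linked in one or more tables is the union of these events, which is at least as likely as any single one of them.

```latex
\[
\Pr\Big[\,\bigcup_{j=1}^{t} E_j\Big] \;\ge\; \Pr[E_i]
\qquad \text{for every } i \in \{1,\dots,t\},
\]
\[
\text{so}\qquad
\Pr\Big[\,\bigcup_{j=1}^{t} E_j\Big] \le \tau
\;\Longrightarrow\;
\Pr[E_i] \le \tau \quad \text{for every } i .
\]
```

The converse does not hold in general: in the two-release example of this talk, each release alone gives probability 1/2, while the probability over both releases is 3/4.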

16 Our Algorithm How can we generate tables such that they satisfy global guarantee? Idea: Large group size

17 5. Conclusion We are the first to propose global guarantee Global guarantee is a stronger privacy requirement than local guarantee.

18 Q&A

19 In the following, I will elaborate on two concepts: Local Guarantee (e.g., m-invariance) and Global Guarantee.

20 Time = 1: Hospital (Medical Data) -> Public (Published Data). Release the data set to public.
Medical Data:
Name     Sex  Zipcode  Disease
Raymond  M    65001    flu
Peter    M    65002    chlamydia
Mary     F    65014    flu
Alice    F    65015    fever
Published Data:
Sex  Zipcode  Disease
M    65001    flu
M    65002    chlamydia
F    65014    flu
F    65015    fever
Voter Registration List (Public):
Name     Sex  Zipcode
Raymond  M    65001
Peter    M    65002
Mary     F    65014
Alice    F    65015
Emily    F    65010

22 Time = 1: Hospital (Medical Data) -> Public (Published Data). Release the data set to public.
Medical Data and Voter Registration List: as on slide 20.
Generalization
Published Data:
Sex  Zipcode  Disease
M    6500*    flu
M    6500*    chlamydia
F    6501*    flu
F    6501*    fever
2-diversity: Each individual is linked to "chlamydia" with probability at most 1/2 in THIS PUBLISHED TABLE.
2-diversity only focuses on ONE-TIME publishing.
2-invariance focuses on MULTIPLE-TIME publishing. It also makes use of the idea of 2-diversity.
Idea: Each individual is linked to "chlamydia" with probability at most 1/2 for each of the MULTIPLE PUBLISHED TABLES.
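The generalization step on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `generalize_zip` and `max_link_probability` are hypothetical, and the adversary is assumed to consider every record in a group equally likely to belong to a given group member.

```python
from collections import defaultdict
from fractions import Fraction

# (Sex, Zipcode, Disease) rows of the medical data at time = 1.
records = [
    ("M", "65001", "flu"),
    ("M", "65002", "chlamydia"),
    ("F", "65014", "flu"),
    ("F", "65015", "fever"),
]

def generalize_zip(zipcode, hidden_digits=1):
    """Replace the last `hidden_digits` digits of a zipcode with '*'."""
    keep = len(zipcode) - hidden_digits
    return zipcode[:keep] + "*" * hidden_digits

def max_link_probability(records, sensitive, hidden_digits=1):
    """Largest per-group fraction of records carrying the sensitive value."""
    groups = defaultdict(list)
    for sex, zipcode, disease in records:
        groups[(sex, generalize_zip(zipcode, hidden_digits))].append(disease)
    return max(Fraction(d.count(sensitive), len(d)) for d in groups.values())

print(generalize_zip("65001"))                     # 6500*
print(max_link_probability(records, "chlamydia"))  # 1/2
```

Without generalization (`hidden_digits=0`) every record forms its own group, so the same function returns 1 for chlamydia.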

23 2-invariance
Time = 1: Hospital (Medical Data) -> Public (Published Data). Release the data set to public.
Medical Data and Voter Registration List: as on slide 20.
Published Data:
Sex  Zipcode  Disease
M    6500*    flu
M    6500*    chlamydia
F    6501*    flu
F    6501*    fever
Signatures:
Name            Signature
Raymond, Peter  {flu, chlamydia}
Mary, Alice     {flu, fever}

27 2-invariance
Time = 1: Medical Data, Published Data, Voter Registration List and signatures as on slide 23.
Time = 2: Hospital (Medical Data) -> Public (Published Data). Release the data set to public.
Medical Data (time = 2):
Name     Sex  Zipcode  Disease
Raymond  M    65001    chlamydia
Peter    M    65002    flu
Mary     F    65014    fever
Emily    F    65010    flu
Published Data (time = 2):
Sex  Zipcode  Disease
M    6500*    chlamydia
M    6500*    flu
F    6501*    fever
F    6501*    flu
Individuals at time = 2: Raymond, Peter, Mary, Emily

28 2-invariance
Time = 1: Medical Data, Published Data, Voter Registration List and signatures as on slide 23.
Time = 2: Hospital (Medical Data) -> Public (Published Data). Release the data set to public.
Medical Data (time = 2):
Name     Sex  Zipcode  Disease
Raymond  M    65001    chlamydia
Peter    M    65002    flu
Mary     F    65014    fever
Emily    F    65010    flu
Published Data (time = 2):
Sex  Zipcode  Disease
M    6500*    chlamydia
M    6500*    flu
F    6501*    fever
F    6501*    flu
Signatures (time = 2):
Name            Signature
Raymond, Peter  {flu, chlamydia}
Mary, Emily     {flu, fever}
This table satisfies 2-invariance. This is because each individual is linked to the SAME signature.
Idea of 2-invariance: Each individual is linked to the SAME signature in each published table.
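The signature idea above can be sketched as a small check. This is a simplified illustration, not the full m-invariance definition of Xiao et al. (for example, it does not model counterfeit records). The data layout (a release as a list of `(members, diseases)` groups) and the name `keeps_signatures` are assumptions made for this example.

```python
def keeps_signatures(releases, m):
    """Return True if every group has at least m records with pairwise distinct
    sensitive values, and nobody's signature changes between releases."""
    seen = {}  # person -> signature recorded the first time they appear
    for release in releases:
        for members, diseases in release:
            signature = frozenset(diseases)
            # Group must hold >= m records, all with different sensitive values.
            if len(diseases) < m or len(signature) != len(diseases):
                return False
            for name in members:
                # setdefault stores the signature on first sight, else returns the old one.
                if seen.setdefault(name, signature) != signature:
                    return False
    return True

# Published groups at time = 1 and time = 2 from the slides.
release_1 = [(["Raymond", "Peter"], ["flu", "chlamydia"]),
             (["Mary", "Alice"], ["flu", "fever"])]
release_2 = [(["Raymond", "Peter"], ["chlamydia", "flu"]),
             (["Mary", "Emily"], ["fever", "flu"])]

print(keeps_signatures([release_1, release_2], m=2))  # True
```

Emily appears only at time = 2, so she has no earlier signature to contradict. A hypothetical second release that put Raymond into a {chlamydia, fever} group would make the check fail.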

29 2-invariance
Time = 1 and Time = 2: Medical Data, Published Data, Voter Registration List and signatures as on slide 28.

30 2-invariance
Voter Registration List (Public): as on slide 20.
Published Data (time = 1):
Sex  Zipcode  Disease
M    6500*    flu
M    6500*    chlamydia
F    6501*    flu
F    6501*    fever
Published Data (time = 2):
Sex  Zipcode  Disease
M    6500*    chlamydia
M    6500*    flu
F    6501*    fever
F    6501*    flu
2-invariance provides the local guarantee: the probability that an individual is linked to chlamydia in each of the published datasets is at most 1/2.
Why? Possible World Analysis

31 2-invariance
Voter Registration List (Public): as on slide 20.
Published Data (time = 1), male group only:
Sex  Zipcode  Disease
M    6500*    flu
M    6500*    chlamydia
Published Data (time = 2), male group only:
Sex  Zipcode  Disease
M    6500*    chlamydia
M    6500*    flu
2-invariance provides the local guarantee: the probability that an individual is linked to chlamydia in each of the published datasets is at most 1/2.
Why? Possible World Analysis

32 2-invariance
Voter Registration List and published male groups at time = 1 and time = 2: as on slide 31.
2-invariance provides the local guarantee: the probability that an individual is linked to chlamydia in each of the published datasets is at most 1/2.
Why? Possible World Analysis
The two possible tables behind the published table at time = 1:
Sex  Zipcode  Disease
M    65001    flu
M    65002    chlamydia
or
Sex  Zipcode  Disease
M    65001    chlamydia
M    65002    flu
This is the possible world analysis based on the published table at time = 1 only.

33 2-invariance
Voter Registration List and published male groups at time = 1 and time = 2: as on slide 31.
2-invariance provides the local guarantee: the probability that an individual is linked to chlamydia in each of the published datasets is at most 1/2.
Why? Possible World Analysis
The two possible tables for time = 1 (as on slide 32) and the two possible tables behind the published table at time = 2:
Sex  Zipcode  Disease
M    65001    flu
M    65002    chlamydia
or
Sex  Zipcode  Disease
M    65001    chlamydia
M    65002    flu
This is the possible world analysis based on the published table at time = 2 only.

34 2-invariance
Voter Registration List and published male groups at time = 1 and time = 2: as on slide 31.
2-invariance provides the local guarantee: the probability that an individual is linked to chlamydia in each of the published datasets is at most 1/2.
Why? Possible World Analysis
World 1, World 2, World 3, World 4: each world pairs one of the two possible tables for time = 1 with one of the two possible tables for time = 2. The two possible tables are:
Sex  Zipcode  Disease
M    65001    flu
M    65002    chlamydia
and
Sex  Zipcode  Disease
M    65001    chlamydia
M    65002    flu

35 2-invariance
Voter Registration List, published male groups and World 1 to World 4: as on slide 34.
2-invariance provides the local guarantee: the probability that an individual is linked to chlamydia in each of the published datasets is at most 1/2.
Why? Possible World Analysis
Each world is marked Yes or No according to whether Peter is linked to chlamydia at time = 1.
In the published data at time = 1, Prob(the second individual (i.e. Peter) is linked to chlamydia) = 2/4 = 1/2

36 2-invariance
Voter Registration List, published male groups and World 1 to World 4: as on slide 34.
2-invariance provides the local guarantee: the probability that an individual is linked to chlamydia in each of the published datasets is at most 1/2.
Why? Possible World Analysis
Each world is marked Yes or No according to whether Peter is linked to chlamydia at time = 2.
In the published data at time = 2, Prob(the second individual (i.e. Peter) is linked to chlamydia) = 2/4 = 1/2

37 2-invariance
Voter Registration List and World 1 to World 4: as on slide 34.
2-invariance provides the local guarantee: the probability that an individual is linked to chlamydia in each of the published datasets is at most 1/2.
Possible World Analysis
Global Guarantee: Probability that an individual is linked to chlamydia in one or more published datasets is at most 1/2.
Prob(the second individual (i.e. Peter) is linked to chlamydia in one or more published datasets) =

38 2-invariance
Voter Registration List and World 1 to World 4: as on slide 34.
2-invariance provides the local guarantee: the probability that an individual is linked to chlamydia in each of the published datasets is at most 1/2.
Possible World Analysis
Global Guarantee: Probability that an individual is linked to chlamydia in one or more published datasets is at most 1/2.
Prob(the second individual (i.e. Peter) is linked to chlamydia in one or more published datasets) =
Worlds are marked Yes one at a time when Peter is linked to chlamydia in at least one of the two tables.

41 2-invariance
Voter Registration List and World 1 to World 4: as on slide 34.
2-invariance provides the local guarantee: the probability that an individual is linked to chlamydia in each of the published datasets is at most 1/2.
Possible World Analysis
Global Guarantee: Probability that an individual is linked to chlamydia in one or more published datasets is at most 1/2.
Three of the four worlds are marked Yes and one is marked No.
Prob(the second individual (i.e. Peter) is linked to chlamydia in one or more published datasets) = 3/4
This value is larger than 1/2.
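The four-world count on this slide can be reproduced by brute-force enumeration. This is a minimal sketch rather than the general formula mentioned in the talk. It assumes that every pairing of a time = 1 assignment with a time = 2 assignment is equally likely, and the helper names `release_worlds` and `prob` are hypothetical.

```python
from fractions import Fraction
from itertools import permutations, product

def release_worlds(people, diseases):
    """All distinct ways to assign one group's published diseases to its members."""
    return [dict(zip(people, p)) for p in set(permutations(diseases))]

# The male group {Raymond, Peter} as published at time = 1 and time = 2.
people = ["Raymond", "Peter"]
worlds_t1 = release_worlds(people, ["flu", "chlamydia"])
worlds_t2 = release_worlds(people, ["chlamydia", "flu"])

# Assumption: a possible world is one (time-1, time-2) pair, all equally likely.
joint = list(product(worlds_t1, worlds_t2))

def prob(event):
    """Fraction of possible worlds in which `event` holds."""
    return Fraction(sum(1 for w in joint if event(w)), len(joint))

local_t1 = prob(lambda w: w[0]["Peter"] == "chlamydia")
local_t2 = prob(lambda w: w[1]["Peter"] == "chlamydia")
global_p = prob(lambda w: any(r["Peter"] == "chlamydia" for r in w))

print(len(joint), local_t1, local_t2, global_p)  # 4 1/2 1/2 3/4
```

Both per-release probabilities meet the 1/2 threshold, while the probability over the two releases together does not.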

42 We illustrated, with an example, how to derive the probability that an individual is linked to chlamydia (for both local guarantee and global guarantee). In fact, the general formula is much more complicated.

43 Theorem: Global guarantee is a stronger privacy requirement than local guarantee. If the published tables satisfy global guarantee, then they satisfy local guarantee.

44 How can we generate tables such that they satisfy global guarantee? Idea: Large group size

45 Global Guarantee
Time = 1: Hospital (Medical Data) -> Public (Published Data). Release the data set to public.
Medical Data (time = 1):
Name     Sex  Zipcode  Disease
Raymond  M    65001    flu
Peter    M    65002    chlamydia
Mary     F    65014    flu
Alice    F    65015    fever
Published Data (time = 1):
Sex  Zipcode  Disease
M/F  650**    flu
M/F  650**    chlamydia
M/F  650**    flu
M/F  650**    fever
Time = 2: Hospital (Medical Data) -> Public (Published Data). Release the data set to public.
Medical Data (time = 2):
Name     Sex  Zipcode  Disease
Raymond  M    65001    flu
Peter    M    65002    chlamydia
Mary     F    65014    fever
Emily    F    65010    flu
Published Data (time = 2):
Sex  Zipcode  Disease
M/F  650**    flu
M/F  650**    chlamydia
M/F  650**    fever
M/F  650**    flu
Prob(the second individual (i.e. Peter) is linked to chlamydia in one or more published datasets) = 7/16
This value is smaller than 1/2.
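The 7/16 figure can be checked with a short calculation. The sketch below is not the general formula from the paper. It assumes that, within each release, every member of a group is equally likely to own each record, and that the releases are independent. The function name `global_link_probability` is hypothetical.

```python
from fractions import Fraction

def global_link_probability(group_sizes, sensitive_counts):
    """Probability of being linked to the sensitive value in at least one release,
    given the size of the individual's group and the number of sensitive records
    in that group for each release."""
    p_never = Fraction(1)
    for size, count in zip(group_sizes, sensitive_counts):
        p_never *= 1 - Fraction(count, size)  # not linked in this release
    return 1 - p_never

# Groups of size 2 (2-invariance example) versus groups of size 4 (this slide),
# with one chlamydia record in Peter's group at both time = 1 and time = 2.
small_groups = global_link_probability([2, 2], [1, 1])
large_groups = global_link_probability([4, 4], [1, 1])

print(small_groups, large_groups)  # 3/4 7/16
```

Doubling the group size lowers the per-release probability from 1/2 to 1/4, which is enough to bring the two-release probability under the 1/2 threshold.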

46 5. Conclusion We are the first to propose global guarantee Global guarantee is a stronger privacy requirement than local guarantee.

47 Q&A

48 2-invariance (Local Guarantee)
Time = 1: Hospital (Medical Data) -> Public (Published Data). Release the data set to public.
Medical Data (time = 1):
Name     Sex  Zipcode  Disease
Raymond  M    65001    flu
Peter    M    65002    chlamydia
Mary     F    65014    flu
Alice    F    65015    fever
Published Data (time = 1):
Sex  Zipcode  Disease
M    6500*    flu
M    6500*    chlamydia
F    6501*    flu
F    6501*    fever
Time = 2: Hospital (Medical Data) -> Public (Published Data). Release the data set to public.
Medical Data (time = 2):
Name     Sex  Zipcode  Disease
Raymond  M    65001    flu
Peter    M    65002    chlamydia
Mary     F    65014    fever
Emily    F    65010    flu
Published Data (time = 2):
Sex  Zipcode  Disease
M    6500*    flu
M    6500*    chlamydia
F    6501*    fever
F    6501*    flu
Prob(the second individual (i.e. Peter) is linked to chlamydia in one or more published datasets) = 3/4
This value is larger than 1/2.