Wang, Lakshmanan Probabilistic Privacy Analysis of Published Views, IDAR'07
Probabilistic Privacy Analysis of Published Views
Hui (Wendy) Wang, Laks V.S. Lakshmanan
University of British Columbia, Vancouver, Canada
Introduction
- Publishing relational data containing personal information
- Privacy concern: private associations, e.g., Alice has AIDS
- Utility concern: public associations, e.g., at what ages are people likely to have heart disease?
Name   Age  Job     Disease
Sarah  50   Artist  Heart disease
Alice  30   Artist  AIDS
John   50   Artist
Approach 1: K-anonymity
- K-anonymity (e.g., [Bayardo05], [LeFevre05])
Name  Age      Job     Disease
*     [30,50]  Artist  Heart disease
*     [30,50]  Artist  AIDS
*     [30,50]  Artist
- Guaranteed privacy
- Compromises utility for privacy
- Revisiting the example: at what ages are people likely to have heart disease?
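The generalized table above can be checked for its anonymity level with a few lines of Python. This is a minimal sketch, not taken from the talk: the function name `anonymity_level` and the choice of Age and Job as quasi-identifiers are assumptions made for illustration, and the Disease column is left out because it plays no role in the grouping.

```python
from collections import Counter

def anonymity_level(rows, quasi_identifiers):
    """Return the size of the smallest group of rows that share the same
    values on all quasi-identifier columns (the k of k-anonymity)."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values())

# Hypothetical encoding of the generalized table on the slide.
generalized = [
    {"Name": "*", "Age": "[30,50]", "Job": "Artist"},
    {"Name": "*", "Age": "[30,50]", "Job": "Artist"},
    {"Name": "*", "Age": "[30,50]", "Job": "Artist"},
]

print(anonymity_level(generalized, ["Age", "Job"]))  # 3
```

All three rows fall into one group, so the table is 3-anonymous, but the exact ages needed for the utility query are no longer visible.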
Approach 2: View Publishing
- Publishing views (e.g., [Yao05], [Miklau04], [Xiao06])
V1:
Name   Age
Sarah  50
Alice  30
John   50
V2:
Age  Job     Disease
50   Artist  Heart disease
30   Artist  AIDS
- Guaranteed utility
- Possibility of privacy breach, e.g., V1 join V2: Prob(“Alice”, “AIDS”) = 1
Approach 2 (Cont.)
- Different views yield different degrees of protection
V1:
Name   Job
Sarah  Artist
Alice  Artist
John   Artist
V2:
Age  Job     Disease
50   Artist  Heart disease
30   Artist  AIDS
- V1 join V2: Prob(“Alice”, “AIDS”) < 1
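The difference between the two view schemes can be seen by computing which diseases the join associates with Alice. The sketch below is illustrative only: the helper `candidate_diseases` and the tuple encodings of the views are assumptions, and it reports candidate sets rather than the breach probability defined later in the talk.

```python
# Hypothetical encodings of the view tables shown on the two slides.
V1_NAME_AGE = [("Sarah", 50), ("Alice", 30), ("John", 50)]
V1_NAME_JOB = [("Sarah", "Artist"), ("Alice", "Artist"), ("John", "Artist")]
V2 = [(50, "Artist", "Heart disease"), (30, "Artist", "AIDS")]  # (Age, Job, Disease)

def candidate_diseases(name, v1, v2, v2_join_col):
    """Diseases that the join of v1 and v2 associates with `name`.

    v1 rows are (Name, key); v2_join_col is the index of the matching
    join column in v2 (0 for Age, 1 for Job).
    """
    keys = {key for (n, key) in v1 if n == name}
    return {row[2] for row in v2 if row[v2_join_col] in keys}

by_age = candidate_diseases("Alice", V1_NAME_AGE, V2, v2_join_col=0)
by_job = candidate_diseases("Alice", V1_NAME_JOB, V2, v2_join_col=1)

print(by_age)       # {'AIDS'}
print(len(by_job))  # 2
```

Joining on Age leaves a single candidate for Alice, which is the certain breach of the first scheme; joining on Job leaves two candidates, so the association is no longer certain.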
Problem Set
- Given a view scheme V and a set of private associations A, what is the probability that publishing V breaches the privacy of A?
Our Contributions
- Define two attack models
- Formally define the probability of privacy breach
- Propose the connectivity graph as a synopsis of the database
- Derive formulas for quantifying the probability of privacy breach
Outline
- Introduction
- Security model & attack model
- Measurement of probability
- Conclusion & future work
Security Model
- Private association
  - Form: (ID=I, P=p)
  - E.g., (Name=“Alice”, Disease=“AIDS”)
  - Can be expressed in SQL
- View: duplicate free
- Base table
  - Uniqueness property: every ID value is associated with a unique p value in the base table
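The uniqueness property can be stated as a small check over a table. The following is a sketch under assumptions: rows are modeled as dictionaries, and the function name `satisfies_uniqueness` and the sample rows are hypothetical, not part of the talk.

```python
def satisfies_uniqueness(rows, id_col, p_col):
    """True if every ID value is associated with exactly one p value."""
    seen = {}
    for row in rows:
        # setdefault stores the first p value seen for this ID and returns
        # the stored one afterwards; a mismatch means two different p values.
        if seen.setdefault(row[id_col], row[p_col]) != row[p_col]:
            return False
    return True

# Hypothetical base-table fragments.
ok_table = [
    {"Name": "Sarah", "Disease": "Heart disease"},
    {"Name": "Alice", "Disease": "AIDS"},
]
bad_table = ok_table + [{"Name": "Alice", "Disease": "Heart disease"}]

print(satisfies_uniqueness(ok_table, "Name", "Disease"))   # True
print(satisfies_uniqueness(bad_table, "Name", "Disease"))  # False
```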
Attack Model 1: Unrestricted Model
- The attacker does NOT know that the uniqueness property holds
- The attacker can access the view definitions and the view tables
- The attack approach
  - Construct the candidate base tables
  - Pick the ones that contain the private association
Example of Unrestricted Model
Base table T:
A   B   C
a1  b1  c1
a2  b1  c2
Attacker knows:
V1 = Π_{A,B}(T)
A   B
a1  b1
a2  b1
V2 = Π_{B,C}(T)
B   C
b1  c1
b1  c2
Attacker constructs: 7 unrestricted possible worlds, among them
- Possible world #1: (a1, b1, c1), (a2, b1, c2); contains (A=a1, C=c1) √
- Possible world #2: (a1, b1, c2), (a2, b1, c1); does not contain (A=a1, C=c1) X
- A further possible world: (a1, b1, c2), (a2, b1, c2), (a1, b1, c1); contains (A=a1, C=c1) √
For (A=a1, C=c1), attacker picks: 5 unrestricted interesting worlds
Prob. of privacy breach of (A=a1, C=c1): 5/7
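The 5/7 figure can be reproduced by brute force. The sketch below assumes that a possible world is any duplicate-free set of tuples drawn from the join of V1 and V2 whose projections give back exactly V1 and V2; the function and variable names are illustrative, not the authors'.

```python
from fractions import Fraction
from itertools import combinations

V1 = {("a1", "b1"), ("a2", "b1")}  # projection of T on (A, B)
V2 = {("b1", "c1"), ("b1", "c2")}  # projection of T on (B, C)

# Candidate tuples: the natural join of V1 and V2 on B.
JOIN = sorted((a, b, c) for (a, b) in V1 for (b2, c) in V2 if b == b2)

def possible_worlds(v1, v2, candidates):
    """Return every subset of `candidates` whose projections equal v1 and v2."""
    worlds = []
    for size in range(1, len(candidates) + 1):
        for world in combinations(candidates, size):
            proj_ab = {(a, b) for a, b, _ in world}
            proj_bc = {(b, c) for _, b, c in world}
            if proj_ab == v1 and proj_bc == v2:
                worlds.append(set(world))
    return worlds

worlds = possible_worlds(V1, V2, JOIN)
# Interesting worlds contain a tuple with A=a1 and C=c1.
interesting = [w for w in worlds if any(a == "a1" and c == "c1" for a, _, c in w)]

print(len(interesting), len(worlds))             # 5 7
print(Fraction(len(interesting), len(worlds)))   # 5/7
```

The enumeration visits every subset of the join, so its cost grows exponentially with the size of the join.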
Attack Model 2: Restricted Model
- The attacker knows that the uniqueness property holds
- Similar attack approach
  - Only pick the possible/interesting worlds that satisfy the uniqueness property
Example of Restricted Model
Base table T:
A   B   C
a1  b1  c1
a2  b1  c2
Attacker knows:
V1 = Π_{A,B}(T)
A   B
a1  b1
a2  b1
V2 = Π_{B,C}(T)
B   C
b1  c1
b1  c2
Attacker constructs: 2 restricted possible worlds
- Possible world #1: (a1, b1, c1), (a2, b1, c2); contains (A=a1, C=c1) √
- Possible world #2: (a1, b1, c2), (a2, b1, c1); does not contain (A=a1, C=c1) X
For (A=a1, C=c1), attacker picks: 1 restricted interesting world
Prob. of privacy breach of (A=a1, C=c1): 1/2
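The restricted count can be checked the same way by adding a uniqueness filter to the enumeration. This is a sketch under the assumption that A plays the role of the ID and C the role of the private attribute; all names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

V1 = {("a1", "b1"), ("a2", "b1")}  # projection of T on (A, B)
V2 = {("b1", "c1"), ("b1", "c2")}  # projection of T on (B, C)
JOIN = sorted((a, b, c) for (a, b) in V1 for (b2, c) in V2 if b == b2)

def satisfies_uniqueness(world):
    """True if every A value (the ID) is paired with exactly one C value."""
    seen = {}
    for a, _, c in world:
        if seen.setdefault(a, c) != c:
            return False
    return True

def restricted_worlds(v1, v2, candidates):
    """Subsets of the join that reproduce both views and satisfy uniqueness."""
    worlds = []
    for size in range(1, len(candidates) + 1):
        for world in combinations(candidates, size):
            if ({(a, b) for a, b, _ in world} == v1
                    and {(b, c) for _, b, c in world} == v2
                    and satisfies_uniqueness(world)):
                worlds.append(set(world))
    return worlds

worlds = restricted_worlds(V1, V2, JOIN)
interesting = [w for w in worlds if ("a1", "b1", "c1") in w]

print(len(interesting), len(worlds))            # 1 2
print(Fraction(len(interesting), len(worlds)))  # 1/2
```

Knowing the uniqueness property removes five of the seven unrestricted worlds in this example, which changes the attacker's estimate from 5/7 to 1/2.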
How to Measure Probability?
- Constructing all possible/interesting worlds is not efficient!
- Is there a better way to measure the probability?
- Our approach: connectivity graph + (interesting) covers
Probability Measurement: Unrestricted Model
Base table T:
A   B   C
a1  b1  c1
a2  b1  c2
a1  b2  c1
View scheme: V1 = Π_{A,B}(T), V2 = Π_{B,C}(T)
Connectivity graph nodes: (a1, b1), (a2, b1), (b1, c1), (b1, c2)
Private association: (A=a1, C=c1)
- Unrestricted covers = unrestricted possible worlds, e.g., (…, …, …)
- Unrestricted interesting covers = unrestricted interesting worlds
- Prob. = # of unrestricted interesting covers / # of unrestricted covers
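One way to read the connectivity graph is as a bipartite graph whose nodes are the view tuples and whose edges link a V1 tuple to a V2 tuple that agree on B; a cover is then a set of edges that touches every node. The sketch below follows that reading, which is an assumption about the slide's figure, and uses only the four nodes listed on the slide. Function names are illustrative.

```python
from itertools import combinations

V1 = [("a1", "b1"), ("a2", "b1")]  # nodes from the (A, B) view
V2 = [("b1", "c1"), ("b1", "c2")]  # nodes from the (B, C) view

def connectivity_graph(v1, v2):
    """Edges link a V1 tuple to a V2 tuple that share the same B value."""
    return [(t1, t2) for t1 in v1 for t2 in v2 if t1[1] == t2[0]]

def covers(nodes, edges):
    """Return every edge subset in which each node is an endpoint of some edge."""
    wanted = set(nodes)
    found = []
    for size in range(1, len(edges) + 1):
        for subset in combinations(edges, size):
            touched = {node for edge in subset for node in edge}
            if touched == wanted:
                found.append(subset)
    return found

edges = connectivity_graph(V1, V2)
all_covers = covers(V1 + V2, edges)

# The private association (A=a1, C=c1) corresponds to this edge.
target = (("a1", "b1"), ("b1", "c1"))
interesting_covers = [c for c in all_covers if target in c]

print(len(edges))                               # 4
print(len(interesting_covers), len(all_covers)) # 5 7
```

Each edge stands for one joined tuple (a, b, c), so the 7 covers and 5 interesting covers match the counts obtained by enumerating possible worlds directly.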
Quantification Formulas: 2-table case
- For view schemes of 2 view tables
  - Equivalent to a bipartite connectivity graph (m, n)
- Unrestricted model
  - Prob. = [formula not preserved in this transcript]
- Restricted model
  - Prob. = 1/n
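For a complete bipartite connectivity graph with m nodes on one side and n on the other, the cover counts can be computed without enumeration by standard inclusion-exclusion over the nodes left uncovered. The sketch below is not the formula from the talk; it is an independent counting argument, and the assumptions that the graph is complete bipartite and that the private association corresponds to one fixed edge are made here for illustration.

```python
from fractions import Fraction
from math import comb

def count_covers(m, n):
    """Number of edge covers of the complete bipartite graph K(m, n).

    Inclusion-exclusion: choose i left and j right nodes forced to stay
    uncovered; the remaining (m-i)*(n-j) edges may be picked freely.
    """
    return sum((-1) ** (i + j) * comb(m, i) * comb(n, j) * 2 ** ((m - i) * (n - j))
               for i in range(m + 1) for j in range(n + 1))

def count_interesting_covers(m, n):
    """Number of edge covers of K(m, n) that contain one fixed edge.

    The fixed edge already covers its two endpoints, so only the other
    m-1 left and n-1 right nodes can be left uncovered; the fixed edge
    itself is not a free choice, hence the -1 in the exponent.
    """
    return sum((-1) ** (i + j) * comb(m - 1, i) * comb(n - 1, j)
               * 2 ** ((m - i) * (n - j) - 1)
               for i in range(m) for j in range(n))

def unrestricted_probability(m, n):
    return Fraction(count_interesting_covers(m, n), count_covers(m, n))

def restricted_probability(n):
    """The restricted-model value stated on the slide."""
    return Fraction(1, n)

print(count_interesting_covers(2, 2), count_covers(2, 2))  # 5 7
print(unrestricted_probability(2, 2))                      # 5/7
print(restricted_probability(2))                           # 1/2
```

For m = n = 2 the counts agree with the worked examples for both attack models.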
Quantification Formulas: k-table case
- For view schemes of k>2 view tables
  - The connectivity graph is not complete, unlike in the 2-view-table case
  - We do not have a general formula for the probability of privacy breach
  - There is no choice but to enumerate all possible/interesting worlds
Related Work
- K-anonymity (e.g., [Bayardo05], [LeFevre05])
  - Modify data values
- View publishing
  - [Miklau04], [Deutsch05]: focus on complexity, not computation of probability
  - [Yao05]: measure privacy breach by k-anonymity, not probability
  - [Xiao06]: utility on aggregate results, not public associations
Conclusion
- We defined a general framework to measure the likelihood of privacy breach
- We proposed two attack models
- For the 2-view-table case, we derived formulas to calculate the probability of privacy breach
Future Work
- For the 2-view-table case, find an approximation of the probability formula for the unrestricted model
- Continue working on the k-view-table case (k>2)
- Extension of the restricted model
- Given a set of private/public associations and a base table, how can safe and useful views be designed?
Q & A
References
[Bayardo05] Roberto J. Bayardo, Rakesh Agrawal, Data Privacy through Optimal k-Anonymization, ICDE.
[LeFevre05] Kristen LeFevre, David DeWitt, and Raghu Ramakrishnan, Incognito: Efficient Full-domain K-anonymity, SIGMOD'05.
[Miklau04] Gerome Miklau, Dan Suciu, A Formal Analysis of Information Disclosure in Data Exchange, SIGMOD'04.
[Xiao06] Xiaokui Xiao, Yufei Tao, Anatomy: Simple and Effective Privacy Preservation, VLDB.
[Yao05] Chao Yao, X. Sean Wang, Sushil Jajodia, Checking for k-Anonymity Violation by Views, VLDB'05.