
Probabilistic Privacy Analysis of Published Views
Hui (Wendy) Wang, Laks V.S. Lakshmanan
University of British Columbia, Vancouver, Canada
IDAR'07

Introduction
- Publishing relational data containing personal information.
- Privacy concern: private associations, e.g., "Alice has AIDS".
- Utility concern: public associations, e.g., "at which ages are people likely to have heart disease?"

  Name  | Age | Job    | Disease
  ------+-----+--------+--------------
  Sarah | 50  | Artist | Heart disease
  Alice | 30  | Artist | AIDS
  John  | 50  | Artist |

Approach 1: K-Anonymity
- K-anonymity (e.g., [Bayardo05], [LeFevre05]) generalizes the data:

  Name | Age     | Job    | Disease
  -----+---------+--------+--------------
  *    | [30,50] | Artist | Heart disease
  *    | [30,50] | Artist | AIDS
  *    | [30,50] | Artist |

- Guaranteed privacy, but utility is compromised for privacy.
- Revisiting the example: which ages are associated with heart disease? The generalized table can no longer answer this precisely.

Approach 2: View Publishing
- Publishing views (e.g., [Yao05], [Miklau04], [Xiao06]):

  V1: Name  | Age      V2: Age | Job    | Disease
      Sarah | 50           50  | Artist | Heart disease
      Alice | 30           30  | Artist | AIDS
      John  | 50

- Guaranteed utility.
- Possibility of privacy breach: e.g., V1 join V2 gives Prob("Alice", "AIDS") = 1.
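The join attack on this slide can be sketched in a few lines of plain Python (the variable names and tuple encoding are ours, not from the paper):

```python
# Published views from the slide, encoded as lists of tuples.
v1 = [("Sarah", 50), ("Alice", 30), ("John", 50)]               # V1: Name, Age
v2 = [(50, "Artist", "Heart disease"), (30, "Artist", "AIDS")]  # V2: Age, Job, Disease

# Natural join on Age: every reconstructed tuple linking a Name to a Disease.
joined = [(name, age, job, dis)
          for (name, age) in v1
          for (a2, job, dis) in v2 if age == a2]

# Age 30 is unique to Alice, so the join links her to AIDS with certainty.
links = [(n, d) for (n, _, _, d) in joined if n == "Alice"]
print(links)  # [('Alice', 'AIDS')] -> Prob("Alice", "AIDS") = 1
```

Because Alice is the only tuple with Age = 30, every join result pairs her with exactly one disease, which is the breach the slide describes.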

Approach 2 (Cont.)
- Different views yield different degrees of protection:

  V1: Name  | Job        V2: Age | Job    | Disease
      Sarah | Artist         50  | Artist | Heart disease
      Alice | Artist         30  | Artist | AIDS
      John  | Artist

- With this V1, V1 join V2 gives Prob("Alice", "AIDS") < 1.

Problem Statement
Given a view scheme V and a set of private associations A, what is the probability of a privacy breach of A caused by publishing V?

Our Contributions
- Define two attack models.
- Formally define the probability of privacy breach.
- Propose the connectivity graph as a synopsis of the database.
- Derive formulas quantifying the probability of privacy breach.

Outline
- Introduction
- Security model & attack models
- Measurement of probability
- Conclusion & future work

Security Model
- Private association: has the form (ID=I, P=p), e.g., (Name="Alice", Disease="AIDS"); can be expressed in SQL.
- Views: duplicate-free.
- Base table: satisfies the uniqueness property — every ID value is associated with a unique p value in the base table.
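The uniqueness property can be checked mechanically. The sketch below is illustrative (the helper name `has_uniqueness` and the tuple encoding are ours) and tests it on the running example's base table:

```python
def has_uniqueness(table, id_col, p_col):
    """Return True iff each ID value is paired with a single p value."""
    seen = {}
    for row in table:
        i, p = row[id_col], row[p_col]
        # setdefault records the first p seen for this ID; a later mismatch
        # means the same ID carries two different sensitive values.
        if seen.setdefault(i, p) != p:
            return False
    return True

# Base table rows as (Name, Age, Job, Disease); Disease is the private attribute.
base = [("Sarah", 50, "Artist", "Heart disease"),
        ("Alice", 30, "Artist", "AIDS"),
        ("John", 50, "Artist", None)]
print(has_uniqueness(base, 0, 3))  # True: each Name has one Disease value
```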

Attack Model 1: Unrestricted Model
- The attacker does NOT know about the uniqueness property.
- The attacker can access the view definitions and the view tables.
- The attack approach:
  1. Construct the candidate base tables (possible worlds).
  2. Pick the ones that contain the private association (interesting worlds).

Example of Unrestricted Model

Attacker knows the view definitions V1 = pi_{A,B}(T), V2 = pi_{B,C}(T) and the view tables:

  Base table T:  A  | B  | C      V1: A  | B      V2: B  | C
                 a1 | b1 | c1         a1 | b1         b1 | c1
                 a2 | b1 | c2         a2 | b1         b1 | c2

Attacker constructs the candidate base tables, e.g.:

  Possible world #1:  (a1,b1,c1)   Possible world #2:  (a1,b1,c2)   ...
                      (a2,b1,c2)                       (a2,b1,c1)

There are 7 unrestricted possible worlds. For the private association (A=a1, C=c1), the attacker picks the 5 unrestricted interesting worlds that contain it.

Prob. of privacy breach of (A=a1, C=c1): 5/7
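The counts on this slide can be reproduced by brute force. The sketch below (our encoding; the paper gives no code) generates every candidate base table whose projections equal V1 and V2, then counts those containing the private association:

```python
from itertools import chain, combinations

V1 = {("a1", "b1"), ("a2", "b1")}   # pi_{A,B}(T)
V2 = {("b1", "c1"), ("b1", "c2")}   # pi_{B,C}(T)

# Candidate base tuples come from joining the two views on B.
candidates = [(a, b, c) for (a, b) in V1 for (b2, c) in V2 if b == b2]

def powerset(xs):
    """All non-empty subsets of xs."""
    return chain.from_iterable(combinations(xs, r) for r in range(1, len(xs) + 1))

# A possible world is any candidate set whose projections reproduce both views.
worlds = [w for w in powerset(candidates)
          if {(a, b) for (a, b, _) in w} == V1
          and {(b, c) for (_, b, c) in w} == V2]

# Interesting worlds contain the private association (A=a1, C=c1).
interesting = [w for w in worlds if any(a == "a1" and c == "c1" for (a, _, c) in w)]
print(len(interesting), len(worlds))  # 5 7  -> breach probability 5/7
```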

Attack Model 2: Restricted Model
- The attacker knows about the uniqueness property.
- The attack approach is similar, but only the possible/interesting worlds that satisfy the uniqueness property are kept.

Example of Restricted Model

With the same views V1 = pi_{A,B}(T) and V2 = pi_{B,C}(T), the attacker constructs only the candidate base tables that satisfy the uniqueness property:

  Possible world #1:  (a1,b1,c1)   Possible world #2:  (a1,b1,c2)
                      (a2,b1,c2)                       (a2,b1,c1)

There are 2 restricted possible worlds. For (A=a1, C=c1), the attacker picks only world #1: 1 restricted interesting world.

Prob. of privacy breach of (A=a1, C=c1): 1/2
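The restricted count follows from the same brute force with an extra uniqueness filter. This standalone sketch (again our encoding, not the paper's code) reproduces the 1/2:

```python
from itertools import chain, combinations

V1 = {("a1", "b1"), ("a2", "b1")}
V2 = {("b1", "c1"), ("b1", "c2")}
candidates = [(a, b, c) for (a, b) in V1 for (b2, c) in V2 if b == b2]

def powerset(xs):
    return chain.from_iterable(combinations(xs, r) for r in range(1, len(xs) + 1))

def unique_p(world):
    """Uniqueness property: each A (ID) value maps to a single C (p) value."""
    assoc = {}
    return all(assoc.setdefault(a, c) == c for (a, _, c) in world)

# Keep only worlds that reproduce both views AND satisfy uniqueness.
worlds = [w for w in powerset(candidates)
          if {(a, b) for (a, b, _) in w} == V1
          and {(b, c) for (_, b, c) in w} == V2
          and unique_p(w)]
interesting = [w for w in worlds if ("a1", "b1", "c1") in w]
print(len(interesting), len(worlds))  # 1 2  -> breach probability 1/2
```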

How to Measure Probability?
- Constructing all possible/interesting worlds is not efficient!
- Is there a better way to measure the probability?
- Our approach: connectivity graph + (interesting) covers.

Probability Measurement: Unrestricted Model

  Base table T:  A  | B  | C      View scheme: V1 = pi_{A,B}(T), V2 = pi_{B,C}(T)
                 a1 | b1 | c1
                 a2 | b1 | c2
                 a1 | b2 | c1

  Connectivity graph: one node per view tuple, e.g., (a1,b1), (a2,b1) from V1 and (b1,c1), (b1,c2) from V2; edges connect tuples that join on B.

- Unrestricted covers = unrestricted possible worlds.
- For a private association such as (A=a1, C=c1): unrestricted interesting covers = unrestricted interesting worlds.
- Prob. = (# of unrestricted interesting covers) / (# of unrestricted covers).
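The cover/world correspondence can be sanity-checked on the earlier two-view example. Encoding view tuples as graph nodes (the labels below are ours), the edge covers of the complete bipartite connectivity graph number exactly the 7 unrestricted possible worlds found by direct enumeration:

```python
from itertools import chain, combinations

left = ["a1b1", "a2b1"]    # nodes for the tuples of V1
right = ["b1c1", "b1c2"]   # nodes for the tuples of V2
# All V1 tuples join all V2 tuples on b1, so the graph is complete bipartite.
edges = [(u, v) for u in left for v in right]

def powerset(xs):
    return chain.from_iterable(combinations(xs, r) for r in range(1, len(xs) + 1))

# An edge cover is an edge set touching every node; each edge corresponds to
# one base tuple, so covers correspond to possible worlds.
covers = [es for es in powerset(edges)
          if {u for (u, _) in es} == set(left)
          and {v for (_, v) in es} == set(right)]
print(len(covers))  # 7 edge covers = 7 unrestricted possible worlds
```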

Quantification Formulas: 2-Table Case
- For view schemes of 2 view tables: equivalent to a 2-partite connectivity graph (m, n).
- Unrestricted model: Prob. =
- Restricted model: Prob. = 1/n
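The restricted-model formula Prob. = 1/n can be checked by brute force on a complete 2-partite connectivity graph: under uniqueness, each of the m V1-tuples is assigned exactly one of the n V2-values, and the assignment must cover all n values. This sketch (our encoding, with illustrative m = 3, n = 2) confirms the 1/n:

```python
from itertools import product

m, n = 3, 2          # m V1-tuples, n V2-values (illustrative sizes)
values = range(n)

# Restricted worlds = surjective assignments of the n values to the m tuples:
# uniqueness gives one value per tuple, and all n values must be covered.
worlds = [w for w in product(values, repeat=m) if set(w) == set(values)]

# Interesting worlds: the first tuple is linked to value 0 (the private association).
interesting = [w for w in worlds if w[0] == 0]
print(len(interesting), len(worlds), len(interesting) / len(worlds))  # 3 6 0.5 = 1/n
```

By symmetry over the n possible values of the first tuple, the ratio is exactly 1/n for any m >= n, matching the slide's formula.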

Quantification Formulas: k-Table Case
- For view schemes of k > 2 view tables, the connectivity graph is not complete as in the 2-view-table case.
- We do not have a general formula for the probability of privacy breach.
- No choice but to enumerate all possible/interesting worlds.

Related Work
- K-anonymity (e.g., [Bayardo05], [LeFevre05]): modifies data values.
- View publishing:
  - [Miklau04], [Deutsch05]: focus on complexity, not computation of the probability.
  - [Yao05]: measures privacy breach by k-anonymity, not probability.
  - [Xiao06]: utility of aggregate results, not public associations.

Conclusion
- We defined a general framework to measure the likelihood of privacy breach.
- We proposed two attack models.
- For the 2-view-table case, we derived formulas to calculate the probability of privacy breach.

Future Work
- For the 2-view-table case, find an approximation of the probability formulas for the unrestricted model.
- Continue work on the k-view-table case (k > 2).
- Extend the restricted model.
- Given a set of private/public associations and a base table, how to design safe and useful views?

Q & A

References
[Bayardo05] Roberto J. Bayardo, Rakesh Agrawal. Data Privacy through Optimal k-Anonymization. ICDE'05.
[LeFevre05] Kristen LeFevre, David DeWitt, Raghu Ramakrishnan. Incognito: Efficient Full-Domain K-Anonymity. SIGMOD'05.
[Miklau04] Gerome Miklau, Dan Suciu. A Formal Analysis of Information Disclosure in Data Exchange. SIGMOD'04.
[Xiao06] Xiaokui Xiao, Yufei Tao. Anatomy: Simple and Effective Privacy Preservation. VLDB'06.
[Yao05] Chao Yao, X. Sean Wang, Sushil Jajodia. Checking for k-Anonymity Violation by Views. VLDB'05.