Deriving Private Information from Association Rule Mining Results. Zutao Zhu, Guan Wang, and Wenliang Du. ICDE 2010.




Outline: Motivation; Problem Formulation; Maximum Entropy Modeling; Deriving Constraints from Association Rules; Deriving Constraints from Non-Association Rules; Algorithm; Conclusion.

Motivation Data publishing can provide enormous benefits to society; however, due to privacy concerns, data often cannot be published in its original form. Two alternatives are to publish a sanitized version of the original data, or to publish aggregate information derived from it, such as data mining results. The objective of this paper is to develop a systematic method to quantify the privacy disclosure caused by publishing data mining results.

(Cont.) Assumptions: The original dataset consists of two parts: QI (quasi-identifier) attributes and SA (sensitive) attributes. Adversaries are assumed to possess all the QI attribute values, and to know the domain of the SA.

(Cont.) The goal of privacy-preserving data publishing is to prevent adversaries from inferring any individual's SA value, while making the published information as useful as possible. The severity of a linking attack is determined by the conditional probability P(SA|QI): as P(SA|QI) → 1, adversaries become more certain of the SA value of an individual with the given QI.
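As a toy illustration of a linking attack (the records and attribute values below are invented, not taken from the paper's figures), P(SA|QI) can be estimated directly from a table:

```python
# Toy records as (QI, SA) pairs; QI = (Gender, Zipcode), SA = Salary.
# All values here are invented for illustration only.
records = [
    (("Male", "13210"), "50K+"),
    (("Male", "13210"), "50K+"),
    (("Male", "13210"), "50K-"),
    (("Female", "13210"), "50K-"),
    (("Female", "13210"), "50K-"),
]

def p_sa_given_qi(records, qi, sa):
    """Empirical conditional probability P(SA = sa | QI = qi)."""
    qi_count = sum(1 for q, _ in records if q == qi)
    joint_count = sum(1 for q, s in records if q == qi and s == sa)
    return joint_count / qi_count if qi_count else 0.0

# The linking attack on ("Female", "13210") succeeds with certainty here,
# because every matching record has Salary = 50K-.
print(p_sa_given_qi(records, ("Female", "13210"), "50K-"))  # 1.0
```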

(Cont.) Let min_sup = 0.3 and min_conf = 0.8, and let the domain of Salary be {50K+, 50K-}. The useful association rules are those of the pattern QI → SA. From the rules published in Figure (b) we can directly derive P(SA|QI) and P(QI, SA), since conf(QI → SA) = P(SA|QI) and sup(QI → SA) = P(QI, SA). Even if the exact confidence and support of each rule are suppressed, we can still derive the inequalities P(SA|QI) ≥ min_conf and P(QI, SA) ≥ min_sup.
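A minimal sketch of how published rules could be translated into constraints; the tuple representation of rules, the thresholds, and the string encoding of constraints are all assumptions for illustration, not the paper's representation:

```python
MIN_SUP, MIN_CONF = 0.3, 0.8

# Each published rule QI -> SA as (qi, sa, support, confidence).
published = [("Male", "50K+", 0.35, 0.85)]

def rule_constraints(rules, exact=True):
    """Encode each rule as equations (exact sup/conf disclosed) or as
    inequalities (sup/conf suppressed, only the thresholds known)."""
    constraints = []
    for qi, sa, sup, conf in rules:
        if exact:
            constraints.append(f"P({qi},{sa}) = {sup}")
            constraints.append(f"P({sa}|{qi}) = {conf}")
        else:
            constraints.append(f"P({qi},{sa}) >= {MIN_SUP}")
            constraints.append(f"P({sa}|{qi}) >= {MIN_CONF}")
    return constraints

print(rule_constraints(published))                # equations
print(rule_constraints(published, exact=False))   # inequalities
```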

(Cont.) Even when QI → SA is not an association rule, its absence gives adversaries useful information. With min_sup = 0.6 and min_conf = 0.9, the pattern "Gender = Female → Salary = 50K+" is not published, so its support must fall below 0.6 or its confidence below 0.9.

Problem Formulation Let D be the original data set used to generate the data mining results Ω. Let variable X represent the SA attributes and variable Q the QI attributes. Given Ω and the QI part of all the records in D, derive P(X|Q) for all combinations of Q and X values.

(Cont.) We treat P(X|Q) as a variable for each combination of X ∈ SA and Q ∈ QI; the goal of deriving P(X|Q) is to assign probability values to these variables. Data mining results contain information about P(X|Q), so the assignment of these probability variables should be consistent with the information embedded in the results. The embedded information can be formulated as constraints, in the form of equations or inequalities.

Maximum Entropy (ME) principle According to the principle of ME, the inference is the most unbiased when the entropy of these variables is maximized subject to the constraints. Our problem becomes finding a distribution P(X|Q) such that the conditional entropy H(X|Q) is maximized.
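The entropy expression on this slide did not survive the transcript; the standard conditional entropy in this notation, which is what the objective maximizes, is:

```latex
H(X \mid Q) = -\sum_{q} P(q) \sum_{x} P(x \mid q) \log P(x \mid q)
```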

Deriving Constraints From Association Rules To estimate P(X|Q) based on data mining results, we need to convert the embedded knowledge into equations or inequalities that use P(X|Q) or P(Q, X) as variables. We call these equations and inequalities ME constraints. AR-constraints arise in two scenarios: with the exact support and confidence disclosed, or with them withheld.
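Written out (using standard support/confidence semantics; the slide's own formulas are not in the transcript), a published rule Q → X yields, in the two scenarios:

```latex
\text{exact values disclosed:}\quad P(Q, X) = \mathit{sup}(Q \to X),\ \ P(X \mid Q) = \mathit{conf}(Q \to X)
\text{values withheld:}\quad P(Q, X) \ge \mathit{min\_sup},\ \ P(X \mid Q) \ge \mathit{min\_conf}
```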

Deriving Constraints From Non-Association Rules If Q → X is not one of the published association rules, its absence also yields constraints.
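The constraint formulas on this slide were lost in transcription; since a rule is published exactly when it meets both thresholds, an unpublished Q → X must violate at least one of them:

```latex
P(Q, X) < \mathit{min\_sup} \quad \lor \quad P(X \mid Q) < \mathit{min\_conf}
```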

Algorithm to derive AR- and NAR-Constraints An Apriori-based algorithm is used to enumerate the candidate patterns and generate the constraints.
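The algorithm itself appears only as a figure on the slide; the following is a minimal sketch of the idea under simplifying assumptions (single-attribute QI patterns, an invented data representation): classify each candidate QI → SA pattern as an AR-constraint (the rule meets both thresholds, so it is published) or a NAR-constraint (the rule is absent).

```python
from itertools import product

MIN_SUP, MIN_CONF = 0.6, 0.9

def classify_patterns(records, qi_domain, sa_domain):
    """For every candidate pattern QI -> SA, emit an AR- or NAR-constraint.

    records: list of (qi_value, sa_value) pairs; the domains are the
    attribute value sets. Representation is illustrative, not the paper's.
    """
    n = len(records)
    constraints = []
    for qi, sa in product(qi_domain, sa_domain):
        sup = sum(1 for q, s in records if q == qi and s == sa) / n
        qi_sup = sum(1 for q, s in records if q == qi) / n
        conf = sup / qi_sup if qi_sup else 0.0
        if sup >= MIN_SUP and conf >= MIN_CONF:
            # Published rule: the adversary learns equalities (AR-constraint).
            constraints.append(("AR", qi, sa, sup, conf))
        else:
            # Absent rule: the adversary learns the disjunction
            # P(qi, sa) < MIN_SUP or P(sa | qi) < MIN_CONF (NAR-constraint).
            constraints.append(("NAR", qi, sa))
    return constraints

records = [("Female", "50K-")] * 9 + [("Male", "50K+")]
print(classify_patterns(records, {"Female", "Male"}, {"50K+", "50K-"}))
```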

Conclusion The paper proposes a quantitative analysis of the information disclosure caused by publishing data mining results. Open directions: sanitizing the original datasets before publishing data mining results, and disguising the association rules so that privacy is preserved.