
1 Deriving Private Information from Association Rule Mining Results
Zutao Zhu, Guan Wang, and Wenliang Du
ICDE 2009
2010/3/18

2 Outline
• Motivation
• Problem Formulation
• Maximum Entropy Modeling
• Deriving Constraints From Association Rules
• Deriving Constraints From Non-Association Rules
• Algorithm
• Conclusion

3 Motivation
Data publishing can provide enormous benefits to society. However, due to privacy concerns, data cannot be published in its original form. Instead, one can:
• Publish a sanitized version of the original data.
• Publish aggregate information derived from the original data, such as data mining results.
The objective of this paper is to develop a systematic method to quantify the privacy disclosure caused by publishing data mining results.

4 (Cont.) Assumptions
• The original dataset consists of two parts:
◦ QI (quasi-identifier) attributes
◦ SA (sensitive attributes)
• Adversaries have all the data of the QI attributes.
• Adversaries know the domain of the SA.

5 (Cont.)
The goal of privacy-preserving data publishing is to prevent adversaries from inferring any individual's SA information while keeping the published information as useful as possible.
Linking attack: the severity of a linking attack is determined by the conditional probability P(SA|QI).
• The closer P(SA|QI) is to 1, the more certain adversaries can be about the SA value of an individual with that QI value.

6 (Cont.)
min_sup = 0.3 and min_conf = 0.8. The domain of Salary is {50K+, 50K-}.
• The useful association rules are those of the pattern QI → SA.
• From the published association rules in Figure (b), we can directly derive P(SA|QI) and P(QI, SA).
• Even if the exact confidence and support of each rule are withheld, we can still derive inequalities.
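The link between rule statistics and probabilities can be sketched in a few lines: the support of QI → SA is the empirical P(QI, SA), and its confidence is the empirical P(SA|QI). The records below are hypothetical toy data (not the table from the slide's figure), and `rule_stats` and `is_published` are illustrative names, with the slide's thresholds used as defaults.

```python
# Hypothetical toy records (QI: Gender, SA: Salary); not the data from the figure.
records = (
    [("Male", "50K+")] * 4 + [("Male", "50K-")] * 1
    + [("Female", "50K+")] * 2 + [("Female", "50K-")] * 3
)


def rule_stats(records, qi, sa):
    """Return (support, confidence) of the rule qi -> sa.

    support    = fraction of all records with both qi and sa  = P(QI, SA)
    confidence = fraction of qi records that also have sa     = P(SA | QI)
    """
    n = len(records)
    qi_count = sum(1 for q, _ in records if q == qi)
    joint_count = sum(1 for q, s in records if q == qi and s == sa)
    support = joint_count / n
    confidence = joint_count / qi_count if qi_count else 0.0
    return support, confidence


def is_published(records, qi, sa, min_sup=0.3, min_conf=0.8):
    """A rule is published only if it meets both thresholds."""
    support, confidence = rule_stats(records, qi, sa)
    return support >= min_sup and confidence >= min_conf


print(rule_stats(records, "Male", "50K+"))
print(is_published(records, "Male", "50K+"))
print(is_published(records, "Female", "50K-"))
```

On this toy data, "Male → 50K+" meets both thresholds, so an adversary who sees the rule learns a lower bound on P(50K+ | Male) even when the exact values are withheld.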

7 (Cont.)
If QI → SA is not among the published association rules, its absence also gives adversaries useful information.
• min_sup = 0.6, min_conf = 0.9
• The pattern "Gender = Female → Salary = 50K+" is not published.
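One way to read this example, assuming the publisher releases every rule that meets both thresholds: an unpublished rule Q → X must fail at least one of them, so P(Q, X) < min_sup or P(X|Q) < min_conf. Because the adversary can compute P(Q) from the QI data, this is equivalent to P(X|Q) < max(min_conf, min_sup / P(Q)). The sketch below computes that bound. The function name `nar_upper_bound` and the P(Q) values are hypothetical; the slide's figure is not part of the transcript.

```python
def nar_upper_bound(p_q, min_sup, min_conf):
    """Strict upper bound on P(X|Q) implied by Q -> X being unpublished.

    Assumes every rule meeting both thresholds is published, so the
    missing rule satisfies P(Q, X) < min_sup or P(X|Q) < min_conf.
    Returns None when the bound is >= 1, i.e. it carries no information.
    """
    if p_q <= 0:
        raise ValueError("P(Q) must be positive")
    bound = max(min_conf, min_sup / p_q)
    return bound if bound < 1.0 else None


# Hypothetical P(Q) values with the slide's thresholds.
print(nar_upper_bound(0.8, min_sup=0.6, min_conf=0.9))
print(nar_upper_bound(0.5, min_sup=0.6, min_conf=0.9))
```

With P(Q) = 0.8 the missing rule tells the adversary that P(X|Q) < 0.9; with P(Q) = 0.5 the pattern could never reach min_sup, so its absence reveals nothing.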

8 Problem Formulation
• Let D be the original dataset used to generate the data mining results (Ω).
• Let variable X represent the SA attributes and variable Q represent the QI attributes.
• Given Ω and the QI part of all the records in D, derive P(X|Q) for all combinations of Q and X values.

9 (Cont.)
We treat P(X|Q) as a variable for each combination of X ∈ SA and Q ∈ QI. The goal of deriving P(X|Q) is to assign probability values to these variables.
◦ Data mining results contain information about P(X|Q), so the assignment of these probability variables should be consistent with the information embedded in the data mining results.
◦ The embedded information can be formulated as constraints, which take the form of equations or inequalities.

10 Maximum Entropy (ME) Principle
According to the ME principle, the inference is the most unbiased when the entropy of these variables is maximized. Our problem becomes finding a distribution P(X|Q) such that the conditional entropy H(X|Q) is maximized.
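The formula shown on the slide is not part of this transcript. For reference, the standard definition of conditional entropy, written in the slide's notation, is given below. Lowercase q and x denote individual values of Q and X (notation introduced here), and P(q) is known to the adversary under the assumption that they hold all QI data.

```latex
H(X \mid Q) = -\sum_{q \in QI} \sum_{x \in SA} P(q, x) \,\log P(x \mid q),
\qquad P(q, x) = P(x \mid q)\, P(q)
```

Maximizing this quantity with no constraints at all would make P(x | q) uniform over the SA domain for every q; the constraints derived from the mining results are what pull the estimate away from uniform.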

11 Deriving Constraints From Association Rules
To estimate P(X|Q) based on data mining results, we need to convert the knowledge embedded in them into equations or inequalities that use P(X|Q) or P(Q, X) as variables. We call these equations and inequalities ME constraints.
AR-constraints: two potential scenarios
◦ The exact support and confidence are withheld.
◦ The exact support and confidence are published.
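The two scenarios can be written out as follows. This is a sketch that follows directly from the definitions of support and confidence for a published rule Q → X; the exact form used on the slide is not in the transcript, and the symbols sup, conf, min_sup, and min_conf denote the rule's support and confidence and the two mining thresholds.

```latex
% Exact support and confidence are published:
P(Q, X) = \mathrm{sup}(Q \rightarrow X), \qquad
P(X \mid Q) = \mathrm{conf}(Q \rightarrow X)

% Exact values are withheld, only the thresholds are known:
P(Q, X) \ge \mathit{min\_sup}, \qquad
P(X \mid Q) \ge \mathit{min\_conf}
```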

12 Deriving Constraints From Non-Association Rules
If Q → X is not one of the published association rules, we can still derive constraints (NAR-constraints) from its absence.
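The constraint formulas from the slide are not part of this transcript. A logical consequence that holds under the assumption that every rule meeting both thresholds is published can be stated as follows; it may differ in form from the constraints on the original slide.

```latex
% Q -> X is not published, so it fails at least one threshold:
P(Q, X) < \mathit{min\_sup} \quad \text{or} \quad P(X \mid Q) < \mathit{min\_conf}

% Using P(Q, X) = P(X \mid Q)\, P(Q), with P(Q) > 0 known from the QI data:
P(X \mid Q) < \max\!\left( \mathit{min\_conf},\ \frac{\mathit{min\_sup}}{P(Q)} \right)
```

The bound is informative only when the right-hand side is below 1, which requires P(Q) > min_sup.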

13 Algorithm to Derive AR- and NAR-Constraints
• Apriori-based algorithm
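The slide gives no details of the algorithm, so the following is only a sketch of how an Apriori-style, level-wise search could drive constraint derivation; it is not the paper's algorithm. It relies on one observation: if P(Q) < min_sup, then P(Q, X) < min_sup holds trivially, so only frequent QI patterns yield informative constraints, and no superset of an infrequent pattern can be frequent. All names (`frequent_qi_patterns`, `derive_constraints`) and the toy data are hypothetical.

```python
from itertools import combinations


def frequent_qi_patterns(qi_rows, min_sup):
    """Level-wise (Apriori-style) search for QI patterns Q with P(Q) >= min_sup.

    qi_rows: list of dicts mapping QI attribute name -> value.
    Returns a dict: frozenset of (attribute, value) pairs -> P(Q).
    """
    n = len(qi_rows)
    row_items = [frozenset(row.items()) for row in qi_rows]

    def prob(pattern):
        return sum(1 for items in row_items if pattern <= items) / n

    # Level 1: single (attribute, value) pairs that are frequent.
    singles = {frozenset([item]) for items in row_items for item in items}
    level = {p: prob(p) for p in singles}
    level = {p: s for p, s in level.items() if s >= min_sup}
    frequent = dict(level)

    while level:
        size = len(next(iter(level)))
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            one_larger = len(union) == size + 1
            distinct_attrs = len({attr for attr, _ in union}) == len(union)
            if one_larger and distinct_attrs:
                # Apriori pruning: every sub-pattern must itself be frequent.
                if all(frozenset(sub) in level
                       for sub in combinations(union, size)):
                    candidates.add(union)
        level = {c: prob(c) for c in candidates}
        level = {c: s for c, s in level.items() if s >= min_sup}
        frequent.update(level)
    return frequent


def derive_constraints(qi_rows, sa_domain, published, min_sup):
    """Label each (Q, X) pair with frequent Q as an AR- or NAR-constraint."""
    constraints = []
    for q in frequent_qi_patterns(qi_rows, min_sup):
        for x in sa_domain:
            kind = "AR" if (q, x) in published else "NAR"
            constraints.append((kind, q, x))
    return constraints


# Hypothetical QI data and one hypothetical published rule.
qi_rows = (
    [{"Gender": "Male", "Degree": "College"}] * 4
    + [{"Gender": "Male", "Degree": "HighSchool"}] * 1
    + [{"Gender": "Female", "Degree": "College"}] * 2
    + [{"Gender": "Female", "Degree": "HighSchool"}] * 3
)
published = {(frozenset({("Gender", "Male"), ("Degree", "College")}), "50K+")}
constraints = derive_constraints(qi_rows, ["50K+", "50K-"], published, min_sup=0.3)
for kind, q, x in constraints:
    print(kind, sorted(q), "->", x)
```

Each "AR" entry stands for the lower bounds (or equalities) that a published rule implies, and each "NAR" entry for the disjunction that an unpublished rule implies; turning the labels into numeric constraints for the ME solver is left out of this sketch.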

14 Conclusion
The paper proposes a quantitative analysis of the information disclosure caused by data mining results.
Further thinking:
◦ Sanitizing the original datasets before publishing data mining results.
◦ Disguising the association rules so that privacy is preserved.

