Reconstruction-Based Association Rule Hiding Author: Yuhong Guo (MS-Ph.D. Candidate, Peking Univ., China) Advisor: Prof. Shiwei Tang Co-Advisors:


1 Reconstruction-Based Association Rule Hiding
Author: Yuhong Guo (MS-Ph.D. Candidate, Peking Univ., China)
Advisor: Prof. Shiwei Tang
Co-Advisors: Prof. Dongqing Yang, Jian Pei
Sunday, June 10, 2007

2 Association Rule Hiding: what? why?? and how???
SIGMOD Ph.D. Workshop IDAR ’07
Problem: hide sensitive association rules in data without losing the non-sensitive ones
Motivation: large data repositories contain confidential rules whose disclosure can have serious adverse effects
Solutions:
- Data modification (distortion, blocking). Traditional: fine-tunes the data and controls the hiding effects only indirectly
- Data reconstruction. New and promising: knowledge sanitization, which controls the hiding effects directly

3 Outline
- Background: Motivation, Problem statement, Related work
- Proposed Solution
- Current Progress
- Evaluation Plan

4 Motivation (Background)
Context: data mining, data sharing, privacy preserving -> Privacy Preserving Data Mining (PPDM)
Two problems addressed in PPDM:
- the protection of private data
- the protection of sensitive rules (knowledge) contained in the data

5 Problem statement (Background)
Given:
- a database D to be released
- a minimum support threshold MST and a minimum confidence threshold MCT
- a set of association rules R mined from D
- a set of sensitive rules R_h ⊂ R to be hidden
Find a new database D' such that:
- the rules in R_h cannot be mined from D'
- as many of the rules in R - R_h as possible can still be mined from D'
This is the KHD (Knowledge Hiding in Database) problem.
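The problem statement can also be written symbolically. This is only a sketch: the notation \mathcal{R}(\cdot) for "the set of rules minable from a database under MST and MCT" is introduced here for illustration and does not appear on the slides.

```latex
\text{Given } D,\ MST,\ MCT,\quad R = \mathcal{R}(D),\quad R_h \subset R,
\qquad \text{find } D' \text{ such that}
\qquad \mathcal{R}(D') \cap R_h = \emptyset
\quad \text{and} \quad
\bigl|\, \mathcal{R}(D') \cap (R - R_h) \,\bigr| \ \text{is as large as possible.}
```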

6 Related work (Background)
Data modification approaches
- Basic idea: data sanitization, D -> D'
- Current status: distortion and blocking; a prosperous line of work
- Drawbacks: cannot control the hiding effects intuitively; incur a lot of I/O
Data reconstruction approaches
- Basic idea: knowledge sanitization, D -> K -> D'
- Current status: limited, 3 papers
- Advantages: can easily control the availability of rules and control the hiding effects directly, intuitively and conveniently

7 Classification of current algorithms (Background)
Dimensions: hiding target (hide rules, hide large itemsets) and approach (data modification, data reconstruction)
Data modification
- Data distortion: Algo1a, Algo1b, Algo2a, WSDA, PDA, Algo2b, Algo2c, Naïve, MinFIA, MaxFIA, IGA, RRA, RA, SWA, Border-Based, Integer-Programming, Sanitization-Matrix
- Data blocking: CR, CR2, GIH
Data reconstruction
- CIILM
Much more reconstruction-based work is expected.

8 Outline
- Background
- Proposed Solution: Framework, Example, Discussion
- Current Progress
- Evaluation Plan

9 Framework of our approach (Proposed Solution)
1. Frequent set mining: D -> FS
2. Perform sanitization algorithm: FS, R, R_h -> FS' (from which R - R_h can be derived)
3. FP-tree-based inverse frequent set mining: FS' -> FP-tree -> D'

10 The first two phases (Proposed Solution)
1. Frequent set mining
- Generate all frequent itemsets FS, with their supports and support counts, from the original database D
2. Perform sanitization algorithm
- Input: FS (the output of phase 1), R, R_h
- Output: sanitized frequent itemsets FS'
- Process: select a hiding strategy, identify the sensitive frequent sets, perform sanitization
In the best case, the sanitization algorithm ensures that exactly the non-sensitive rule set R - R_h can be derived from FS'.

11 The third phase: FP-tree-based inverse mining (Proposed Solution)
Basic idea: use the FP-tree as a transitional "bridge" that narrows the gap between a database and its frequent itemsets and makes the transformation easier.
Proposed method: frequent itemsets (FS) -(i)-> FP-tree -(ii)-> temporary database (TempD) -(iii)-> a set of compatible databases D1, D2, ...
(i) Generate a compatible FP-tree
(ii) Generate a TempD that includes only frequent items
(iii) Scatter infrequent items into TempD
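Step (ii) can be illustrated with a minimal sketch. It assumes the usual FP-tree reading in which a node's count minus the sum of its children's counts is the number of transactions that end at that node. The `Node` class and `tree_to_transactions` function are hypothetical names introduced here, not the author's algorithm, and the example tree (A:6 with children C:4 and D:2, and C:4 with child D:2) is one reading of the flattened figure on slide 13. Steps (i) and (iii) are not covered.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One FP-tree node: an item, its count, and its child nodes."""
    item: str
    count: int
    children: list = field(default_factory=list)


def tree_to_transactions(roots):
    """Expand an FP-tree into a list of transactions (strings of items).

    A node with count c whose children sum to s contributes c - s
    transactions equal to the path from the root down to that node.
    """
    transactions = []

    def walk(node, prefix):
        path = prefix + node.item
        ending_here = node.count - sum(child.count for child in node.children)
        if ending_here < 0:
            raise ValueError(f"children of {path} exceed the parent count")
        transactions.extend([path] * ending_here)
        for child in node.children:
            walk(child, path)

    for root in roots:
        walk(root, "")
    return transactions


# Assumed tree from slide 13: A:6 -> (C:4 -> D:2), A:6 -> D:2
tree = Node("A", 6, [Node("C", 4, [Node("D", 2)]), Node("D", 2)])
temp_d = tree_to_transactions([tree])
print(sorted(temp_d))
```

Under these assumptions the expansion yields two copies each of AC, ACD and AD, which is the same multiset of transactions as the released database shown in the third-phase example.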

12 Example: the first two phases (Proposed Solution)
Original database D:
TID  Items
T1   ABCE
T2   ABC
T3   ABCD
T4   ABD
T5   AD
T6   ACD
Thresholds: σ = 4, MST = 66%, MCT = 75%
1. Frequent set mining
2. Perform sanitization algorithm
Frequent itemsets FS':
Itemset  Count  Support
A        6      100%
C        4      66%
D        4      66%
AC       4      66%
AD       4      66%
Association rules R - R_h:
Rule    Confidence  Support
C -> A  100%        66%
D -> A  100%        66%
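The numbers in this example can be reproduced with a small brute-force sketch. It is not the author's implementation: the function names are hypothetical, and exhaustive enumeration is used only because the database has six transactions. The slide does not name the sensitive rule, so the sketch simply reports which rule derivable from D is no longer derivable from FS'.

```python
from itertools import combinations

# Original database D from the slide.
D = ["ABCE", "ABC", "ABCD", "ABD", "AD", "ACD"]
SIGMA = 4    # minimum support count (MST = 66% of 6 transactions)
MCT = 0.75   # minimum confidence threshold


def frequent_itemsets(db, sigma):
    """Return {itemset: support count} for all itemsets with count >= sigma."""
    items = sorted(set("".join(db)))
    found = {}
    for k in range(1, len(items) + 1):
        level = {}
        for combo in combinations(items, k):
            count = sum(1 for t in db if set(combo) <= set(t))
            if count >= sigma:
                level["".join(combo)] = count
        if not level:   # no frequent k-itemset, so no larger one exists
            break
        found.update(level)
    return found


def rules(fs, mct):
    """Derive rules (antecedent, consequent) -> confidence from itemsets."""
    out = {}
    for itemset, count in fs.items():
        for r in range(1, len(itemset)):
            for ante in combinations(itemset, r):
                a = "".join(ante)
                c = "".join(i for i in itemset if i not in ante)
                conf = count / fs[a]
                if conf >= mct:
                    out[(a, c)] = conf
    return out


FS = frequent_itemsets(D, SIGMA)
R = rules(FS, MCT)

# Sanitized itemsets FS' as printed on the slide.
FS_PRIME = {"A": 6, "C": 4, "D": 4, "AC": 4, "AD": 4}
R_KEPT = rules(FS_PRIME, MCT)

print("FS :", FS)
print("R  :", sorted(R))
print("R - R_h from FS':", sorted(R_KEPT))
print("no longer derivable:", sorted(set(R) - set(R_KEPT)))
```

Mining D at σ = 4 also yields B and AB, which are absent from FS'. The rule that disappears is therefore B -> A, while C -> A and D -> A survive, matching the table on the slide.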

13 Example: the third phase (Proposed Solution)
Frequent itemsets FS' (σ = 4): A:6 (100%), C:4 (66%), D:4 (66%), AC:4 (66%), AD:4 (66%)
FP-tree: A:6 -> C:4 -> D:2 and A:6 -> D:2
Released database D':
TID  Items
T1   ACD
T2   ACD
T3   AC
T4   AC
T5   AD
T6   AD
Difficulties:
1. How to find the target FP-tree
2. How to control |D'|
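A quick way to check the released database against FS' is sketched below. It assumes that "compatible" means the frequent itemsets of the database at the given σ, together with their support counts, are exactly the given set; the slides do not spell out this definition. `mine` and `is_compatible` are hypothetical helper names.

```python
from itertools import combinations

FS_PRIME = {"A": 6, "C": 4, "D": 4, "AC": 4, "AD": 4}
D_PRIME = ["ACD", "ACD", "AC", "AC", "AD", "AD"]            # released database
D_ORIGINAL = ["ABCE", "ABC", "ABCD", "ABD", "AD", "ACD"]    # original database
SIGMA = 4


def mine(db, sigma):
    """Brute-force frequent itemset mining; fine for toy databases only."""
    items = sorted(set("".join(db)))
    found = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            count = sum(1 for t in db if set(combo) <= set(t))
            if count >= sigma:
                found["".join(combo)] = count
    return found


def is_compatible(db, fs, sigma):
    """Assumed definition: the mined itemsets and counts equal fs exactly."""
    return mine(db, sigma) == fs


print(is_compatible(D_PRIME, FS_PRIME, SIGMA))
print(is_compatible(D_ORIGINAL, FS_PRIME, SIGMA))
```

Under this definition D' is compatible with FS', because CD and ACD occur only twice and stay below σ. The original D is not compatible, because B and AB are still frequent in it.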

14 Discussion (Proposed Solution)
Sanitization algorithm
- Compared with the earlier, popular data sanitization approaches, it performs sanitization directly at the knowledge level of the data
Inverse frequent set mining algorithm
- Handles frequent and infrequent items separately, which is more efficient and yields a large number of output databases
Our solution gives the user a knowledge-level window for performing sanitization conveniently and generates a number of secure databases.

15 Outline
- Background
- Proposed Solution
- Current Progress: Work to date, Future work, Expected contributions
- Evaluation Plan

16 Work to date (Current Progress)
FP-tree-based method for inverse frequent set mining (used in the third phase of our framework)
First effort
- Published in Proc. of BNCOD'06
- Provides a good heuristic search strategy that rapidly finds an FP-tree satisfying the given constraints, and hence rapidly finds a set of compatible databases
Further work
- Accepted by Journal of Software (JOS)
- A more mature and better-designed FP-tree-based method for inverse frequent set mining that iteratively solves a sub linear constraint problem

17 Future work (Current Progress)
Develop a sound sanitization algorithm that:
- keeps the support and confidence of the rules in R - R_h unchanged as far as possible
- selects appropriate hiding strategies according to the different kinds of correlations among the rules in R and R_h
- prevents rule-based reasoning
Investigate how to restrict the number of transactions in the newly released database
Develop an integrated secure association rule mining tool that:
- protects private data
- protects sensitive rules contained in the data
Diagram: DHD + KHD -> integrated secure tool

18 Expected contributions (Current Progress)
- Reconstruction-based ARH framework
- Inverse frequent set mining algorithm
- CHART: Credible Hiding Association Rule Tool
- Rule sanitization algorithm
- ARH evaluation metrics

19 Outline
- Background
- Proposed Solution
- Current Progress
- Evaluation Plan

20 Evaluation Plan
Datasets: BMS-POS, BMS-WebView-1, BMS-WebView-2, …
Evaluation:
- Hiding effects
  ① Hiding failure ratio: |R_h(D')| / |R_h(D)|
  ② Lost rules ratio: (|~R_h(D)| - |~R_h(D')|) / |~R_h(D)|
  ③ Ghost rules ratio: (|R'| - |R ∩ R'|) / |R'|
- Data utility
- Time performance
Notation: R and R' are the rule sets mined from D and D', R_h is the set of sensitive rules, and ~R_h is the set of non-sensitive rules.
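The three hiding-effect ratios can be computed with plain set operations. The sketch below treats rules as (antecedent, consequent) tuples and reads R_h(D') as the sensitive rules still minable from D' and ~R_h(D') as the non-sensitive rules still minable from D'. The function names, the zero-denominator convention, and the sample rule sets are assumptions for illustration. In particular, R_h = {B -> A} is inferred from the slide 12 example rather than stated there, and the "leaky" outcome is invented.

```python
def hiding_failure(R, R_prime, R_h):
    """|R_h(D')| / |R_h(D)|: share of sensitive rules still minable from D'."""
    sensitive = R & R_h
    return len(sensitive & R_prime) / len(sensitive) if sensitive else 0.0


def lost_rules(R, R_prime, R_h):
    """(|~R_h(D)| - |~R_h(D')|) / |~R_h(D)|: non-sensitive rules lost."""
    non_sensitive = R - R_h
    if not non_sensitive:
        return 0.0
    kept = non_sensitive & R_prime
    return (len(non_sensitive) - len(kept)) / len(non_sensitive)


def ghost_rules(R, R_prime):
    """(|R'| - |R ∩ R'|) / |R'|: rules in D' that were not minable from D."""
    return (len(R_prime) - len(R & R_prime)) / len(R_prime) if R_prime else 0.0


# Rule sets for the slide 12 example (R_h is an inferred assumption).
R = {("B", "A"), ("C", "A"), ("D", "A")}
R_h = {("B", "A")}
R_prime = {("C", "A"), ("D", "A")}            # rules minable from D'

# A made-up, worse outcome: D -> A is lost and a ghost rule A -> C appears.
R_prime_leaky = {("C", "A"), ("A", "C")}

for name, rp in [("slide example", R_prime), ("hypothetical", R_prime_leaky)]:
    print(name, hiding_failure(R, rp, R_h), lost_rules(R, rp, R_h), ghost_rules(R, rp))
```

For the slide example all three ratios are 0, the ideal outcome. In the hypothetical case half of the non-sensitive rules are lost and half of the released rules are ghosts.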

21 Summary: Reconstruction-based Association Rule Hiding
1. Frequent set mining: D -> FS
2. Perform sanitization algorithm: FS, R, R_h -> FS' (ongoing)
3. FP-tree-based inverse frequent set mining: FS' -> FP-tree -> D' (basically completed)

22 Thanks for your attention. Any suggestions or questions?

