An Experimental Study of Association Rule Hiding Techniques Emmanuel Pontikakis* Dept. of Computer Engineering and Informatics.


1 An Experimental Study of Association Rule Hiding Techniques
Emmanuel Pontikakis* (pontikak@ceid.upatras.gr), Dept. of Computer Engineering and Informatics, University of Patras, Patra, Greece
Vassilios Verykios* (verykios@cti.gr), Dept. of Computer and Communication Engineering, University of Thessaly, Volos, Greece
*Computer Technology Institute, Research Unit 3, Athens, Greece

2 Outline
- Introduction - Related Work
- Distortion-based Techniques
- Blocking-based Techniques
- Comparison and Analysis
- Conclusions

3 Introduction
[Diagram labels: Database, Data Mining, Association Rules, User, Hide Sensitive Rules, Changed Database]

4 Related Work
- Association Rule Hiding
  - Blocking-based technique (Saygin, Verykios, Clifton)
  - Distortion-based (sanitization) technique (Oliveira, Zaiane, Verykios, Dasseni)

6 Distortion-based Techniques

Sample Database          Distorted Database
A B C D                  A B C D
1 1 1 0                  1 1 1 0
1 0 1 1                  1 0 0 1
0 0 0 1                  0 0 0 1
1 1 1 0                  1 1 1 0
1 0 1 1                  1 0 0 1

In the Sample Database, rule A → C has Support(A → C) = 80% and Confidence(A → C) = 100%.
After the distortion algorithm, rule A → C has Support(A → C) = 40% and Confidence(A → C) = 50%.
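The figures on this slide can be checked with a few lines of Python. This is only an illustrative sketch: the string encoding of the rows and the helper names `supports`, `support`, and `confidence` are assumptions made here, not part of the presentation.

```python
# Rows are transactions; columns are items A, B, C, D ("1" = item present).
ITEMS = "ABCD"
sample = ["1110", "1011", "0001", "1110", "1011"]
distorted = ["1110", "1001", "0001", "1110", "1001"]

def supports(row, itemset):
    """True if the transaction contains every item of the itemset."""
    return all(row[ITEMS.index(item)] == "1" for item in itemset)

def support(db, itemset):
    """Fraction of transactions that contain every item of the itemset."""
    return sum(supports(row, itemset) for row in db) / len(db)

def confidence(db, lhs, rhs):
    """support(lhs and rhs) / support(lhs); 0.0 if lhs never occurs."""
    lhs_support = support(db, lhs)
    return support(db, lhs + rhs) / lhs_support if lhs_support else 0.0

for name, db in (("sample", sample), ("distorted", distorted)):
    print(name, support(db, "AC"), confidence(db, "A", "C"))
```

Deleting item C from two of the four transactions that support A → C halves the support of {A, C} while leaving the support of A unchanged, which is why the confidence falls from 100% to 50%.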

7 Side Effects
Before hiding process | After hiding process | Side effect
Rule R_i had conf(R_i) > MCT | Rule R_i now has conf(R_i) < MCT | Rule eliminated (undesirable side effect)
Rule R_i had conf(R_i) < MCT | Rule R_i now has conf(R_i) > MCT | Ghost rule (undesirable side effect)
Large itemset I had sup(I) > MST | Itemset I now has sup(I) < MST | Itemset eliminated (undesirable side effect)
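As a rough illustration of the first two rows of this table, the sketch below compares rule confidences before and after distortion and labels the rules that cross the minimum confidence threshold (MCT). The two databases are the ones from slide 6; the MCT value of 0.6, the choice of candidate rules, and the function names are hypothetical.

```python
ITEMS = "ABCD"
before_db = ["1110", "1011", "0001", "1110", "1011"]
after_db = ["1110", "1001", "0001", "1110", "1001"]

def confidence(db, lhs, rhs):
    """Confidence of lhs -> rhs: count(lhs and rhs) / count(lhs)."""
    def has(row, itemset):
        return all(row[ITEMS.index(i)] == "1" for i in itemset)
    n_lhs = sum(has(row, lhs) for row in db)
    n_both = sum(has(row, lhs + rhs) for row in db)
    return n_both / n_lhs if n_lhs else 0.0

def rule_side_effects(before, after, rules, mct):
    """Label each non-sensitive rule whose confidence crossed the MCT."""
    effects = {}
    for lhs, rhs in rules:
        c_before = confidence(before, lhs, rhs)
        c_after = confidence(after, lhs, rhs)
        if c_before > mct and c_after < mct:
            effects[(lhs, rhs)] = "rule eliminated"
        elif c_before < mct and c_after > mct:
            effects[(lhs, rhs)] = "ghost rule"
    return effects

# Hypothetical non-sensitive rules to monitor while A -> C is being hidden.
rules = [("D", "C"), ("C", "B"), ("A", "B")]
effects = rule_side_effects(before_db, after_db, rules, mct=0.6)
print(effects)
```

With these inputs, D → C drops from about 67% to 0% confidence (rule eliminated), C → B rises from 50% to 100% (ghost rule), and A → B stays at 50% and is therefore not reported.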

8 Distortion-based Techniques
Challenges/Goals:
- Minimize the undesirable side effects that the hiding process causes to non-sensitive rules.
- Minimize the number of 1's that must be deleted from the database.
- The running time of the algorithms must grow linearly with the size of the database.

9 Our Proposal: Weight-based Sorting Distortion Algorithm (WSDA)
High Level Description
Input:
- Initial database
- Set of sensitive rules
- Safety margin (for example 10%)
Output:
- Sanitized database
- Sensitive rules no longer hold in the database

10 WSDA Algorithm
High Level Description
1st step:
- Retrieve the set of transactions that support the sensitive rule R_S.
- For each sensitive rule R_S, find the number N_1 of transactions in which one item that supports the rule will be deleted.

11 WSDA Algorithm
High Level Description
2nd step:
- For each rule R_i in the database that has items in common with R_S, compute a weight w that denotes how strong R_i is.
- For each transaction that supports R_S, compute a priority P_i that denotes how many strong rules this transaction supports.

12 WSDA Algorithm
High Level Description
3rd step:
- Sort the N_1 transactions in ascending order according to their priority value P_i.
4th step:
- For the first N_1 transactions, hide an item that is contained in R_S.

13 WSDA Algorithm
High Level Description
5th step:
- Update the confidence and support values of the other rules in the database.
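The five steps can be sketched as follows. The slides do not give the weight formula, the priority formula, or the way N_1 is derived, so everything concrete here is an assumption made for illustration: the weight of a rule is taken to be its confidence, the priority of a transaction is the sum of the weights of the rules it supports, N_1 is the smallest number of deletions that pushes the confidence below MCT minus the safety margin, all supporting transactions are sorted before the first N_1 are taken, and the deleted item is the first item of the rule's right-hand side. This is not the authors' implementation.

```python
ITEMS = "ABCD"

def supports(row, itemset):
    """True if the transaction (list of 0/1) contains every item of the itemset."""
    return all(row[ITEMS.index(i)] == 1 for i in itemset)

def confidence(db, rule):
    lhs, rhs = rule
    n_lhs = sum(supports(r, lhs) for r in db)
    n_both = sum(supports(r, lhs + rhs) for r in db)
    return n_both / n_lhs if n_lhs else 0.0

def hide_rule(db, sensitive, other_rules, mct, margin):
    """Return a sanitized copy of db in which `sensitive` falls below mct - margin."""
    db = [list(row) for row in db]  # work on a copy
    lhs, rhs = sensitive
    target = mct - margin

    # Step 1: transactions supporting the sensitive rule, and N_1 (assumed rule).
    supporting = [i for i, r in enumerate(db) if supports(r, lhs + rhs)]
    n_lhs = sum(supports(r, lhs) for r in db)
    n1 = 0
    # Deleting a right-hand-side item leaves the left-hand-side count unchanged.
    while n1 < len(supporting) and (len(supporting) - n1) / n_lhs >= target:
        n1 += 1

    # Step 2: weights (assumed: confidence) and priorities (assumed: sum of weights).
    shared = [r for r in other_rules if set(r[0] + r[1]) & set(lhs + rhs)]
    weights = {r: confidence(db, r) for r in shared}
    priority = {
        i: sum(w for r, w in weights.items() if supports(db[i], r[0] + r[1]))
        for i in supporting
    }

    # Step 3: ascending priority, so transactions backing few strong rules come first.
    supporting.sort(key=lambda i: priority[i])

    # Step 4: delete one item of the sensitive rule in the first N_1 transactions.
    victim = ITEMS.index(rhs[0])
    for i in supporting[:n1]:
        db[i][victim] = 0

    # Step 5: support and confidence are recomputed from db on demand.
    return db

original = [[1, 1, 1, 0], [1, 0, 1, 1], [0, 0, 0, 1], [1, 1, 1, 0], [1, 0, 1, 1]]
sanitized = hide_rule(original, ("A", "C"), [("A", "B")], mct=0.7, margin=0.1)
print(sanitized, confidence(sanitized, ("A", "C")))
```

In this toy run the two transactions that do not support A → B have the lowest priority, so item C is deleted from them. The result coincides with the distorted database on slide 6, and A → B keeps its confidence of 50%.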

14 Experimental Results of WSDA algorithm
[Charts: itemsets that remained unaffected in the database; rules changed in the database]

15 Experimental Results of WSDA algorithm
[Charts: average number of items per transaction 13/50; average number of items per transaction 20/50]

17 Quality of Data
- Sometimes it is dangerous to delete items from the database (e.g., in medical databases), because the false data may create undesirable effects.
- So we have to hide the rules by adding uncertainty to the database, without distorting it.

18 Blocking-based Techniques

Initial Database         New Database
A B C D                  A B C D
1 1 1 0                  1 1 1 0
1 0 1 1                  1 0 ? 1
0 0 0 1                  ? 0 0 1
1 1 1 0                  1 1 1 0
1 0 1 1                  1 0 1 1

After the blocking algorithm, support and confidence become marginal. In the New Database: 60% ≤ conf(A → C) ≤ 100%
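The interval on this slide can be reproduced by brute force: fill the ?'s in every possible way and record the smallest and largest confidence. This is a sketch for checking the example, not the minsup/maxsup formulas used by the authors, and the function names are assumptions. It enumerates 2^k completions for k unknowns, so it is only practical for tiny examples.

```python
from itertools import product

ITEMS = "ABCD"
blocked = ["1110", "10?1", "?001", "1110", "1011"]

def confidence(db, lhs, rhs):
    """Confidence of lhs -> rhs in a database without unknowns."""
    def has(row, itemset):
        return all(row[ITEMS.index(i)] == "1" for i in itemset)
    n_lhs = sum(has(row, lhs) for row in db)
    n_both = sum(has(row, lhs + rhs) for row in db)
    return n_both / n_lhs if n_lhs else 0.0

def confidence_range(db, lhs, rhs):
    """Minimum and maximum confidence over every way of filling in the ?'s."""
    flat = "|".join(db)
    values = []
    for fill in product("01", repeat=flat.count("?")):
        replacement = iter(fill)
        completed = "".join(
            next(replacement) if ch == "?" else ch for ch in flat
        ).split("|")
        values.append(confidence(completed, lhs, rhs))
    return min(values), max(values)

low, high = confidence_range(blocked, "A", "C")
print(low, high)
```

The minimum is reached when the blocked C is read as 0 and the blocked A as 1 (3 of 5 transactions), and the maximum when the blocked C is read as 1 and the blocked A as 0 (4 of 4 transactions).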

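19 Modification of Association Rule Definition
- The support and confidence of a rule A → B become marginal, that is, intervals instead of single values:
  sup(A → B) ∈ [minsup(A → B), maxsup(A → B)]
  conf(A → B) ∈ [minconf(A → B), maxconf(A → B)]
- minsup(A → B) = [formula not preserved in the transcript]
- maxsup(A → B) = [formula not preserved in the transcript]

20 Modification of Association Rule Definition
- minconf(A → B) = [formula not preserved in the transcript]
- maxconf(A → B) = [formula not preserved in the transcript]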

21 Negative Border Rules Set (NBRS) Definition
- When a rule R has either sup(R) > MST AND conf(R) < MCT, OR sup(R) < MST AND conf(R) > MCT, we say that R belongs to NBRS.

22 Side Effects Definition Modification in Blocking-based Techniques
Before hiding process | After hiding process | Side effect
Rule R_i had conf(R_i) > MCT | Rule R_i now has minconf(R_i) < MCT | Rule eliminated (undesirable side effect)
Rule R_i had conf(R_i) < MCT | Rule R_i now has maxconf(R_i) > MCT | Ghost rule (desirable side effect)
Large itemset I had sup(I) > MST | Itemset I now has minsup(I) < MST | Itemset eliminated (undesirable side effect)
Itemset I had sup(I) < MST | Itemset I now has maxsup(I) > MST | Ghost itemset (desirable side effect)

23 Privacy Breaches Definitions
- If an item i, some values of which are hidden by ?'s, is contained in a sensitive rule, a privacy breach occurs if the adversary can infer this with c% confidence.
- For a rule R with maxconf(R) > MCT, a privacy breach occurs if it can be estimated, with c% confidence, that R is either a sensitive rule or a ghost rule.
- For a blocked item i in a specific transaction T, a privacy breach occurs if the adversary can estimate with c% confidence that its original value is either 0 or 1.

24 Blocking-Based Techniques
Goals that an algorithm has to achieve:
- Insert a relatively small number of ?'s while significantly reducing the confidence of the sensitive rules.
- Minimize the undesirable side effects (lost rules and itemsets) by selecting the items to change in the appropriate transactions, and maximize the desirable side effects.
- Modify the database in such a way that an adversary cannot recover its original values.

25 Our Proposal: Blocking Algorithm (BA)
High Level Description
1st step:
- For each sensitive rule R_S (with left itemset I_L and right itemset I_R), compute how many 0's and 1's have to be blocked in order to reduce the confidence of R_S.
2nd step:
- Find the set of transactions T_R that support R_S and the set of transactions T_LpR' that partially support R_S (they partially support the left itemset and do not support the right itemset).
- For each transaction in T_R, find the rules R_common that have at least one item in common with I_R; for each transaction in T_LpR', find the rules R'_common ∈ NBRS that have at least one item in common with I_L.
- Assign a weight w to each R_common and a weight w' to each R'_common.
- Assign a priority value P_T to each transaction T in T_R, such that P_T is large if T has many R_common rules with large w, and a priority value P_T' to each transaction T' in T_LpR', such that P_T' is small if T' has many R'_common rules with large w'.

26 Blocking Algorithm
High Level Description
3rd step:
- Sort the transactions T ∈ T_R starting from those with the lowest P_T, and sort the transactions T' ∈ T_LpR' starting from those with the highest P_T'.
4th step:
- For the first N_1 sorted T ∈ T_R, block an item i ∈ I_R; for the first N_0 sorted T' ∈ T_LpR', block an item i ∈ I_L.
5th step:
- Update the values minconf(R_i) and minsup(R_i) for all other rules that have been affected.

27 Blocking-Based Techniques
Main problems of the blocking technique:
1. The maximum confidence of a sensitive rule cannot be reduced.
2. If the blocking algorithm does not add much uncertainty to the database, an adversary who applies a smart inference technique can infer the hidden values.
3. Both 0's and 1's must be hidden, because if only 1's were hidden, the adversary could simply replace all the ?'s with 1's and easily restore the initial database.
4. Many ?'s must be inserted if we don't want an adversary to infer the hidden data.
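Problem 3 is easy to demonstrate on the five-transaction example from slide 18. The sketch below is illustrative only; the helper names `block` and `naive_attack` and the choice of blocked cells in the first scenario are assumptions.

```python
original = ["1110", "1011", "0001", "1110", "1011"]

def block(db, cells):
    """Replace the given (row, column) cells with '?'."""
    rows = [list(r) for r in db]
    for r, c in cells:
        rows[r][c] = "?"
    return ["".join(r) for r in rows]

def naive_attack(db):
    """Adversary's guess: every '?' was originally a 1."""
    return [row.replace("?", "1") for row in db]

# Scenario 1: only cells that held a 1 are blocked (item C in two transactions).
only_ones = block(original, [(1, 2), (4, 2)])
# Scenario 2: one 1 and one 0 are blocked, as in the New Database of slide 18.
mixed = block(original, [(1, 2), (2, 0)])

print(naive_attack(only_ones) == original)  # True: database fully restored
print(naive_attack(mixed) == original)      # False: the blocked 0 is guessed wrong
```

Blocking a mixture of 0's and 1's defeats this particular guess, but as problems 2 and 4 note, it does not by itself stop more careful inference.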

28 Experimental Results of Blocking Algorithm
[Charts: large itemsets remaining after the hiding process; rules changed (%) after the process]

29 Experimental Results of Blocking Algorithm (2)
[Charts: databases with an average of 20 items per transaction; databases with an average of 13 items per transaction]

30 Experimental Results of Blocking Algorithm (3)
[Charts: rules changed when the proportion 0:1 is varied; decision tree experiments, misclassified items (%)]

32 Comparison and Analysis
Criterion | Distortion-based Techniques | Blocking-based Techniques
Privacy breaches | No privacy breaches | Many kinds of privacy breaches
Simplicity of algorithms | Simpler | More complicated
Database modification | Database contains false information | Many ?'s must be inserted in the database

34 Conclusions
There are open research problems in the blocking technique:
A) What techniques must be used in order to reduce the privacy breaches?
B) In what other ways can we prevent an adversary from inferring the association rules in the database?
C) Applying a chi-square test to the final database may reveal some correlations between the items.

35 References
- [Evfimievski et al.] Alexandre Evfimievski, Ramakrishnan Srikant, Rakesh Agrawal, and Johannes Gehrke. Privacy Preserving Mining of Association Rules. SIGKDD 2002, Edmonton, Alberta, Canada.
- Murat Kantarcioglu and Chris Clifton. Privacy Preserving Distributed Mining of Association Rules on Horizontally Partitioned Data. In Proceedings of the ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery (2002), 24–31.
- Jaideep Vaidya and Chris Clifton. Privacy Preserving Association Rule Mining in Vertically Partitioned Data. In the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2002), 639–644.

36 References
- Stanley R. M. Oliveira and Osmar R. Zaïane. Algorithms for Balancing Privacy and Knowledge Discovery in Association Rule Mining. In Proc. of the Seventh International Database Engineering & Applications Symposium (IDEAS'03), pp. 54-63, Hong Kong, July 16-18, 2003.
- Yucel Saygin, Vassilios Verykios, and Chris Clifton. Using Unknowns to Prevent Discovery of Association Rules. SIGMOD Record 30 (2001), no. 4, 45–54.
- Vassilios S. Verykios, Ahmed K. Elmagarmid, Elisa Bertino, Yucel Saygin, and Elena Dasseni. Association Rule Hiding. IEEE Transactions on Knowledge and Data Engineering (2003).

