An Experimental Study of Association Rule Hiding Techniques Emmanuel Pontikakis* Dept. of Computer Engineering and Informatics.

Presentation transcript:

An Experimental Study of Association Rule Hiding Techniques Emmanuel Pontikakis* Dept. of Computer Engineering and Informatics University of Patras Patra, Greece Vassilios Verykios* Dept. of Computer and Communication Engineering University of Thessaly Volos, Greece *Computer Technology Institute Research Unit 3 Athens, Greece

Outline
- Introduction and Related Work
- Distortion-based Techniques
- Blocking-based Techniques
- Comparison and Analysis
- Conclusions

Introduction
[Diagram labels: Database; User; Data Mining; Association Rules; Hide Sensitive Rules; Changed Database]

Related Work
- Association Rule Hiding
  - Blocking-based Technique (Saygin, Verykios, Clifton)
  - Distortion-based (Sanitization) Technique (Oliveira, Zaiane, Verykios, Dasseni)

Distortion-based Techniques
Sample Database (items A, B, C, D): rule A → C has Support(A → C) = 80% and Confidence(A → C) = 100%.
Distortion Algorithm
Distorted Database (items A, B, C, D): rule A → C now has Support(A → C) = 40% and Confidence(A → C) = 50%.
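The transaction tables on this slide did not survive, so the following is only a minimal sketch with a hypothetical five-transaction database chosen to reproduce the stated percentages. The helper names `support` and `confidence` are illustrative, not part of the presentation.

```python
from fractions import Fraction

def support(db, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return Fraction(sum(items <= t for t in db), len(db))

def confidence(db, lhs, rhs):
    """support(lhs and rhs together) / support(lhs); 0 if lhs never occurs."""
    s_lhs = support(db, lhs)
    return support(db, set(lhs) | set(rhs)) / s_lhs if s_lhs else Fraction(0)

# Hypothetical database over items A, B, C, D (not the slide's actual table).
original = [{"A", "B", "C"}, {"A", "C"}, {"A", "C", "D"},
            {"A", "B", "C", "D"}, {"B", "D"}]

# Distortion: delete item C (turn a 1 into a 0) in two supporting transactions.
distorted = [set(t) for t in original]
distorted[0].discard("C")
distorted[1].discard("C")

print(support(original, "AC"), confidence(original, "A", "C"))    # 4/5 1
print(support(distorted, "AC"), confidence(distorted, "A", "C"))  # 2/5 1/2
```

Deleting C from two of the four supporting transactions leaves the support of A untouched, which is why both support and confidence of A → C are halved.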

Side Effects
Before Hiding Process | After Hiding Process | Side Effect
Rule R_i has had conf(R_i) > MCT | Rule R_i has now conf(R_i) < MCT | Rule Eliminated (Undesirable Side Effect)
Rule R_i has had conf(R_i) < MCT | Rule R_i has now conf(R_i) > MCT | Ghost Rule (Undesirable Side Effect)
Large Itemset I has had sup(I) > MST | Itemset I has now sup(I) < MST | Itemset Eliminated (Undesirable Side Effect)
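The side-effect cases can be expressed as a small classifier. This is a sketch only: the function name `status_change` and the sample values are assumptions, and the strict inequalities follow the slide, so a value exactly at the threshold is reported as unchanged.

```python
def status_change(before, after, threshold):
    """Compare a measure (confidence or support) with its threshold
    before and after hiding, using strict inequalities."""
    if before > threshold and after < threshold:
        return "eliminated"   # rule or itemset lost
    if before < threshold and after > threshold:
        return "ghost"        # newly appears above the threshold
    return "unchanged"

mct = 0.7  # hypothetical minimum confidence threshold
print(status_change(0.80, 0.60, mct))  # eliminated
print(status_change(0.65, 0.75, mct))  # ghost
print(status_change(0.90, 0.85, mct))  # unchanged
```

The same function applies to itemsets when called with support values and MST.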

Distortion-based Techniques
- Challenges/Goals:
  - To minimize the undesirable side effects that the hiding process causes to non-sensitive rules.
  - To minimize the number of 1’s that must be deleted in the database.
  - Algorithms must run in time linear in the size of the database.

Our Proposal: Weight-based Sorting Distortion Algorithm (WSDA)
- High Level Description:
  Input:
  - Initial Database
  - Set of Sensitive Rules
  - Safety Margin (for example 10%)
  Output:
  - Sanitized Database
  - Sensitive Rules no longer hold in the Database

WSDA Algorithm
- High Level Description:
  1st step:
  - Retrieve the set of transactions that support sensitive rule R_S.
  - For each sensitive rule R_S, find the number N_1 of transactions in which one item that supports the rule will be deleted.
  2nd step:
  - For each rule R_i in the database that has items in common with R_S, compute a weight w that denotes how strong R_i is.
  - For each transaction that supports R_S, compute a priority P_i that denotes how many strong rules this transaction supports.
  3rd step:
  - Sort the N_1 transactions in ascending order of their priority value P_i.
  4th step:
  - For the first N_1 transactions, hide an item that is contained in R_S.
  5th step:
  - Update the confidence and support values of the other rules in the database.
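The five steps can be sketched as follows. The slides do not define the weight, the priority, or how the safety margin enters, so everything concrete here is an assumption made for illustration: the weight of a rule is taken to be its confidence, the priority of a transaction is the sum of the weights of the related rules it supports, N_1 is the smallest number of deletions that pushes conf(R_S) below MCT minus the safety margin, and the deleted item is taken from the right-hand side of R_S. The names `wsda_sketch`, `supports` and `conf` are hypothetical.

```python
def supports(t, rule):
    lhs, rhs = rule
    return (lhs | rhs) <= t

def conf(db, rule):
    lhs, _ = rule
    n_lhs = sum(lhs <= t for t in db)
    return sum(supports(t, rule) for t in db) / n_lhs if n_lhs else 0.0

def wsda_sketch(db, sensitive, other_rules, mct, safety_margin):
    db = [set(t) for t in db]                      # work on a copy
    lhs, rhs = sensitive
    target = mct - safety_margin                   # assumed use of the margin
    # Step 1: transactions supporting R_S, and N_1 (how many lose an item).
    supporting = [i for i, t in enumerate(db) if supports(t, sensitive)]
    n_lhs = sum(lhs <= t for t in db)
    n1 = 0
    while n1 < len(supporting) and (len(supporting) - n1) / n_lhs >= target:
        n1 += 1
    # Step 2: weights of rules sharing items with R_S, priority per transaction.
    related = [r for r in other_rules if (r[0] | r[1]) & (lhs | rhs)]
    weight = {r: conf(db, r) for r in related}     # assumed weight: confidence
    priority = {i: sum(weight[r] for r in related if supports(db[i], r))
                for i in supporting}
    # Step 3: ascending priority, so low-impact transactions come first.
    supporting.sort(key=lambda i: priority[i])
    # Step 4: hide one item of R_S in the first N_1 transactions.
    victim = min(rhs)                              # assumed, deterministic choice
    for i in supporting[:n1]:
        db[i].discard(victim)
    return db, n1

db = [{"A", "B", "C"}, {"A", "C"}, {"A", "C", "D"}, {"A", "B", "C", "D"}, {"B", "D"}]
r_s = (frozenset("A"), frozenset("C"))
others = [(frozenset("B"), frozenset("C")), (frozenset("C"), frozenset("D"))]
sanitized, n1 = wsda_sketch(db, r_s, others, mct=0.7, safety_margin=0.1)
# Step 5: recompute the measures on the sanitized database.
print(n1, conf(sanitized, r_s))  # 2 0.5
```

In this toy run the two transactions with the lowest priority lose item C, so the confidence of B → C (2/3) and C → D (1/2) is the same before and after hiding.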

Experimental Results of WSDA algorithm
[Figure panels: itemsets that remained unaffected in the database; rules changed in the database]

Experimental Results of WSDA algorithm
[Figure panels: average number of items per transaction 13/50; average number of items per transaction 20/50]

Quality of Data
- Sometimes it is dangerous to delete items from the database (e.g., in medical databases), because the false data may create undesirable effects.
- So we have to hide the rules by adding uncertainty to the database without distorting it.

Blocking-based Techniques
Initial Database (items A, B, C, D) → Blocking Algorithm → New Database (items A, B, C, D, with some values replaced by ?)
Support and confidence become marginal. In the New Database: 60% ≤ conf(A → C) ≤ 100%

Modification of Association Rule Definition
- The support and confidence of a rule A → B become marginal (ranges rather than single values):
  sup(A → B) ∈ [minsup(A → B), maxsup(A → B)]
  conf(A → B) ∈ [minconf(A → B), maxconf(A → B)]
- minsup(A → B) =
- maxsup(A → B) =
- minconf(A → B) =
- maxconf(A → B) =
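The closed-form expressions for these four bounds are not preserved in the transcript. As a stand-in, the sketch below obtains the bounds by brute force: it tries every way of filling the ?'s with 0 or 1 and takes the smallest and largest support and confidence. The function name `bounds` and the sample data are assumptions, and the enumeration is exponential in the number of ?'s, so it is for illustration only.

```python
from itertools import product

def bounds(db, lhs, rhs):
    """Return ((minsup, maxsup), (minconf, maxconf)) of lhs -> rhs
    over every possible replacement of '?' by 0 or 1."""
    unknown = [(i, item) for i, row in enumerate(db)
               for item, v in row.items() if v == "?"]
    sups, confs = [], []
    for fill in product((0, 1), repeat=len(unknown)):
        world = [dict(row) for row in db]
        for (i, item), v in zip(unknown, fill):
            world[i][item] = v
        n_lhs = sum(all(r[x] == 1 for x in lhs) for r in world)
        n_both = sum(all(r[x] == 1 for x in lhs + rhs) for r in world)
        sups.append(n_both / len(world))
        if n_lhs:                      # confidence undefined when lhs never holds
            confs.append(n_both / n_lhs)
    conf_bounds = (min(confs), max(confs)) if confs else None
    return (min(sups), max(sups)), conf_bounds

# Hypothetical blocked database: A is always 1, C is unknown in two rows.
blocked = [{"A": 1, "C": 1}, {"A": 1, "C": 1}, {"A": 1, "C": 1},
           {"A": 1, "C": "?"}, {"A": 1, "C": "?"}]
print(bounds(blocked, ("A",), ("C",)))  # ((0.6, 1.0), (0.6, 1.0))
```

The data were chosen so that the confidence interval matches the 60% to 100% range quoted for A → C; they are not the presentation's own table.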

Negative Border Rules Set (NBRS) Definition
- When a rule R has either sup(R) > MST AND conf(R) < MCT, OR sup(R) < MST AND conf(R) > MCT, then we say that R belongs to NBRS.
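Reading the definition as "exactly one of the two thresholds is exceeded and the other is missed", NBRS membership reduces to a one-line predicate. The function name `in_nbrs` and the threshold values are hypothetical.

```python
def in_nbrs(sup, conf, mst, mct):
    """True if the rule passes one threshold and fails the other (strictly)."""
    return (sup > mst and conf < mct) or (sup < mst and conf > mct)

mst, mct = 0.3, 0.7  # hypothetical thresholds
print(in_nbrs(0.40, 0.60, mst, mct))  # True: frequent, but not confident enough
print(in_nbrs(0.20, 0.90, mst, mct))  # True: confident, but not frequent enough
print(in_nbrs(0.40, 0.90, mst, mct))  # False: the rule already holds
print(in_nbrs(0.20, 0.60, mst, mct))  # False: fails both thresholds
```

These are the rules that a small change in the database could turn into ghost rules.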

Side Effects Definition Modification in Blocking-based Techniques
Before Hiding Process | After Hiding Process | Side Effect
Rule R_i has had conf(R_i) > MCT | Rule R_i has now minconf(R_i) < MCT | Rule Eliminated (Undesirable Side Effect)
Rule R_i has had conf(R_i) < MCT | Rule R_i has now maxconf(R_i) > MCT | Ghost Rule (Desirable Side Effect)
Large Itemset I has had sup(I) > MST | Itemset I has now minsup(I) < MST | Itemset Eliminated (Undesirable Side Effect)
Itemset I has had sup(I) < MST | Itemset I has now maxsup(I) > MST | Ghost Itemset (Desirable Side Effect)

Privacy Breaches Definitions
- If an item i, some values of which are hidden by ?’s, is contained in a sensitive rule, a privacy breach will occur if the adversary can assume that with c% confidence.
- For a rule R with maxconf(R) > MCT, a privacy breach occurs if it can be estimated, with c% confidence, that R is either a sensitive or a ghost rule.
- For a blocked item i in a specific transaction T, a privacy breach occurs if the adversary can estimate with c% confidence that its original value is either 0 or 1.

Blocking-Based Techniques
- Goals that an algorithm has to achieve:
  - To insert a relatively small number of ?’s while significantly reducing the confidence of sensitive rules.
  - To minimize the undesirable side effects (rules and itemsets lost) by selecting the items in the appropriate transactions to change, and to maximize the desirable side effects.
  - To modify the database in such a way that an adversary cannot recover the original values of the database.

Our Proposal: Blocking Algorithm (BA)
- High Level Description
  1st step:
  - For each sensitive rule R_S (with left itemset I_L and right itemset I_R), compute how many 0’s and 1’s have to be blocked in order to reduce the confidence of R_S.
  2nd step:
  - Find the set of transactions T_R that support R_S, and the set of transactions T_LpR' that partially support R_S (they partially support the left itemset and do not support the right itemset).
  - For each transaction in T_R, find the rules R_common that have at least one item in common with I_R; for each transaction in T_LpR', find the rules R'_common ∈ NBRS that have at least one item in common with I_L.
  - Assign a weight w to each R_common and a weight w' to each R'_common.
  - Assign a priority value P_T to each transaction T ∈ T_R such that P_T is large if T has many R_common rules with large w, and a priority value P_T' to each transaction T' ∈ T_LpR' such that P_T' is small if T' has many R'_common rules with large w'.
  3rd step:
  - Sort the transactions T ∈ T_R starting from those with the lowest P_T, and sort the transactions T' ∈ T_LpR' starting from those with the highest P_T'.
  4th step:
  - For the first N_1 sorted T ∈ T_R, block an item i ∈ I_R; for the first N_0 sorted T' ∈ T_LpR', block an item i ∈ I_L.
  5th step:
  - Update the values minconf(R_i) and minsup(R_i) for all other rules that have been affected.
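Steps 2 to 4 can be sketched as below. The slides do not say how N_1, N_0, the weights or the priorities are computed, so this sketch takes N_1, N_0 and a ready-made priority table as inputs. "Partially supports the left itemset" is interpreted, as an assumption, as "at least one item of I_L is 1 and at least one is 0". The function name `blocking_sketch` and the sample data are hypothetical.

```python
def blocking_sketch(db, i_l, i_r, n1, n0, priority=None):
    """Block n1 ones from I_R in supporting transactions and
    n0 zeros from I_L in partially supporting transactions."""
    priority = priority or {}                  # hypothetical P values per row index
    db = [dict(row) for row in db]             # work on a copy
    # Step 2: T_R supports the whole rule; T_LpR' supports I_L only partially
    # and does not support I_R.
    t_r = [k for k, r in enumerate(db) if all(r[x] == 1 for x in i_l + i_r)]
    t_lp = [k for k, r in enumerate(db)
            if any(r[x] == 1 for x in i_l) and any(r[x] == 0 for x in i_l)
            and not all(r[x] == 1 for x in i_r)]
    # Step 3: lowest priority first for T_R, highest first for T_LpR'.
    t_r.sort(key=lambda k: priority.get(k, 0))
    t_lp.sort(key=lambda k: priority.get(k, 0), reverse=True)
    # Step 4: block a 1 of I_R, respectively a 0 of I_L.
    for k in t_r[:n1]:
        db[k][i_r[0]] = "?"
    for k in t_lp[:n0]:
        zero_item = next(x for x in i_l if db[k][x] == 0)
        db[k][zero_item] = "?"
    return db

db = [{"A": 1, "B": 1, "C": 1},   # 0: supports AB -> C
      {"A": 1, "B": 1, "C": 1},   # 1: supports AB -> C
      {"A": 1, "B": 0, "C": 0},   # 2: partial left, no right
      {"A": 0, "B": 1, "C": 0},   # 3: partial left, no right
      {"A": 0, "B": 0, "C": 1}]   # 4: neither
prio = {0: 2.0, 1: 0.5, 2: 0.1, 3: 0.9}
out = blocking_sketch(db, ("A", "B"), ("C",), n1=1, n0=1, priority=prio)
print(out[1], out[3])
# {'A': 1, 'B': 1, 'C': '?'} {'A': '?', 'B': 1, 'C': 0}
```

Blocking a 1 of I_R lowers the minimum support of the rule, and blocking a 0 of I_L raises the maximum support of the left itemset; both lower the minimum confidence of the sensitive rule, after which the affected bounds would be recomputed (step 5).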

Blocking-Based Techniques
- Main problems of the blocking technique:
  1. The maximum confidence of a sensitive rule cannot be reduced.
  2. An adversary can infer the hidden values by applying a smart inference technique if the blocking algorithm does not add much uncertainty to the database.
  3. Both 0’s and 1’s must be hidden, because if only 1’s were hidden the adversary would simply replace all the ?’s with 1’s and easily restore the initial database.
  4. Many ?’s must be inserted if we don’t want an adversary to infer hidden data.
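Problem 3 can be demonstrated on a toy example. The helper names `block` and `guess_all_ones` and the two-row database are assumptions for illustration.

```python
def block(row, items):
    """Replace the listed items of one transaction with '?'."""
    return {k: ("?" if k in items else v) for k, v in row.items()}

def guess_all_ones(row):
    """Adversary's naive inference: every '?' was originally a 1."""
    return {k: (1 if v == "?" else v) for k, v in row.items()}

original = [{"A": 1, "B": 0, "C": 1}, {"A": 1, "B": 1, "C": 0}]

# Only 1's are blocked: C in the first row, A in the second.
only_ones = [block(original[0], {"C"}), block(original[1], {"A"})]
# A 0 is blocked as well: C in the second row was 0.
mixed = [block(original[0], {"C"}), block(original[1], {"C"})]

print([guess_all_ones(r) for r in only_ones] == original)  # True
print([guess_all_ones(r) for r in mixed] == original)      # False
```

When only 1's are blocked the naive guess restores the database exactly; once a 0 is blocked too, the same guess produces a wrong value.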

Experimental Results of Blocking Algorithm
[Figure panels: large itemsets remaining after the hiding process; rules changed (%) after the process]

Experimental Results of Blocking Algorithm (2)
[Figure panels: databases with an average of 20 items per transaction; databases with an average of 13 items per transaction]

Experimental Results of Blocking Algorithm (3)
[Figure panels: rules changed when the proportion 0:1 is changed; decision tree experiments, misclassified items (%)]

Comparison and Analysis
Criterion | Distortion-based Techniques | Blocking-based Techniques
Privacy Breaches | No privacy breaches | Many kinds of privacy breaches
Simplicity of algorithms | Simpler | More complicated
Database Modification | Database contains false information | Many ?’s must be inserted in the Database

Conclusions
- There are open research problems in the Blocking Technique:
  A) What techniques must be used in order to reduce the privacy breaches?
  B) In what other ways can we prevent an adversary from inferring the association rules in the database?
  C) Applying a chi-square test to the final database may reveal some correlations between the items.

References
- [Evfimievski et al.] Alexandre Evfimievski, Ramakrishnan Srikant, Rakesh Agrawal, Johannes Gehrke. Privacy Preserving Mining of Association Rules. SIGKDD 2002, Edmonton, Alberta, Canada.
- Murat Kantarcioglu and Chris Clifton. Privacy Preserving Distributed Mining of Association Rules on Horizontally Partitioned Data. In Proceedings of the ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery (2002), 24–31.
- Jaideep Vaidya and Chris Clifton. Privacy Preserving Association Rule Mining in Vertically Partitioned Data. In the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2002), 639–644.
- Stanley R. M. Oliveira and Osmar R. Zaïane. Algorithms for Balancing Privacy and Knowledge Discovery in Association Rule Mining. In Proc. of the Seventh International Database Engineering & Applications Symposium (IDEAS'03), Hong Kong, July 2003.
- Yucel Saygin, Vassilios Verykios, and Chris Clifton. Using Unknowns to Prevent Discovery of Association Rules. SIGMOD Record 30 (2001), no. 4, 45–54.
- Vassilios S. Verykios, Ahmed K. Elmagarmid, Elisa Bertino, Yucel Saygin, and Elena Dasseni. Association Rule Hiding. IEEE Transactions on Knowledge and Data Engineering (2003).