An architecture for Privacy Preserving Mining of Client Information Jaideep Vaidya Purdue University This is joint work with Murat.

Slides:



Advertisements
Similar presentations
Data Mining: Potentials and Challenges Rakesh Agrawal & Jeff Ullman.
Advertisements

The Average Case Complexity of Counting Distinct Elements David Woodruff IBM Almaden.
Frequent Itemset Mining Methods. The Apriori algorithm Finding frequent itemsets using candidate generation Seminal algorithm proposed by R. Agrawal and.
Secure Multiparty Computations on Bitcoin
Privacy-Preserving Databases and Data Mining Yücel SAYGIN
Fast Algorithms For Hierarchical Range Histogram Constructions
Decision Tree Induction in Hierarchic Distributed Systems With: Amir Bar-Or, Ran Wolff, Daniel Keren.
PRIVACY AND SECURITY ISSUES IN DATA MINING P.h.D. Candidate: Anna Monreale Supervisors Prof. Dino Pedreschi Dott.ssa Fosca Giannotti University of Pisa.
Efficiency concerns in Privacy Preserving methods Optimization of MASK Shipra Agrawal.
Privacy Preserving Association Rule Mining in Vertically Partitioned Data Reporter : Ximeng Liu Supervisor: Rongxing Lu School of EEE, NTU
A Generic Framework for Handling Uncertain Data with Local Correlations Xiang Lian and Lei Chen Department of Computer Science and Engineering The Hong.
Rakesh Agrawal Ramakrishnan Srikant
Data Mining Techniques So Far: Cluster analysis K-means Classification Decision Trees J48 (C4.5) Rule-based classification JRIP (RIPPER) Logistic Regression.
Finding Personally Identifying Information Mark Shaneck CSCI 5707 May 6, 2004.
SAC’06 April 23-27, 2006, Dijon, France On the Use of Spectral Filtering for Privacy Preserving Data Mining Songtao Guo UNC Charlotte Xintao Wu UNC Charlotte.
Maintenance of Discovered Association Rules S.D.LeeDavid W.Cheung Presentation : Pablo Gazmuri.
Privacy Preserving K-means Clustering on Vertically Partitioned Data Presented by: Jaideep Vaidya Joint work: Prof. Chris Clifton.
Fast Algorithms for Association Rule Mining
Privacy Preserving Data Mining Yehuda Lindell & Benny Pinkas.
Is Sampling Useful in Data Mining? A Case in the Maintenance of Discovered Association Rules S.D. Lee David W. Cheung Ben Kao The University of Hong Kong.
Privacy Preserving OLAP Rakesh Agrawal, IBM Almaden Ramakrishnan Srikant, IBM Almaden Dilys Thomas, Stanford University.
Privacy Preserving Learning of Decision Trees Benny Pinkas HP Labs Joint work with Yehuda Lindell (done while at the Weizmann Institute)
Privacy-Preserving Data Mining Rakesh Agrawal Ramakrishnan Srikant IBM Almaden Research Center 650 Harry Road, San Jose, CA Published in: ACM SIGMOD.
Performance and Scalability: Apriori Implementation.
Hippocratic Databases Paper by Rakesh Agrawal, Jerry Kiernan, Ramakrishnan Srikant, Yirong Xu CS 681 Presented by Xi Hua March 1st,Spring05.
Information Retrieval from Data Bases for Decisions Dr. Gábor SZŰCS, Ph.D. Assistant professor BUTE, Department Information and Knowledge Management.
『 Data Mining 』 By Jung, hae-sun. 1.Introduction 2.Definition 3.Data Mining Applications 4.Data Mining Tasks 5. Overview of the System 6. Data Mining.
CS573 Data Privacy and Security
A Secure Protocol for Computing Dot-products in Clustered and Distributed Environments Ioannis Ioannidis, Ananth Grama and Mikhail Atallah Purdue University.
Overview of Privacy Preserving Techniques.  This is a high-level summary of the state-of-the-art privacy preserving techniques and research areas  Focus.
1 Privacy-Preserving Distributed Information Sharing Nan Zhang and Wei Zhao Texas A&M University, USA.
Mining Optimal Decision Trees from Itemset Lattices Dr, Siegfried Nijssen Dr. Elisa Fromont KDD 2007.
1 Apriori Algorithm Review for Finals. SE 157B, Spring Semester 2007 Professor Lee By Gaurang Negandhi.
Secure Incremental Maintenance of Distributed Association Rules.
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining By Tan, Steinbach, Kumar Lecture.
Privacy-Aware Personalization for Mobile Advertising
Tools for Privacy Preserving Distributed Data Mining
Disclosure risk when responding to queries with deterministic guarantees Krish Muralidhar University of Kentucky Rathindra Sarathy Oklahoma State University.
1 Privacy Preserving Data Mining Haiqin Yang Extracted from a ppt “Secure Multiparty Computation and Privacy” Added “Privacy Preserving SVM”
Privacy Preserving Data Mining Yehuda Lindell Benny Pinkas Presenter: Justin Brickell.
Privacy Preserving Mining of Association Rules Alexandre Evfimievski, Ramakrishnan Srikant, Rakesh Agrawal, Johannes Gehrke IBM Almaden Research Center.
Privacy preserving data mining Li Xiong CS573 Data Privacy and Anonymity.
Expert Systems with Applications 34 (2008) 459–468 Multi-level fuzzy mining with multiple minimum supports Yeong-Chyi Lee, Tzung-Pei Hong, Tien-Chin Wang.
SECURED OUTSOURCING OF FREQUENT ITEMSET MINING Hana Chih-Hua Tai Dept. of CSIE, National Taipei University.
Randomization in Privacy Preserving Data Mining Agrawal, R., and Srikant, R. Privacy-Preserving Data Mining, ACM SIGMOD’00 the following slides include.
Privacy-preserving rule mining. Outline  A brief introduction to association rule mining  Privacy preserving rule mining Single party  Perturbation.
Is Sampling Useful in Data Mining? A Case in the Maintenance of Discovered Association Rules S.D. Lee, David W. Cheung, Ben Kao The University of Hong.
1 Efficient Algorithms for Incremental Update of Frequent Sequences Minghua ZHANG Dec. 7, 2001.
Privacy vs. Utility Xintao Wu University of North Carolina at Charlotte Nov 10, 2008.
Privacy Preserving Payments in Credit Networks By: Moreno-Sanchez et al from Saarland University Presented By: Cody Watson Some Slides Borrowed From NDSS’15.
Security in Outsourced Association Rule Mining. Agenda  Introduction  Approximate randomized technique  Encryption  Summary and future work.
Bloom Cookies: Web Search Personalization without User Tracking Authors: Nitesh Mor, Oriana Riva, Suman Nath, and John Kubiatowicz Presented by Ben Summers.
Privacy-Preserving Location- Dependent Query Processing Mikhail J. Atallah and Keith B. Frikken Purdue University.
1 Limiting Privacy Breaches in Privacy Preserving Data Mining In Proceedings of the 22 nd ACM SIGACT – SIGMOD – SIFART Symposium on Principles of Database.
1 Privacy Preserving Data Mining Introduction August 2 nd, 2013 Shaibal Chakrabarty.
1 Systematic Data Selection to Mine Concept-Drifting Data Streams Wei Fan Proceedings of the 2004 ACM SIGKDD international conference on Knowledge discovery.
1 Data Mining Lecture 6: Association Analysis. 2 Association Rule Mining l Given a set of transactions, find rules that will predict the occurrence of.
Privacy Preserving Outlier Detection using Locality Sensitive Hashing
Improvement of Apriori Algorithm in Log mining Junghee Jaeho Information and Communications University,
Privacy-Preserving Data Aggregation without Secure Channel: Multivariate Polynomial Evaluation Taeho Jung 1, XuFei Mao 2, Xiang-Yang Li 1, Shao-Jie Tang.
1 Top Down FP-Growth for Association Rule Mining By Ke Wang.
1 Maintaining Data Privacy in Association Rule Mining Speaker: Minghua ZHANG Oct. 11, 2002 Authors: Shariq J. Rizvi Jayant R. Haritsa VLDB 2002.
Cryptographic methods. Outline  Preliminary Assumptions Public-key encryption  Oblivious Transfer (OT)  Random share based methods  Homomorphic Encryption.
Security in Outsourcing of Association Rule Mining
Privacy-Preserving Data Mining
Privacy Preserving Data Mining
DATABASE HISTOGRAMS E0 261 Jayant Haritsa
Privacy preserving cloud computing
Presentation transcript:

An architecture for Privacy Preserving Mining of Client Information Jaideep Vaidya Purdue University This is joint work with Murat Kantarcioglu

Data Mining Data Mining is the process of finding new and potentially useful knowledge from data. Typical Algorithms –Classification –Association Rules –Clustering

What is Privacy Preserving Data Mining? Term appeared in 2000: –A–Agrawal and Srikant, SIGMOD Added noise to data before delivery to the data miner Technique to reduce impact of noise learning a decision tree –L–Lindell and Pinkas, CRYPTO Two parties, each with a portion of the data Learn a decision tree without sharing data Different Concepts of Privacy!

Related Work Perturbation Approaches –Agrawal, Srikant, SIGMOD 2000 –Agrawal, Aggarwal, –Evfimievski et al, SIGKDD 2002 SMC approaches –Lindell, Pinkas, CRYPTO 2000 –Kantarcioglu, Clifton, DMKD 2002 –Vaidya, Clifton, SIGKDD 2002 Other approaches –Rizvi, Haritsa, VLDB 2002 –Du, Atallah,

Motivation PPDM Security AccuracyEfficiency Improving any one aspect typically degrades the other two

Motivating Example Assume that an attribute Y is perturbed by uniform random variable with range [-2,2]. If we see Y ’ i = Y i + r = 5, then Y i [3,7] Assume after reconstruction of the distribution( The basic assumption of all perturbation techniques is that we can reconstruct distributions), This implies Y i [4,7]

Motivating Example (Cont.) Even worse, assume that Therefore we could infer that

Motivation Perfect Privacy is achievable without compromising on Accuracy Users do not want to be permanently online (to engage in some complex protocol) Outside parties can be used as long as there are strict bounds on what information they receive and what operations they are allowed to do

Key Insight Consider using non-colluding, untrusted, semi- honest third parties to carry out computation Non-colluding –Should not collude with any of the original users or any of the other parties Untrusted –Throughout the process, should never gain access to any information (in the clear), as long as the first assumption (non-colluding) holds true Semi-honest –All parties correctly follow the protocol, but are then free to use whatever information they see during the execution of the protocols in any way

The Architecture Use three sites with the properties defined earlier: Original Site (OS) –Site that collects share of the information from all clients, and will learn the final result of the data mining process Non-Colluding Storage Site (NSS) –Used for storing shared part of user information Processing Site (PS) –Used to do data mining efficiently

Finding Frequent Itemsets OSNSS User 1 User n ID1 ID2 ID ID1 - 1 ID2 - 1 ID3 - 0 ID1 - 1 ID2 - 0 ID3 - 1 ID4 ID ID4 - 1 ID5 - 1 ID4 - 1 ID5 - 0 ID1 - 1 ID2 - 1 ID3 - 0 ID4 - 1 ID5 - 1 ID1 - 1 ID2 - 0 ID3 - 1 ID4 - 1 ID5 - 0 From this point on, the end users are not involved in the remaining protocol, and so they do not have to stay online

Interlude For our protocol, total number of transactions, n = 5 number of fake transactions to add (fraction of total), epsilon = 0.2 epsilon*n = 5*0.2 = 1 Original Site decides to make some of the fake transactions supporting the itemset, while some don’t (it knows the exact count)

ID1 - 1 ID2 - 1 ID3 - 0 ID4 - 1 ID5 - 1 ID6 - 0 ID1 - 1 ID2 - 0 ID3 - 1 ID4 - 1 ID5 - 0 ID6 - 1 Finding Frequent Itemsets OSNSS ID1 - 1 ID2 - 1 ID3 - 0 ID4 - 1 ID5 - 1 ID1 - 1 ID2 - 0 ID3 - 1 ID4 - 1 ID5 - 0 PS ID6 - 0 ID Result = 4 Result = 3

Doing Secure Data Mining Once the support count of an itemset has been calculated, the process for finding association rules securely is well known Other data mining algorithms become easily possible by modifying the process

Communication Cost: For each k-itemset at least bit must be transferred for the exact result. Let us assume that number of candidate k- itemsets is C k. Let assume we have at most m-itemset candidates. Total communication cost for the association rule mining would be

Security Analysis NSS view: The NSS only gets to see random numbers. Thus, it does not learn anything. OS view: OS learns the support count of the itemsets but does not learn which user supports any particular itemset. Essentially,

Security Analysis (Cont.) PS learns an upper bound on the support count but it does not know for which itemset. (Ordering of the attributes randomized) Because of the addition of fake items and random ordering, it has no way of correlating the itemsets to any particular user.

Security Analysis As long as the three sites (OS, NSS and PS) do not collude with each other, they do not learn anything

Benefits of the framework Perfect individual privacy is achieved Users do not have to stay online for a complicated protocol. Once they have split their information among the storage sites, they are done

Future Work An extremely efficient way of generating one-itemsets securely is possible. Using this instead of the general method, will lead to great savings in communication Sampling should be done to further lower communication cost and increase efficiency

Conclusion Privacy and Efficiency are both important for Secure Data Mining. Compromising on either is not practical A framework for privacy preserving data mining has been suggested Need to implement and evaluate true efficiency, after including improvements such as sampling