Privacy Preserving Data Mining Yehuda Lindell & Benny Pinkas.


Summary
– Objective
– Various components / tools needed
– Algorithm

Objective
– Perform data mining on the union of two private databases
– Data stays private, i.e., no party learns anything beyond the output

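Assumptions
– Large databases: generic secure-computation solutions are too inefficient to be practical
– Semi-honest parties: each party follows the protocol but may try to infer extra information from what it sees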

Classification by Decision Tree Learning
– Each transaction has a set of attributes, one of which is the class attribute
– Goal: predict the class using only the non-class attributes

Decision Tree
– Rooted tree with nodes and edges
– Internal nodes => attributes
– Edges leaving a node => possible values of that attribute
– Leaves => expected class for a transaction
– To classify: traverse the tree using the transaction's known attribute values, then predict the class stored at the leaf reached
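As an illustration, the traversal described above can be sketched in Python. The nested-dict encoding of the tree, the attribute names (`outlook`, `humidity`) and the `classify` helper are hypothetical, chosen only for this example.

```python
from typing import Union

# Hypothetical encoding: an internal node is {"attribute": name, "children": {value: subtree}};
# a leaf is simply the class label (a string).
Tree = Union[str, dict]


def classify(tree: Tree, transaction: dict) -> str:
    """Walk from the root, following the edge that matches the transaction's value."""
    node = tree
    while isinstance(node, dict):
        value = transaction[node["attribute"]]  # test this node's attribute
        node = node["children"][value]          # follow the matching edge
    return node                                 # reached a leaf: its class is the prediction


example_tree = {
    "attribute": "outlook",
    "children": {
        "sunny": {"attribute": "humidity", "children": {"high": "no", "normal": "yes"}},
        "overcast": "yes",
        "rain": "no",
    },
}

print(classify(example_tree, {"outlook": "sunny", "humidity": "normal"}))  # yes
```

Only the attributes on the path actually taken are consulted, so a transaction needs values just for those attributes.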

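Constructing the Tree
– Top-down: at each node, find the attribute that “best” classifies the remaining transactions
– Best => the attribute that minimizes the conditional entropy (i.e., maximizes the information gain)
– Entropy: $H = -\sum x \ln x$, summed over the class proportions x
– If all transactions at a node belong to one class, the entropy is 0 and the node becomes a leaf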

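Entropy Calculations
– General form: $H = -\sum x \ln x$
– $H_C(T)$: information needed to identify the class of a transaction in T; x ranges over the fraction of transactions in each class, summed over all classes:
  $H_C(T) = -\sum_{c} \frac{|T(c)|}{|T|} \ln \frac{|T(c)|}{|T|}$
– $H_C(T \mid A)$: information needed to identify the class of a transaction in T, given its value of attribute A; for each value v, the entropy is computed over the transactions T(v) with A = v and weighted by $|T(v)|/|T|$:
  $H_C(T \mid A) = \sum_{v} \frac{|T(v)|}{|T|} H_C(T(v))$
– Gain: $\mathrm{Gain}(A) = H_C(T) - H_C(T \mid A)$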
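A minimal, non-private sketch of these quantities and of top-down ID3 on a single table, assuming transactions are Python dicts. The toy dataset and the names `class_entropy`, `conditional_entropy`, `gain` and `id3` are hypothetical; the private protocol computes the same quantities without pooling the two databases.

```python
import math
from collections import Counter


def class_entropy(rows, class_attr):
    """H_C(T) = -sum_c (|T(c)|/|T|) ln(|T(c)|/|T|)."""
    counts = Counter(row[class_attr] for row in rows)
    total = len(rows)
    return -sum((n / total) * math.log(n / total) for n in counts.values())


def conditional_entropy(rows, attr, class_attr):
    """H_C(T|A) = sum_v (|T(v)|/|T|) * H_C(T(v))."""
    total = len(rows)
    by_value = {}
    for row in rows:
        by_value.setdefault(row[attr], []).append(row)
    return sum(len(sub) / total * class_entropy(sub, class_attr) for sub in by_value.values())


def gain(rows, attr, class_attr):
    """Gain(A) = H_C(T) - H_C(T|A)."""
    return class_entropy(rows, class_attr) - conditional_entropy(rows, attr, class_attr)


def id3(rows, attrs, class_attr):
    """Top-down ID3: split on the attribute with maximum gain (minimum H_C(T|A))."""
    labels = Counter(row[class_attr] for row in rows)
    if len(labels) == 1 or not attrs:           # pure node (entropy 0) or no attributes left
        return labels.most_common(1)[0][0]      # leaf labelled with the (majority) class
    best = max(attrs, key=lambda a: gain(rows, a, class_attr))
    rest = [a for a in attrs if a != best]
    children = {}
    for value in {row[best] for row in rows}:
        subset = [row for row in rows if row[best] == value]
        children[value] = id3(subset, rest, class_attr)
    return {"attribute": best, "children": children}


rows = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "no", "play": "yes"},
    {"outlook": "rain", "windy": "no", "play": "yes"},
    {"outlook": "rain", "windy": "yes", "play": "no"},
]

tree = id3(rows, ["outlook", "windy"], "play")
print(tree["attribute"])  # outlook
```

Here `outlook` wins because it leaves only the two `rain` transactions impure, whereas splitting on `windy` leaves three mixed transactions.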

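Private Computation
– A protocol privately computes f = (f1, f2) if there exists a simulator S1 that, given only Party 1’s input x1 and output f1(x1, y), can generate a view indistinguishable from Party 1’s view of a real execution with Party 2 (and likewise a simulator S2 for Party 2)
– Intuition: anything Party 1 sees could have been computed from its own input and output alone, so it learns nothing about Party 2’s input y beyond f1(x1, y)
[Figure: S1 takes x1 and f1(x1, y) and outputs a simulated view of Party 1’s interaction with Party 2]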
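For reference, the simulation condition above is commonly written as a computational-indistinguishability requirement. This is a sketch of the standard semi-honest definition; the notation $\mathrm{view}_i^{\pi}(x_1, y)$ for party i's view (input, randomness and received messages) in protocol $\pi$ is ours.

```latex
\{\, S_1\big(x_1,\ f_1(x_1, y)\big) \,\}_{x_1, y} \;\stackrel{c}{\equiv}\; \{\, \mathrm{view}_1^{\pi}(x_1, y) \,\}_{x_1, y}
\qquad
\{\, S_2\big(y,\ f_2(x_1, y)\big) \,\}_{x_1, y} \;\stackrel{c}{\equiv}\; \{\, \mathrm{view}_2^{\pi}(x_1, y) \,\}_{x_1, y}
```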

Oblivious Evaluation
– Motivation: what if a party does not want the other party to learn which input (x1) it is providing?
– Oblivious evaluation: the sender holds a polynomial P and the receiver holds an input x; the receiver obtains P(x) without learning anything else about P, and the sender learns nothing about x

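Oblivious Evaluation (2): Simplified Version
– a = public group generator; s = receiver’s secret key (with $a^{s}$ known to the sender); r = receiver’s random number; R = sender’s random number; x = receiver’s input
– Receiver → Sender: $(a^{r},\ a^{sr} a^{x})$
– Sender → Receiver: $(a^{R} a^{r},\ a^{sR} a^{P(x)} a^{sr})$
– Receiver divides the 2nd element by the 1st element raised to the power s:
  $a^{P(x)} = \dfrac{a^{sR}\, a^{P(x)}\, a^{sr}}{(a^{R} a^{r})^{s}}$
– This yields $a^{P(x)}$; P(x) itself can be recovered when its range is small enough to search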
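A toy Python sketch of the simplified exchange, assuming an ElGamal-style group and a linear polynomial P(x) = c0 + c1·x (the sender can compute an encryption of such a P from a single encryption of x; with c1 = 1 the messages match the slide exactly). The tiny parameters and all function names are hypothetical and offer no real security.

```python
import secrets

# Toy group (ASSUMPTION: far too small for real use): p = 2q + 1 is a safe prime,
# and a = 4 generates the subgroup of prime order q.
P_MOD = 2039
Q = 1019
A = 4


def rand_exp() -> int:
    """Uniform random exponent in [1, Q-1]."""
    return secrets.randbelow(Q - 1) + 1


def receiver_query(x: int, s: int):
    """Receiver picks r and sends (a^r, a^(s*r) * a^x), an encryption of x."""
    r = rand_exp()
    return pow(A, r, P_MOD), pow(A, s * r, P_MOD) * pow(A, x, P_MOD) % P_MOD


def sender_reply(msg, pub_key: int, c0: int, c1: int):
    """Sender turns Enc(x) into Enc(c0 + c1*x), rerandomized with its own R.
    Result: (a^(c1*r + R), a^(s*(c1*r + R)) * a^P(x))."""
    u, v = msg
    R = rand_exp()
    first = pow(u, c1, P_MOD) * pow(A, R, P_MOD) % P_MOD
    second = pow(v, c1, P_MOD) * pow(pub_key, R, P_MOD) * pow(A, c0, P_MOD) % P_MOD
    return first, second


def receiver_decrypt(reply, s: int, max_value: int) -> int:
    """Divide the 2nd element by the 1st raised to s, leaving a^P(x),
    then recover P(x) by brute force (feasible only for small results)."""
    first, second = reply
    a_px = second * pow(pow(first, s, P_MOD), -1, P_MOD) % P_MOD
    for candidate in range(max_value + 1):
        if pow(A, candidate, P_MOD) == a_px:
            return candidate
    raise ValueError("P(x) outside the searchable range")


if __name__ == "__main__":
    s = rand_exp()                                     # receiver's secret key
    pub_key = pow(A, s, P_MOD)                         # published as a^s
    query = receiver_query(5, s)                       # receiver's x = 5 stays hidden
    reply = sender_reply(query, pub_key, c0=3, c1=2)   # sender's P(x) = 3 + 2x stays hidden
    print(receiver_decrypt(reply, s, max_value=100))   # 13
```

The sender's fresh R rerandomizes the reply, so the receiver sees only an encryption of P(x); the final brute-force step reflects that decryption yields $a^{P(x)}$ rather than P(x).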

Algorithm
– Step 1: Each party performs the ID3 (decision tree learning) computations on its own database: O(# attributes)
– Step 2: Combine the results using cryptographic protocols such as oblivious evaluation: O(log(# transactions))
– Result: each party obtains the result of the data mining without learning more than necessary

Algorithm (2)
– Finding the “best” attribute is the hardest part
– Each party computes its “share” of the entropy: for each attribute, the values from both parties are combined, resulting in a private computation of the entropy terms (−x ln x)
– Choose the attribute that minimizes the conditional entropy, which gives the maximum information gain and favors a small tree
– Uses oblivious evaluation
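A sketch of how the entropy splits into per-party shares, assuming horizontally partitioned data where every count is x1 + x2 (Party 1's local count plus Party 2's). Expanding $H_C(T \mid A)$ gives $|T| \cdot H_C(T \mid A) = -\sum_v \sum_c |T(v,c)| \ln |T(v,c)| + \sum_v |T(v)| \ln |T(v)|$, so the parties only need shares of (x1 + x2) ln(x1 + x2). The function `ideal_xlnx_shares` is a hypothetical trusted-party stand-in, not the cryptographic protocol; real-valued shares are a simplification, and attribute domains are assumed public.

```python
import math
import random
from collections import Counter


def x_ln_x(x: float) -> float:
    return x * math.log(x) if x > 0 else 0.0


def ideal_xlnx_shares(x1: int, x2: int, rng: random.Random):
    """STAND-IN for the private x ln x protocol: a trusted party returning random
    shares with z1 + z2 = (x1 + x2) ln(x1 + x2). Not cryptographically secure."""
    z1 = rng.uniform(-1e6, 1e6)
    return z1, x_ln_x(x1 + x2) - z1


def local_counts(rows, attr, class_attr):
    """Computed by each party on its own rows only: |T(v, c)| and |T(v)|."""
    joint = Counter((row[attr], row[class_attr]) for row in rows)
    marginal = Counter(row[attr] for row in rows)
    return joint, marginal


def shared_score(rows1, rows2, attr, class_attr, values, classes, rng):
    """Return (share1, share2) whose sum is |T| * H_C(T | A)."""
    j1, m1 = local_counts(rows1, attr, class_attr)  # party 1, locally
    j2, m2 = local_counts(rows2, attr, class_attr)  # party 2, locally
    share1 = share2 = 0.0
    for v in values:
        for c in classes:
            z1, z2 = ideal_xlnx_shares(j1[(v, c)], j2[(v, c)], rng)
            share1 -= z1
            share2 -= z2
        z1, z2 = ideal_xlnx_shares(m1[v], m2[v], rng)
        share1 += z1
        share2 += z2
    return share1, share2


party1 = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "no", "play": "yes"},
]
party2 = [
    {"outlook": "rain", "windy": "no", "play": "yes"},
    {"outlook": "rain", "windy": "yes", "play": "no"},
]
domains = {"outlook": ["sunny", "overcast", "rain"], "windy": ["no", "yes"]}  # assumed public
classes = ["yes", "no"]

rng = random.Random(0)
scores = {}
for attr, values in domains.items():
    s1, s2 = shared_score(party1, party2, attr, "play", values, classes, rng)
    # In the real protocol this comparison is also done privately; here shares are just added.
    scores[attr] = s1 + s2
best = min(scores, key=scores.get)
print(best)  # outlook
```

Note that the number of share computations grows with the number of attribute values and classes, not with the number of transactions.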

Discussion of Algorithm
– Efficient: large databases are accommodated, because the cost is driven by the number of possible attribute values, NOT by the number of transactions in the database
– Private: each step consists of local computation plus a private protocol, using techniques such as oblivious transfer / oblivious evaluation to exchange information
– The paper proves that the individual steps are private AND that the control flow between steps can be predicted from the input and output alone, so the combined protocol is also private

Discussion of Algorithm (2)
– An approximation of ID3 is used instead of exact ID3; it is shown to be as secure and to provide the same information
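One source of approximation is that ln x has to be approximated inside the protocol. Below is a non-private sketch of a Taylor-series approximation in the spirit of the paper's approach: write x = 2^n(1 + ε) with |ε| ≤ 1/2 and truncate the series for ln(1 + ε) after k terms. The function name `approx_ln` and the choice of k are illustrative assumptions.

```python
import math


def approx_ln(x: int, k: int) -> float:
    """Approximate ln(x) for an integer x >= 1 using x = 2^n * (1 + eps) and the
    first k terms of ln(1 + eps) = eps - eps^2/2 + eps^3/3 - ..."""
    if x < 1:
        raise ValueError("x must be a positive integer")
    n = x.bit_length() - 1          # 2^n <= x < 2^(n+1)
    eps = x / 2**n - 1              # eps in [0, 1)
    if eps > 0.5:                   # use the next power of two so that |eps| <= 1/2
        n += 1
        eps = x / 2**n - 1          # eps in (-1/4, 0)
    series = sum((-1) ** (i + 1) * eps**i / i for i in range(1, k + 1))
    return n * math.log(2) + series


for x in (3, 100, 12345):
    print(x, approx_ln(x, 12), math.log(x))
```

Because |ε| ≤ 1/2, the truncation error shrinks geometrically in k, so a modest number of terms already gives small error.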