Privacy Preserving K-means Clustering on Vertically Partitioned Data Presented by: Jaideep Vaidya Joint work: Prof. Chris Clifton.

Presentation transcript:


Overview
Global Problem
 – Privacy Preserving Distributed Data Mining
Specific Problem
 – Clustering (K-Means)
For
 – Vertically Partitioned Data
Using
 – Cryptographic Tools

Clustering
Grouping similar objects/instances into clusters
Issues
 – Data is often distributed
 – Privacy/Security Concerns
    – Individual Privacy
    – Entity Privacy
 – Scalability

Outline
Vertical Partitioning of Data
Motivation
Brief Introduction to PPDM / SMC
K-Means Algorithm
Privacy Preserving K-Means Algorithm
 – Closest Cluster Computation
 – When to stop
Communication Cost
Conclusions
Security Proofs (Disclaimer!)

Vertical Partitioning of Data

Medical Records
TID   Brain Tumor?   Diabetes?
RPJ   Brain Tumor    Diabetic
CAC   No Tumor       Non-Diabetic
PTR   No Tumor       Diabetic

Cell Phone Data
TID   Model   Battery
RPJ   5210    Li/Ion
CAC   none
PTR   3650    NiCd

Global Database View
TID   Brain Tumor?   Diabetes?   Model   Battery

Is the problem trivial?

Privacy Preserving Data Mining
Perturbation
 – Agrawal & Srikant, Agrawal & Aggarwal,
 – Rizvi & Haritsa, Evfimievski et al.
Cryptographic
 – Lindell & Pinkas, Du & Zhan
 – Vaidya & Clifton, Kantarcioglu & Clifton

Secure Multiparty Computation (SMC)
Given a function f and n inputs, distributed at n sites, compute the result while revealing nothing to any site except its own input(s) and the result.

Results
Cluster assignment for entities
 – Not private
Cluster centers
 – Semi-private
[Figure labels: Li/Ion, Piezo]

K-means clustering
Arbitrarily select k starting points
Repeat
 – Assign the newly computed cluster means to the current cluster means, respectively
 – (Re)assign each object to the closest cluster based on distance from the mean
 – Re-compute the cluster means
Until no change
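The loop on this slide is ordinary, non-private k-means. A minimal Python sketch follows for reference. It assumes squared Euclidean distance and points stored as tuples; the function names, the `seed` and `max_iter` parameters, and the choice to keep the old mean for an empty cluster are illustrative and do not come from the slides.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, max_iter=100, seed=0):
    """Plain k-means; returns (means, assignment)."""
    rng = random.Random(seed)
    means = rng.sample(points, k)          # arbitrarily select k starting points
    assignment = None
    for _ in range(max_iter):
        # (Re)assign each object to the closest cluster mean.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, means[c]))
            for p in points
        ]
        if new_assignment == assignment:   # until no change
            break
        assignment = new_assignment
        # Re-compute each cluster mean; an empty cluster keeps its old mean.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                means[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return means, assignment
```

In the privacy-preserving setting, the reassignment step and the "until no change" test are the two places where sites would otherwise have to exchange raw attribute values.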

Assigning objects to closest cluster

Key Idea
 – Disguise site components with random values
 – Compare distances while revealing only the comparison result
 – Permute the order of clusters to conceal the meaning of comparison results
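To see what a "site component" is, it helps to look at the computation the protocol must protect. Assuming squared Euclidean distance (the slides do not name the metric), the distance from an object to a cluster mean is a sum of per-site terms, because each site holds a disjoint set of attributes. The sketch below does this in the clear, with no privacy at all; the function names are hypothetical.

```python
def local_components(local_point, local_means):
    """One site's share: squared distance over its own attributes, per cluster."""
    return [
        sum((x - m) ** 2 for x, m in zip(local_point, mean))
        for mean in local_means
    ]

def closest_cluster(components_per_site):
    """Add the per-site components cluster by cluster and pick the minimum."""
    totals = [sum(parts) for parts in zip(*components_per_site)]
    best = min(range(len(totals)), key=totals.__getitem__)
    return best, totals

# Site A holds two attributes of the object, site B holds one; k = 2 clusters.
site_a = local_components((1.0, 2.0), [(1.0, 2.0), (4.0, 4.0)])
site_b = local_components((5.0,), [(0.0,), (5.0,)])
best, totals = closest_cluster([site_a, site_b])
```

The private protocol has to arrive at the same `best` index while no site sees another site's component vector or the totals.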

Closest Cluster Computation
3 special sites, P1, P2 and Pr
P1 generates
 – r random vectors such that […]
 – Permutation π (over 1..k)
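The condition on the r random vectors did not survive in this transcript. A natural reading, and only an assumption here, is that the vectors sum to zero element-wise, so that the masks cancel once all sites' disguised components are added. The Python sketch below illustrates that idea together with the permutation; it is a simplification in which the masking is done in the clear, whereas the actual protocol applies it through the permutation protocol so that no party sees unmasked components. All names are hypothetical, and r is assumed to be at least 2.

```python
import random

def zero_sum_vectors(r, k, rng, bound=10**6):
    """r random integer vectors of length k whose element-wise sum is zero
    (assumed condition; the slide's formula is missing)."""
    vectors = [[rng.randrange(-bound, bound) for _ in range(k)] for _ in range(r - 1)]
    last = [-sum(col) for col in zip(*vectors)]
    return vectors + [last]

def disguise(components, mask, perm):
    """Add the mask, then reorder so position j holds the value for cluster perm[j]."""
    masked = [c + m for c, m in zip(components, mask)]
    return [masked[perm[j]] for j in range(len(perm))]

rng = random.Random(1)
components = [[4, 9, 1], [7, 2, 3], [5, 5, 0]]   # r = 3 sites, k = 3 clusters
masks = zero_sum_vectors(3, 3, rng)
perm = list(range(3))
rng.shuffle(perm)

disguised = [disguise(c, m, perm) for c, m in zip(components, masks)]
totals = [sum(col) for col in zip(*disguised)]   # masks cancel in the sum
position = min(range(3), key=totals.__getitem__) # index in permuted order
winner = perm[position]                          # only the holder of perm can map it back
```

The true totals here are 16, 16 and 4, so cluster index 2 is closest. A party that sees only `position` learns nothing about which cluster it refers to without knowing `perm`.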

Permutation Protocol
Du and Atallah ’01
[Figure: exchange between parties A and B]
Homomorphic encryption: E_k(x) * E_k(y) = E_k(x + y)
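The slide names the homomorphic property but not a cryptosystem. As an illustration only, the sketch below uses textbook Paillier, which has exactly this additive property, with tiny hard-coded primes that offer no security. The `permutation_protocol` function is one plausible way to build a permutation protocol on top of it; the roles of A and B and the message flow are assumptions, not taken from the slides. It needs Python 3.8 or later for `pow(x, -1, n)`.

```python
import math
import random

class ToyPaillier:
    """Textbook Paillier with tiny primes; for illustration only, not secure."""

    def __init__(self, p=293, q=433):
        self.n = p * q
        self.n2 = self.n * self.n
        self.lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
        self.mu = pow(self.lam, -1, self.n)   # valid because the generator is n + 1

    def encrypt(self, m, rng):
        """Encrypt 0 <= m < n using only the public value n."""
        r = rng.randrange(1, self.n)
        while math.gcd(r, self.n) != 1:
            r = rng.randrange(1, self.n)
        return (pow(self.n + 1, m, self.n2) * pow(r, self.n, self.n2)) % self.n2

    def decrypt(self, c):
        """Decrypt with the private values lam and mu."""
        return ((pow(c, self.lam, self.n2) - 1) // self.n) * self.mu % self.n

def permutation_protocol(x, perm, masks, rng):
    """A holds x; B holds perm and masks; A ends up with the permuted x + masks."""
    key = ToyPaillier()                                # A's key pair
    sent_to_b = [key.encrypt(v, rng) for v in x]       # A -> B: E(x_i)
    # B multiplies ciphertexts, which adds plaintexts: E(x_i) * E(m_i) = E(x_i + m_i).
    masked = [(c * key.encrypt(m, rng)) % key.n2 for c, m in zip(sent_to_b, masks)]
    sent_to_a = [masked[perm[j]] for j in range(len(perm))]  # B permutes, B -> A
    return [key.decrypt(c) for c in sent_to_a]         # A sees only masked, permuted values

rng = random.Random(7)
key = ToyPaillier()
product = (key.encrypt(5, rng) * key.encrypt(7, rng)) % key.n2
result = permutation_protocol([4, 9, 1], [2, 0, 1], [10, 20, 30], rng)
```

In this sketch B never sees A's values because it handles only ciphertexts, and A cannot undo the masks or the ordering because it never learns `masks` or `perm`.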

Closest Cluster Computation
[Figure: Stage 1 diagram showing sites P1, P2 and Pr; Stage 2 diagram showing sites P1, P3, Pr-1 and Pr]

Closest Cluster Computation
Stage 3
 – P2 and Pr determine i, the index of the cluster with minimum distance
Stage 4
 – P1 computes and broadcasts […]

When to stop?
Locally compute difference in means
Globally known threshold
Use simple random-adding technique to disguise actual values
 – First party adds random value to its distance and sends to next party
 – Each party adds its value to total and sends on
 – Last party compares with first party’s random + threshold
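The random-adding technique can be sketched as a running total passed from party to party. The Python below is a minimal illustration under several assumptions that are not in the slides: differences are non-negative integers, arithmetic is done modulo 2^32 so the running total looks uniformly random to intermediate parties, and "stop" means the total is at most the threshold. The final comparison is done in the clear here; a real protocol would need a secure comparison so that neither the first nor the last party learns the exact total.

```python
import random

def should_stop(local_differences, threshold, rng, modulus=2**32):
    """Return True if the sum of all parties' local differences is <= threshold.

    Assumes the true sum is smaller than the modulus.
    """
    # First party adds a random value to its own difference and sends it on.
    first_random = rng.randrange(modulus)
    running = (first_random + local_differences[0]) % modulus
    # Each later party adds its own value to the total and sends it on.
    for diff in local_differences[1:]:
        running = (running + diff) % modulus
    # Last party's check against "first party's random + threshold",
    # written here as removing the random value and comparing.
    total = (running - first_random) % modulus
    return total <= threshold

rng = random.Random(3)
converged = should_stop([1, 2, 3], threshold=10, rng=rng)
not_converged = should_stop([5, 6], threshold=10, rng=rng)
```

Each intermediate party sees only a value offset by `first_random`, so it cannot read off the partial sums of the parties before it.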

Communication Cost
r parties, n data elements, m-bit distances
Basic algorithm – O(knr) bits, O(r+k) rounds
Optimized Version – O(kmr) bits, O(r) rounds
Generic Method – O(kmr^3), 1 round
Non-secure Method – O(n) bits, 1 round

Communication Cost
r parties, n data elements, m-bit distances

                      Bits         Rounds
Basic Algorithm       O(knr)       O(r+k)
Optimized Algorithm   O(kmr)       O(r)
Generic Method        O(kmnr^3)    1
Non-Secure Method     O(n)         1

Conclusion
Presented a solution for the Privacy Preserving K-Means Clustering problem
How to use clusters?
Will parties share required information for the possible benefits?
Improve Efficiency
Working on EM-Clustering, implementations