CS573 Data Privacy and Security

Slides:



Advertisements
Similar presentations
A Support Vector Method for Optimizing Average Precision
Advertisements

Mix and Match: A Simple Approach to General Secure Multiparty Computation + Markus Jakobsson Bell Laboratories Ari Juels RSA Laboratories.
Comp 122, Spring 2004 Order Statistics. order - 2 Lin / Devi Comp 122 Order Statistic i th order statistic: i th smallest element of a set of n elements.
Web Information Retrieval
16.4 Estimating the Cost of Operations Project GuidePrepared By Dr. T. Y. LinVinayan Verenkar Computer Science Dept San Jose State University.
Analysis of Algorithms
Imbalanced data David Kauchak CS 451 – Fall 2013.
ITIS 6200/ Secure multiparty computation – Alice has x, Bob has y, we want to calculate f(x, y) without disclosing the values – We can only do.
CS0007: Introduction to Computer Programming Array Algorithms.
Quick Sort, Shell Sort, Counting Sort, Radix Sort AND Bucket Sort
Li Xiong CS573 Data Privacy and Security Privacy Preserving Data Mining – Secure multiparty computation and random response techniques.
 1 Sorting. For computer, sorting is the process of ordering data. [ ]  [ ] [ “Tom”, “Michael”, “Betty” ]  [ “Betty”, “Michael”,
Algorithmic Complexity Nelson Padua-Perez Bill Pugh Department of Computer Science University of Maryland, College Park.
Analysis of Algorithms CS 477/677 Instructor: Monica Nicolescu.
Co-operative Private Equality Test(CPET) Ronghua Li and Chuan-Kun Wu (received June 21, 2005; revised and accepted July 4, 2005) International Journal.
Rank Aggregation. Rank Aggregation: Settings Multiple items – Web-pages, cars, apartments,…. Multiple scores for each item – By different reviewers, users,
Privacy Preserving K-means Clustering on Vertically Partitioned Data Presented by: Jaideep Vaidya Joint work: Prof. Chris Clifton.
Privacy Preserving Data Mining Yehuda Lindell & Benny Pinkas.
CSC 3323 Notes – Introduction Algorithm Analysis.
DAST 2005 Week 4 – Some Helpful Material Randomized Quick Sort & Lower bound & General remarks…
CS 580S Sensor Networks and Systems Professor Kyoung Don Kang Lecture 7 February 13, 2006.
CSCI 347 / CS 4206: Data Mining Module 04: Algorithms Topic 06: Regression.
Database Laboratory Regular Seminar TaeHoon Kim.
CS573 Data Privacy and Security
Overview of Privacy Preserving Techniques.  This is a high-level summary of the state-of-the-art privacy preserving techniques and research areas  Focus.
Data mining and machine learning A brief introduction.
Secure Cloud Database using Multiparty Computation.
Secure Incremental Maintenance of Distributed Association Rules.
Neural Networks Ellen Walker Hiram College. Connectionist Architectures Characterized by (Rich & Knight) –Large number of very simple neuron-like processing.
Tools for Privacy Preserving Distributed Data Mining
Cryptographic methods for privacy aware computing: applications.
Mining Multiple Private Databases Topk Queries Across Multiple Private Databases (2005) Li Xiong (Emory University) Subramanyam Chitti (GA Tech) Ling Liu.
Personalized Social Recommendations – Accurate or Private? A. Machanavajjhala (Yahoo!), with A. Korolova (Stanford), A. Das Sarma (Google) 1.
SECURED OUTSOURCING OF FREQUENT ITEMSET MINING Hana Chih-Hua Tai Dept. of CSIE, National Taipei University.
Mining Multiple Private Databases Topk Queries Across Multiple Private Databases (2005) Mining Multiple Private Databases Using a kNN Classifier (2007)
CS 61B Data Structures and Programming Methodology July 21, 2008 David Sun.
Privacy-preserving rule mining. Outline  A brief introduction to association rule mining  Privacy preserving rule mining Single party  Perturbation.
Privacy vs. Utility Xintao Wu University of North Carolina at Charlotte Nov 10, 2008.
Analysis of Algorithms CS 477/677 Instructor: Monica Nicolescu Lecture 7.
Privacy-Preserving Data Aggregation without Secure Channel: Multivariate Polynomial Evaluation Taeho Jung 1, XuFei Mao 2, Xiang-Yang Li 1, Shao-Jie Tang.
1 Maintaining Data Privacy in Association Rule Mining Speaker: Minghua ZHANG Oct. 11, 2002 Authors: Shariq J. Rizvi Jayant R. Haritsa VLDB 2002.
Chapter 11 Sorting Acknowledgement: These slides are adapted from slides provided with Data Structures and Algorithms in C++, Goodrich, Tamassia and Mount.
CSC317 Selection problem q p r Randomized‐Select(A,p,r,i)
Order Statistics Comp 122, Spring 2004.
Randomized Algorithms
Privacy-preserving Release of Statistics: Differential Privacy
CS 4/527: Artificial Intelligence
Lecture 16: Parallel Algorithms I
Algorithm design and Analysis
Selection Sort Sorted Unsorted Swap
Overview: Fault Diagnosis
Rank Aggregation.
Sidharth Mishra Dr. T.Y. Lin CS 257 Section 1 MH 222 SJSU - Fall 2016
CS573 Data Privacy and Security
Randomized Algorithms
Privacy Preserving Data Mining
Topic: Divide and Conquer
Parallel Computation Patterns (Reduction)
Distributed Probabilistic Range-Aggregate Query on Uncertain Data
Searching CLRS, Sections 9.1 – 9.3.
CS 3343: Analysis of Algorithms
Technology Mapping I based on tree covering
Chapter 9: Medians and Order Statistics
Order Statistics Comp 122, Spring 2004.
CS 583 Analysis of Algorithms
The Selection Problem.
Algorithms CSCI 235, Spring 2019 Lecture 19 Order Statistics
CS203 Lecture 15.
Presentation transcript:

CS573 Data Privacy and Security “Secure” Multiparty Computation – Multi-round protocols Li Xiong CS573 Data Privacy and Security 1

Secure multiparty computation 2018/11/8 Secure multiparty computation General circuit based secure multiparty computation methods Specialized secure multiparty computation protocols Decision tree mining across horizontally partitioned data Secure sum, secure union Association rule mining across horizontally partitioned data Most of them rely on cryptographic primitives and are still expensive Multi-round protocols as an alternative (except secure sum we have seen so far)

Multi-round protocols Max/min, top k k-th element protocol using secure comparison (Aggarwal ‘04) Multi-round probabilistic protocols (Xiong ‘07) OR (Union) Commutative encryption based Multi-round probabilistic protocols (Bawa ’03) Secure computation of the k-ranked element, Aggarwal, 2004 Preserving Privacy for outsourcing aggregation services, Xiong, 2007 Privacy Preserving Indexing of Documents on the Network, Bawa, 2003

K-th element (Aggarwal 04) Input Di, i = 1, 2, …, s k Range of data values: [alpha, beta]. Size of the union of database: n Output The k-th ranked elements in the union of Di Secure computation of the k-ranked element, Aggarwal, 2004

kth element protocol Initialize Repeat until done Each party ranks its elements in ascending order. Initialize current range [a,b] to [alpha, beta], set n= sum |Di| Repeat until done Set m = (a+b)/2 Each party computes li, number of elements less than m, and gi, number of elements greater than m If sum(li) <= k-1 and sum(gi) <= n-k, done If sum(li) >=k, set b = m-1, output 0 If sum(gi) >= n-k+1, set a = m+1, output 1

Cost Number of rounds: logM where M is the range size Each round requires two secure sums and two secure comparisons

Multi-round protocols 2018/11/8 Multi-round protocols Can we get away from cryptographic primitives? Multi-round protocols idea Use randomizations (random response) Utilize inherent network anonymity of multiple nodes Multi-round protocols May not be completely secure May not be completely accurate (except secure sum we have seen so far) 7

Multi-round protocols Multi-round probabilistic protocols for max/min and top k (Xiong ‘07) Multi-round OR (union) protocol (Aggarwal ’04)

Protocol Structure Random response (Warner 1965) 2018/11/8 Protocol Structure Social Survey Random response (Warner 1965) Multi-round randomized protocol Randomized local computation Multi-node anonymity Assumption: semi- honest model Private Data D1 D3 D2 Dn … Output Input Local Computation Preserving Privacy for outsourcing aggregation services, Xiong, 2007

A Naïve Max/Min Protocol 2018/11/8 A Naïve Max/Min Protocol 1 3 2 4 30 20 40 10 start i gi-1 gi vi gi-1>=vi gi-1<vi gi gi-1 vi Privacy Characteristics Starting node has 100% data value exposure Nodes close to the starting node has a high probability of data value exposure Data range exposure An anonymous protocol can be used to avoid worst case but will not help the average loss of privacy Intuitive idea Add in randomization When, how, and how much randomization to add in? Goal Ensure correct result Minimize data disclosure Add in randomization – how, when, and how much?

Max Protocol – Random response 2018/11/8 Max Protocol – Random response Random response at node i: gi-1>=vi gi-1(r)<vi gi(r) gi-1(r) w/ prob Pr: random number w/ prob 1-Pr: vi i gi-1 gi vi

Max Protocol – multi-round random response 2018/11/8 Max Protocol – multi-round random response Multiple rounds Randomization Probability at round r : Pr(r) = Local algorithm at round r and node i: gi-1(r)>=vi gi-1(r)<vi gi(r) gi-1(r) w/ prob Pr: rand [gi-1(r), vi) w/ prob 1-Pr: vi i gi-1(r) gi(r) vi

Max Protocol - Illustration 2018/11/8 Max Protocol - Illustration Start 18 32 35 D2 D2 30 10 32 35 40 18 32 35 20 40 D4 D3 32 35 40

PrivateTopK Protocol Gi-1(r) Vi Gi(r) 11/8/2018 Gi’(r)=topk(Gi-1(r) U Vi); Vi’ = Gi’(r) – Gi-1’(r); m = |Vi’|; if m=0 then Gi(r)= Gi-1(r); else with probability 1-Pr(r): Gi(r)= Gi-1(r) with probability Pr: Gi(r)[1:k-m] = Gi-1(r)[1:k-m] Gi(r)[k-m+1:k] = a sorted list of m random values generated from [min(Gi’(r)[k]-delta,Gi-1(r-1)[k-m+1]), Gi’(r)[k]) end Gi(r) Gi-1(r) 10 60 80 100 125 145 Vi 40 50 110 130 150 D1 … D2 Dn Multiple rounds Randomization Probability at round r : Pr (r) = Probabilistic local computation 11/8/2018

Min/Max Protocol - Correctness 2018/11/8 Min/Max Protocol - Correctness Precision bound: Converges with r Smaller p0 and d provides faster convergence

Min/Max Protocol - Cost 2018/11/8 Min/Max Protocol - Cost Communication cost single round: O(n) Minimum # of rounds given precision guarantee (1-e):

Min/Max Protocol - Security 2018/11/8 Min/Max Protocol - Security Probability/confidence based metric: P(C|IR,R) Different types of exposures based on claim Data value: vi=a Data ownership: Vi contains a Loss of Privacy (LoP) = | P(C|IR,R) – P(C|R) | Information entropy based metric: Loss of privacy as a measure of randomness of information: H(D|R) - H(D|IR,R) 0.5 1 Absolute Privacy Provable Exposure

Min/Max Protocol – Security (Analysis) 2018/11/8 Min/Max Protocol – Security (Analysis) Upper bound for average expected LoP: max r 1/2r-1 * (1-P0*dr-1) Larger p0 and d provides better privacy

Min/Max Protocol – Security (Experiments) 2018/11/8 Min/Max Protocol – Security (Experiments) Loss of privacy decreases with increasing number of nodes Probabilistic protocol achieves better privacy (close to 0) When n is large, anonymous protocol is actually okay!

Union Commutative encryption based approach Number of rounds: 2 rounds Each round: encryption and decryption Multi-round random-response approach?

Vector p1 p2 pc VG 1 b1 b2 bL … 1 1 1 1 … OR OR OR = … … … Each database has a boolean vector of the data items Union vector is a logical OR of all vectors Privacy Preserving Indexing of Documents on the Network, Bawa, 2003

Group Vector Protocol … 1 v1 1 … v2 1 … vc … 1 v1 1 … v2 1 … vc … Processing of VG’ at ps of round r Pex=1/2r, Pin=1-Pex for(i=1; i<L; i++) if (Vs[i]=1 and VG’[i]=0) Set VG’[i]=1 with prob. Pin if (Vs[i]=0 and VG’[i]=1) Set VG’[i]=0 with prob. Pex p1 p2 pc 1 … vG’ … vG’ 1 … vG’ 1 … vG’ 1 … vG’ 1 … vG’ 1 … vG’ r=1, Pex=1/2, Pin=1/2 r=2, Pex=1/4, Pin=3/4

Open issues Tradeoff between accuracy, efficiency, and security How to quantify security How to design adjustable protocols Can we generalize the algorithms for a set of operators based on their properties Operators: sum, union, max, min … Properties: commutative, associative, invertible, randomizable

Enjoy the spring break!