Privacy Preserving Schema and Data Matching. Scannapieco, Bertino, Figotin and Elmagarmid. Presented by: Vidhi Thapa.


1 Privacy Preserving Schema and Data Matching. Scannapieco, Bertino, Figotin and Elmagarmid. Presented by: Vidhi Thapa

2 INTRODUCTION
- Record matching: the process of identifying records that represent the same real-world entity
- Can be executed
  - within a single source
  - across sources
- Goal: record matching that preserves the privacy of both data and schema

3 RECORD MATCHING
- Record matching involves:
  - sharing and integrating data
  - protecting the privacy of data
- Two major innovations:
  - approximate matching
  - awareness of schema information

4 EMBEDDING
- Embed records in Euclidean space
  - Method used: SparseMap
- Comparison functions: edit distance
- Matching decision rule: classify records as a match / non-match
- Record matching

5 EXAMPLE EDIT DISTANCE
- e("Virginia", "Vermont") = 5
  Virginia -> Verginia -> Verminia -> Vermonia -> Vermonta -> Vermont
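
The edit distance on this slide can be checked with a short sketch. It assumes the comparison function e is the standard Levenshtein distance (unit-cost insertions, deletions and substitutions), which is consistent with the five-step chain above; the function name edit_distance is chosen here for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit-cost insert, delete and substitute."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


print(edit_distance("Virginia", "Vermont"))  # 5
```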

6 HYPOTHESIS
- Two hypotheses: parties P and Q store the records to be matched in the relations R_P(A_1, …, A_n) and R_Q(B_1, …, B_n) respectively,
  1. having identical schemas
  2. having possible schema-level conflicts
- Record matching between R_P and R_Q:
  - P will know only a set P_Match, consisting of the records in R_P that match records in R_Q.
  - Similarly, Q will know only the set Q_Match.

7 SECURE DATA MATCHING
- Pairs of records are compared by means of a comparison function
- A third party is introduced to assure privacy
- SparseMap
  - reference set
  - metric space
  - No. of subsets = ⌊log_2 N⌋^2
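
A minimal sketch of how the reference sets might be built and used. The subset count follows the slide; the layout (set sizes 2, 4, ..., 2^k with k = ⌊log_2 N⌋ sets of each size) and the exact minimum-distance coordinate are assumptions borrowed from the usual Lipschitz-embedding construction, and SparseMap itself only approximates these distances. All function names are hypothetical.

```python
import random


def num_reference_sets(n: int) -> int:
    """floor(log2 n) squared, using integer arithmetic only."""
    k = n.bit_length() - 1  # equals floor(log2 n) for n >= 1
    return k * k


def build_reference_sets(objects, seed=0):
    """Assumed layout: k random subsets of each size 2^1 .. 2^k."""
    rng = random.Random(seed)
    k = len(objects).bit_length() - 1
    sets = []
    for i in range(1, k + 1):
        for _ in range(k):
            sets.append(rng.sample(objects, 2 ** i))
    return sets


def embed(obj, reference_sets, dist):
    """One coordinate per reference set: distance to its nearest member."""
    return [min(dist(obj, s) for s in ref) for ref in reference_sets]


objects = list(range(16))
refs = build_reference_sets(objects)
vector = embed(5, refs, lambda a, b: abs(a - b))
print(len(refs), len(vector))
```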

8 HEURISTIC
- Distance approximation
  Input: object o, set S_i
  Output: approximate d(o, S_i)
- Greedy sampling
  Input: m co-ordinates
  Output: t <= m most discriminating co-ordinates

9 DATA MATCHING PROTOCOL
- Assume parties P and Q store the records to be matched in the relations R_P(A_1, …, A_n) and R_Q(B_1, …, B_n) respectively
- The third-party-based protocol consists of the following three phases:
  Phase 1: Setting of the embedding space
  Phase 2: Embedding of R_P and R_Q values
  Phase 3: Comparison to decide matching records

10 Phase 1

11 Phase 2

12 ILLUSTRATION
- Stress
- E.g.: Academic (8.0, 5.0, 7.0, 7.0) and usefull (6.0, 6.0, 6.0, 7.0)
  Using 1st co-ordinate: 0.5625
  Using 2nd co-ordinate: 0.7656
  Using 3rd co-ordinate: 0.7656
  Using 4th co-ordinate: 1.0
- Choose 1st co-ordinate
  Using 1st and 2nd co-ordinates: 0.5191
  Using 1st and 3rd co-ordinates: 0.5191
  Using 1st and 4th co-ordinates: 0.5625
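
The figures on this slide can be reproduced with the sketch below. It assumes that stress for a single pair is (d' - d)^2 / d^2, where d' is the Euclidean distance over the selected co-ordinates and d is the original distance, and that d = 8 is the edit distance between the strings "Academic" and "usefull". Neither the formula nor the value 8 is stated on the slide; both are inferred from the reported numbers. The function names are hypothetical.

```python
import math


def stress(v, w, coords, original_distance):
    """Single-pair stress: squared relative error of the embedded distance."""
    embedded = math.sqrt(sum((v[c] - w[c]) ** 2 for c in coords))
    return (embedded - original_distance) ** 2 / original_distance ** 2


def greedy_sampling(v, w, original_distance, t):
    """Pick t co-ordinates one at a time, each time minimising stress.

    Ties are broken in favour of the lowest co-ordinate index.
    """
    chosen = []
    remaining = list(range(len(v)))
    while remaining and len(chosen) < t:
        best = min(remaining,
                   key=lambda c: stress(v, w, chosen + [c], original_distance))
        chosen.append(best)
        remaining.remove(best)
    return chosen


academic = (8.0, 5.0, 7.0, 7.0)
usefull = (6.0, 6.0, 6.0, 7.0)
d = 8  # assumed original (edit) distance between the two strings

for c in range(4):
    print(c + 1, round(stress(academic, usefull, [c], d), 4))
print(greedy_sampling(academic, usefull, d, 2))  # zero-based indices
```

With a larger sample of object pairs the stress would be summed over all pairs before normalising; the single-pair form is used here only because the slide works with one pair.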

13 Phase 3
- Given a vector v in P_str and a vector w in Q_str, the Euclidean distance between them is calculated
- A decision rule is applied to all record comparisons: if it holds, the records of P_str and Q_str are inserted into the two sets P_Match and Q_Match respectively
- The final sets are sent to the two parties respectively
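
A minimal sketch of the comparison phase as the third party might run it. The slide does not specify the decision rule, so a simple distance threshold is assumed here; the record identifiers, the threshold value and the function name are all hypothetical.

```python
import math


def match_records(p_vectors, q_vectors, threshold):
    """Compare every embedded P record with every embedded Q record.

    p_vectors and q_vectors map record ids to embedded vectors.
    Assumed decision rule: match when Euclidean distance <= threshold.
    """
    p_match, q_match = set(), set()
    for p_id, v in p_vectors.items():
        for q_id, w in q_vectors.items():
            if math.dist(v, w) <= threshold:
                p_match.add(p_id)
                q_match.add(q_id)
    return p_match, q_match


p_str = {"p1": (1.0, 2.0), "p2": (9.0, 9.0)}
q_str = {"q1": (1.0, 2.5), "q2": (4.0, 4.0)}
p_match, q_match = match_records(p_str, q_str, threshold=1.0)
print(p_match, q_match)  # only P_Match goes to P, only Q_Match to Q
```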

14 SECURE SCHEMA MATCHING
- S_W: global schema owned by third party W
- L_W: language
- α_W: alphabet
- S_P and S_Q are the source schemas owned by the two parties
- If S_W is Customer(Name, DateofBirth, ResidenceAddress) and S_P is Cust(FirstName, LastName, DateofBirth), the mapping is concatenate(Cust.FirstName, Cust.LastName) = Customer.Name

15 SECURE SCHEMA MATCHING (contd)
- P generates S_P'(D_1, ..., D_s) from the mapping of S_P with S_W(D_1, ..., D_L)
- Q generates S_Q'(D_1, ..., D_x) from the mapping of S_Q with S_W(D_1, ..., D_L)
- P and Q negotiate:
  - secret key k
  - embedding parameters (Lx, N, dist)
  - hash function h
- P sends H_P = (h(D_1, k), ..., h(D_s, k)) to W
- Q sends H_Q = (h(D_1, k), ..., h(D_x, k)) to W
- W computes the intersection H_P ∩ H_Q
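
The hashing and intersection steps above can be sketched as follows. The slide does not name the hash function h, so HMAC-SHA256 is used purely as an assumed stand-in; the key value, Q's attribute list and the function names are hypothetical.

```python
import hashlib
import hmac


def keyed_hash(attribute: str, key: bytes) -> str:
    """Assumed h(D, k): HMAC-SHA256 of the global-schema attribute name."""
    return hmac.new(key, attribute.encode("utf-8"), hashlib.sha256).hexdigest()


def hash_schema(attributes, key):
    """What a party sends to W: the set of keyed hashes of its attributes."""
    return {keyed_hash(a, key) for a in attributes}


# Secret key negotiated by P and Q; W never sees it.
key = b"example-shared-secret"

# Source schemas already rewritten in terms of the global schema S_W.
h_p = hash_schema(["Name", "DateofBirth"], key)
h_q = hash_schema(["Name", "ResidenceAddress"], key)

# W only sees opaque digests and computes their intersection.
common = h_p & h_q
print(len(common))  # 1
```

Because W does not hold k, it can count and compare digests but cannot recompute them from attribute names in its own schema.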

16 SECURITY ANALYSIS
- Length of the database
- Database size
- Set of matching records
- Set of matching attributes
- Number of matching attributes

17 EXPERIMENTAL EVALUATION


19 CONCLUSION
- Privacy-preserving record matching between two parties that can have different schemas
- Requires privacy at the schema level
- Privacy is obtained by embedding records in a vector space
- Applications: DNA sequences, images, proteins, etc.

