Privacy Preserving Data Mining Yehuda Lindell & Benny Pinkas.


1 Privacy Preserving Data Mining Yehuda Lindell & Benny Pinkas

2 Summary Objective Various components / tools needed Algorithm

3 Objective Perform Data-mining on union of two private databases Data stays private i.e. no party learns anything but output

4 Assumptions Large Databases – Generic Solutions not possible Semi-Honest Parties

5 Classification by Decision Tree Learning Transaction Attributes Class Attribute Want to Predict Class, using only non-class attributes

6 Decision Tree Rooted tree with nodes/edges Internal Nodes => Attributes Edges leaving nodes => Possible values Leaves => Expected Class for transaction – Traverse tree using known attributes – Predict class given leaf node’s value
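
The traversal described on this slide can be sketched in a few lines of Python. The nested-dict tree layout, the attribute names, and the `predict` helper are assumptions made for illustration; they do not come from the presentation.

```python
# Hypothetical tree layout: internal nodes are dicts naming an attribute,
# edges are keyed by attribute value, and leaves are plain class labels.
tree = {
    "attribute": "outlook",
    "children": {
        "sunny": {
            "attribute": "humidity",
            "children": {"high": "no", "normal": "yes"},
        },
        "overcast": "yes",
        "rain": "yes",
    },
}


def predict(node, transaction):
    """Follow the edge matching the transaction's value until a leaf is reached."""
    while isinstance(node, dict):
        value = transaction[node["attribute"]]
        node = node["children"][value]
    return node  # the leaf holds the expected class


print(predict(tree, {"outlook": "sunny", "humidity": "high"}))
```

Only the non-class attributes along one root-to-leaf path are consulted, so a transaction need not supply every attribute.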

7 Constructing Tree Top-down At each level – find the attribute that “best” classifies transactions => gives least overhead – Best => Attribute that minimizes entropy (maximizes information gain) – Entropy = $-\sum x \ln x$ – Entropy = 0 when all transactions have the same class

8 Entropy calculations Entropy – $H(T) = \sum (-x \ln x)$ – $H_C(T)$ => Info needed to identify the class of a transaction in T; x = fraction of the transactions in T that belong to a class; sum over all possible classes – $H_C(T \mid A)$ => Info needed to identify the class of a transaction in T, given the value v of attribute A; x is taken over the transactions with value v for attribute A – Gain = $H_C(T) - H_C(T \mid A)$
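
As a worked illustration of these quantities, here is a minimal Python sketch that computes the class entropy, the conditional entropy given an attribute, and the gain for a small in-memory table. The row format (a list of dicts) and the function names are assumptions for illustration; the natural logarithm follows the slide.

```python
import math
from collections import Counter


def entropy(labels):
    """H_C(T): sum of -x ln x, where x is the fraction of labels in each class."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def conditional_entropy(rows, attribute, class_key):
    """H_C(T | A): class entropy within each value of A, weighted by group size."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[class_key])
    n = len(rows)
    return sum(len(labels) / n * entropy(labels) for labels in groups.values())


def gain(rows, attribute, class_key):
    """Gain = H_C(T) - H_C(T | A)."""
    labels = [row[class_key] for row in rows]
    return entropy(labels) - conditional_entropy(rows, attribute, class_key)


rows = [
    {"a": 0, "b": 0, "cls": "x"},
    {"a": 0, "b": 1, "cls": "x"},
    {"a": 1, "b": 0, "cls": "y"},
    {"a": 1, "b": 1, "cls": "y"},
]
print(gain(rows, "a", "cls"), gain(rows, "b", "cls"))
```

In this toy table attribute `a` determines the class exactly, so its gain equals the full class entropy, while `b` carries no information about the class.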

9 Private Computation Given only x1 and f1(x1,y), function S1 exists s.t.: – P2 provides input x1 to P1 – P2 can compute corresponding view of P1’s DB (desired pairs) [Diagram labels: S1, f1, Party 1, Party 2, x1, f1(x1,y), View]

10 Oblivious Evaluation What if in previous example: Party 2 does not want Party 1 to know what input (x1) it is providing? Oblivious Evaluation: Receiver obtains P(x) without learning anything else about polynomial P. Sender learns nothing about x.

11 Oblivious Evaluation (2) – Simplified Version $r_i$ = receiver’s random number; $R_i$ = sender’s random number; $x$ = input from receiver; $s$ = secret key. Receiver sends $(a^{r_i},\ a^{s r_i} a^{x})$. Sender replies $(a^{R_i},\ a^{s R_i} a^{P(x)} a^{s r_i})$. Divide 2nd element by 1st element raised to power $s$ to get P(x): $a^{P(x)} = a^{s R_i} a^{P(x)} a^{s r_i} / (a^{R_i} \cdot a^{r_i})^{s}$
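
The exchange on this slide resembles ElGamal-style encryption with the message carried in the exponent. The Python sketch below illustrates that idea for a linear polynomial P(x) = c1*x + c0 and is not the protocol from the paper. The toy modulus, the base, the public value `h = a^s`, and all function names are assumptions for illustration. Unlike the slide, the sender folds the receiver's first element into its reply, so the receiver divides by a single element raised to the power s.

```python
import secrets

# Toy parameters (assumptions): a real protocol needs a much larger group.
p = 2_147_483_647  # the prime 2**31 - 1
a = 7              # base element


def receiver_query(x, s):
    """Receiver hides x as (a^r, a^(s*r) * a^x) using a fresh random r."""
    r = secrets.randbelow(p - 1)
    return pow(a, r, p), pow(a, s * r, p) * pow(a, x, p) % p


def sender_reply(query, h, c1, c0):
    """Sender evaluates P(x) = c1*x + c0 in the exponent without seeing x."""
    first, second = query
    big_r = secrets.randbelow(p - 1)  # sender's own randomness
    new_first = pow(first, c1, p) * pow(a, big_r, p) % p
    new_second = pow(second, c1, p) * pow(a, c0, p) * pow(h, big_r, p) % p
    return new_first, new_second


def receiver_decrypt(reply, s):
    """Divide the 2nd element by the 1st raised to s; the result is a^P(x)."""
    first, second = reply
    return second * pow(pow(first, s, p), -1, p) % p


s = secrets.randbelow(p - 1)  # receiver's secret key
h = pow(a, s, p)              # public value a^s, known to the sender
reply = sender_reply(receiver_query(5, s), h, c1=3, c0=4)
print(receiver_decrypt(reply, s) == pow(a, 3 * 5 + 4, p))
```

The receiver ends up with a^P(x) rather than P(x) itself, as in the slide's final equation. Recovering P(x) from it is only practical when the result lies in a small, known range.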

12 Algorithm Step 1 - Each party computes ID3 – decision tree learning – (O(# attributes)) Step 2 - Combine results using cryptographic protocols like oblivious evaluation - (O(log(#transactions))) Result - Each party gains results of data mining without learning more than necessary

13 Algorithm (2) Finding “best” attribute is hardest part Each party computes their “share” of entropy – For each attribute, combine values from each party – Results in private computation of Entropy (-xlnx) Choose attribute that minimizes entropy – Provides maximum information gain – Ensures most efficient tree with least overhead – Use oblivious Evaluation
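
To make the target of this step concrete, the following Python sketch computes, in the clear, the result the private protocol is meant to reproduce: the attribute with the lowest conditional entropy over the union of both parties' data. It offers no privacy and is only a reference computation. It assumes each party holds its own transactions, so that joint counts are sums of local counts, and the nested count layout `counts[attribute][value][class]` is hypothetical.

```python
import math


def merge_counts(c1, c2):
    """Add two parties' tables of the form {value: {class: count}}."""
    merged = {}
    for counts in (c1, c2):
        for value, per_class in counts.items():
            slot = merged.setdefault(value, {})
            for cls, n in per_class.items():
                slot[cls] = slot.get(cls, 0) + n
    return merged


def conditional_entropy(counts):
    """H_C(T | A) from joint counts: -sum (n_vc / n) * ln(n_vc / n_v)."""
    total = sum(n for per_class in counts.values() for n in per_class.values())
    h = 0.0
    for per_class in counts.values():
        n_v = sum(per_class.values())
        for n in per_class.values():
            if n:
                h -= (n / total) * math.log(n / n_v)
    return h


def best_attribute(party1, party2):
    """Pick the attribute minimizing conditional entropy on the combined counts."""
    joint = {attr: merge_counts(party1[attr], party2[attr]) for attr in party1}
    return min(joint, key=lambda attr: conditional_entropy(joint[attr]))


party1 = {"a": {0: {"x": 2}, 1: {"y": 1}},
          "b": {0: {"x": 1, "y": 1}, 1: {"x": 1}}}
party2 = {"a": {0: {"x": 1}, 1: {"y": 2}},
          "b": {0: {"x": 1, "y": 1}, 1: {"y": 1}}}
print(best_attribute(party1, party2))
```

In the private setting neither party would reveal its count tables. Each would instead hold a share of every -x ln x term, and the comparison would run on those shares.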

14 Discussion of Algorithm Efficient: – Large Databases accommodated: Algorithm relies on number of possible values for attributes – NOT number of transactions in database Private: – Each step depends on local computation and private protocol – Uses techniques like oblivious transfer / evaluation to exchange information – Paper proves individual steps are private, AND can predict control flow between steps ONLY based on input/output – so also private

15 Discussion of Algorithm (2) Approximate ID3 used instead of actual ID3 – shown to be as secure and provide same information

