
1. Privacy Preserving Data Mining
Lecture 2: Cryptographic Solutions
Benny Pinkas, HP Labs, Israel
10th Estonian Winter School in Computer Science, March 1, 2005

2. Secure two-party computation: definition
Input: Alice holds x, Bob holds y.
Output: F(x,y) and nothing else.
The result should be as if both parties handed their inputs to a trusted party that computed F(x,y) and returned it.

3. Secure Function Evaluation
A major topic of cryptographic research: how to let n parties $P_1,\dots,P_n$ compute a function $F(x_1,\dots,x_n)$, where
- input $x_i$ is known to party $P_i$;
- the parties learn the final output and nothing else.
Caveat: cryptographic definitions of secure computation are both too strong and too weak:
- Too strong: they do not allow leakage of harmless information; the price of this extra security is paid in efficiency.
- Too weak: they do not address leakage or misuse caused by the function itself (e.g., information implied by the outputs, or misbehavior in choosing an input).

4. Leak no other information
A protocol is secure if it emulates the ideal solution. Alice learns F(x,y) and can therefore compute everything that is implied by x, her prior knowledge of y, and F(x,y). Alice must not be able to compute anything else.
Simulation: a protocol is considered secure if, for every adversary in the real world, there exists a simulator in the ideal world that outputs an indistinguishable "transcript", given access only to the information that the adversary is allowed to learn in the ideal model.

5. Secure Function Evaluation
Major result [Yao]: "Any function that can be evaluated using polynomial resources can be securely evaluated using polynomial resources" (under some cryptographic assumption).

6. SFE building block: 1-out-of-2 Oblivious Transfer
- Alice holds a choice bit $j \in \{0,1\}$; Bob holds two strings $Y_0, Y_1$.
- Alice learns $Y_j$ (and nothing about $Y_{1-j}$); Bob learns nothing about j.
1-out-of-2 OT can be based on most public-key systems. There are implementations with two communication rounds.
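The following sketch shows how OT can rest on a public-key primitive. It uses a Bellare-Micali-style construction, which is one classic semi-honest OT. The lecture does not say this is the construction it has in mind. The group is an assumption chosen only to keep the code short: integers modulo the Mersenne prime 2^127 - 1 with base 3. SHA-256 serves as the pad derivation. Alice can know the secret key for only one of the two public keys, because $PK_0 \cdot PK_1 = C$ is fixed by Bob.

```python
import hashlib
import secrets

# Toy group (assumption): integers mod the Mersenne prime 2^127 - 1, base 3.
# Enough to show correctness; NOT a vetted cryptographic group.
P = 2**127 - 1
G = 3

def _pad(elem: int) -> bytes:
    """Hash a group element to a 32-byte one-time pad."""
    return hashlib.sha256(elem.to_bytes(16, "big")).digest()

def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(u ^ v for u, v in zip(a, b))

def _rand_exp() -> int:
    return secrets.randbelow(P - 2) + 1

def sender_setup() -> int:
    """Bob publishes a random element C (Alice must not know its discrete log)."""
    return pow(G, _rand_exp(), P)

def receiver_choose(c: int, j: int):
    """Alice knows the secret key of PK_j only; she sends PK_0 to Bob."""
    k = _rand_exp()
    pk_j = pow(G, k, P)
    pk_other = c * pow(pk_j, -1, P) % P      # enforces PK_0 * PK_1 = C
    pk0 = pk_j if j == 0 else pk_other
    return k, pk0

def sender_transfer(c: int, pk0: int, y0: bytes, y1: bytes):
    """Bob derives PK_1 = C / PK_0 and ElGamal-style masks Y_0, Y_1 (<= 32 bytes each)."""
    pk1 = c * pow(pk0, -1, P) % P
    out = []
    for pk, y in ((pk0, y0), (pk1, y1)):
        r = _rand_exp()
        out.append((pow(G, r, P), _xor(_pad(pow(pk, r, P)), y)))
    return out

def receiver_output(k: int, j: int, ciphertexts) -> bytes:
    """Alice can strip only the pad of the entry whose secret key she knows."""
    g_r, masked = ciphertexts[j]
    return _xor(_pad(pow(g_r, k, P)), masked)
```

The exchange matches the "two communication rounds" remark: Alice sends `pk0`, and Bob replies with the two masked strings. `PK_0` is uniformly random whatever j is, so a semi-honest Bob learns nothing about Alice's choice.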

7. General two-party computation
Two-party protocol.
Input:
- Sender: a function F (in some representation); the sender's input Y is already embedded in F.
- Receiver: $x \in \{0,1\}^n$.
Output:
- Receiver: F(x) and nothing else about F.
- Sender: nothing about x.

8. Representations of F
- Boolean circuits [Yao, GMW, …]
- Algebraic circuits [BGW, …]
- Low-degree polynomials [BFKR]
- Matrix products over a large field [FKN, IK]
- Randomizing polynomials [IK]
- Communication complexity protocols [NN]

9. Secure two-party computation of general functions [Yao]
- First, represent the function F as a Boolean circuit C.
  - This is always possible.
  - Sometimes it is easy (additions, comparisons).
  - Sometimes the result is inefficient (e.g., for indirect addressing, such as A[x]).
- Then, "garble" the circuit.
- Finally, evaluate the garbled circuit.

10. Garbling the circuit
Bob constructs the circuit and then garbles it. (Figure: gate G with input wires i and j and output wire k, carrying the value pairs $w_i^0, w_i^1$; $w_j^0, w_j^1$; $w_k^0, w_k^1$.)
The w values serve as cryptographic keys: $w_k^0$ represents 0 on wire k, and $w_k^1$ represents 1 on wire k.
(Alice will learn one string per wire, but not which bit it corresponds to.)

11. Gate tables
For every gate, every combination of input values is used as a key for encrypting the corresponding output value. Assume G = AND. Bob constructs a table:
- Encryption of $w_k^0$ using keys $w_i^0, w_j^0$ (AND(0,0)=0)
- Encryption of $w_k^0$ using keys $w_i^0, w_j^1$ (AND(0,1)=0)
- Encryption of $w_k^0$ using keys $w_i^1, w_j^0$ (AND(1,0)=0)
- Encryption of $w_k^1$ using keys $w_i^1, w_j^1$ (AND(1,1)=1)
Result: given $w_i^x, w_j^y$, one can compute $w_k^{G(x,y)}$.

12. Secure computation
Bob sends the table of gate G to Alice, with its rows in permuted order. Given, e.g., $w_i^0, w_j^1$, Alice computes $w_k^0$ by decrypting the corresponding entry in the table, but she does not know the actual values of the wires.
(Figure: gate G with wire values $w_i^0, w_i^1$; $w_j^0, w_j^1$; $w_k^0, w_k^1$, and its table in permuted order: encryption of $w_k^0$ using keys $w_i^0, w_j^0$; of $w_k^0$ using $w_i^0, w_j^1$; of $w_k^1$ using $w_i^1, w_j^1$; of $w_k^0$ using $w_i^1, w_j^0$.)
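The sketch below shows the garbled AND table and Alice's decryption for a single gate. Two choices are assumptions of this sketch, not the lecture's. First, "encryption" is a one-time pad derived with SHA-256 from the two input labels. Second, each plaintext carries 16 zero bytes so Alice can recognise the one row that decrypts correctly. Real systems may use other row-identification tricks, such as point-and-permute.

```python
import hashlib
import secrets

LABEL_LEN = 16      # bytes per wire label (assumed 128-bit keys)
TAG = bytes(16)     # 16 zero bytes appended so the correct row can be recognised

def _pad(k_a: bytes, k_b: bytes) -> bytes:
    """One-time pad derived from both input labels (assumption: SHA-256)."""
    return hashlib.sha256(k_a + k_b).digest()   # 32 bytes = label + tag

def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(u ^ v for u, v in zip(a, b))

def new_wire():
    """Random labels (w^0, w^1) encoding bit 0 and bit 1 on one wire."""
    return secrets.token_bytes(LABEL_LEN), secrets.token_bytes(LABEL_LEN)

def garble_gate(gate, w_i, w_j, w_k):
    """Bob: encrypt w_k^{G(a,b)} under keys w_i^a, w_j^b, then permute the rows."""
    table = [_xor(_pad(w_i[a], w_j[b]), w_k[gate(a, b)] + TAG)
             for a in (0, 1) for b in (0, 1)]
    secrets.SystemRandom().shuffle(table)
    return table

def eval_gate(table, l_i: bytes, l_j: bytes) -> bytes:
    """Alice: with one label per input wire, only one row decrypts to label + TAG."""
    pad = _pad(l_i, l_j)
    for row in table:
        plain = _xor(pad, row)
        if plain[LABEL_LEN:] == TAG:
            return plain[:LABEL_LEN]
    raise ValueError("no row decrypted correctly")
```

For example, after `table = garble_gate(lambda a, b: a & b, w_i, w_j, w_k)`, the call `eval_gate(table, w_i[0], w_j[1])` yields `w_k[0]`. Alice sees only random-looking labels, never the bits they encode.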

13. Secure computation
Bob sends to Alice:
- tables encoding each circuit gate;
- garbled values (w's) of his input values;
- a translation from the garbled values of the output wires to actual 0/1 values.
If Alice gets the garbled values (w's) of her own input values, she can compute the output of the circuit, and nothing else.

14. Alice's input
For every wire i of Alice's input:
- The parties run an OT protocol.
- Alice's input is her input bit s.
- Bob's input is $w_i^0, w_i^1$.
- Alice learns $w_i^s$.
The OTs for all input wires can be run in parallel. Afterwards, Alice can compute the circuit by herself.

15. Secure computation: the big picture
- Represent the function as a circuit C.
- Bob sends Alice 4|C| encryptions (e.g., 64|C| bytes), 4 encryptions for every gate.
- Alice performs an OT for every input bit (one can do, e.g., 100-1000 OTs per second).
- About one round of communication.
- Efficient for medium-size circuits!

16. Example
The millionaires' problem: comparing two N-bit numbers. What is the overhead?
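One way to answer the overhead question is sketched below. The comparator circuit is an assumption, not necessarily the circuit the lecture uses. It processes bits from least significant upward with the carry $c_{i+1} = x_i \oplus ((x_i \oplus c_i) \wedge (y_i \oplus c_i))$, starting from $c_0 = 0$, which costs 4 two-input gates per bit. The cost model follows the big-picture slide: 4 encryptions per gate, and 64|C| bytes for 4|C| encryptions, i.e. 16 bytes per ciphertext.

```python
def comparator_gates(nbits):
    """Gate list (op, in1, in2, out) computing x > y, LSB first; wire 'c0' is constant 0."""
    gates = []
    for i in range(nbits):
        c, nxt = f"c{i}", f"c{i + 1}"
        gates += [("XOR", f"x{i}", c, f"t{i}"),       # t = x_i XOR c_i
                  ("XOR", f"y{i}", c, f"u{i}"),       # u = y_i XOR c_i
                  ("AND", f"t{i}", f"u{i}", f"v{i}"),  # v = t AND u
                  ("XOR", f"x{i}", f"v{i}", nxt)]      # c_{i+1} = x_i XOR v
    return gates

OPS = {"XOR": lambda a, b: a ^ b, "AND": lambda a, b: a & b}

def evaluate(gates, x, y, nbits):
    """Evaluate the circuit in the clear; returns 1 iff x > y."""
    wires = {"c0": 0}
    for i in range(nbits):
        wires[f"x{i}"] = (x >> i) & 1
        wires[f"y{i}"] = (y >> i) & 1
    for op, a, b, out in gates:
        wires[out] = OPS[op](wires[a], wires[b])
    return wires[f"c{nbits}"]

def yao_cost(num_gates, bytes_per_ciphertext=16):
    """(ciphertexts, bytes) Bob sends: 4 encryptions per gate."""
    return 4 * num_gates, 4 * num_gates * bytes_per_ciphertext
```

For the 20-bit integers of the SFDL example later in the deck, `yao_cost(len(comparator_gates(20)))` gives `(320, 5120)`: 80 gates, 320 ciphertexts, about 5 KB. On top of that come one OT per bit of Alice's input.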

17. Applications
Two parties, two large data sets. Max? Mean? Median? Intersection? Decision tree learning (ID3)?

18. Fairplay: a secure two-party computation system
Malkhi, Nisan, Pinkas, Sella: a full-fledged secure two-party computation system, implementing Yao's "garbled circuit" protocol.
Goals:
- Investigate whether two-party SFE is practical
- Actual measurements of the overall computation
- Breakdown of the computation into parts
- Computation versus communication?
- Test-bed for various optimizations

19. Fairplay
The compilation paradigm:
- Programs are written in SFDL, a high-level programming language.
- SFDL allows a clear, formal, easily understandable definition of requirements by humans.
- SHDL: a low-level language describing Boolean circuits.
- SFDL → SHDL compiler and optimizer.
- SHDL → Java programs implementing Yao's protocol.

20. Fairplay: SFDL example
    program Millionaires {
        type int = Int<20>; // 20-bit integer
        type AliceInput = int;
        type BobInput = int;
        type AliceOutput = Boolean;
        type BobOutput = Boolean;
        type Output = struct {AliceOutput alice, BobOutput bob};
        type Input = struct {AliceInput alice, BobInput bob};

        function Output output(Input input) {
            output.alice = input.alice > input.bob;
            output.bob = input.bob > input.alice;
        }
    }

21. SFDL properties
- Conventional syntax (C/Pascal-like)
- Type system: Boolean, integer, enumerated
- Program structure:
  - Declarations: global constants, types
  - Sequence of functions (no nesting [C], no recursion)
  - Function name is its return value [Pascal]
- Conditional execution and loops: if-then and if-then-else statements; for-loops (loop boundaries must be known at compile time)
- Assignments and expressions: constants, variables, array entries, structure items, function calls, operators (+, -, logical, comparison), parentheses

22. SHDL example
    0 input//output$input.bob$0
    1 input//output$input.bob$1
    2 input//output$input.bob$2
    3 input//output$input.bob$3
    4 input//output$input.alice$0
    5 input//output$input.alice$1
    6 input//output$input.alice$2
    7 input//output$input.alice$3
    8 gate arity 2 table [ 1 0 0 0 ] inputs [ 4 5 ]
    9 gate arity 2 table [ 0 1 1 0 ] inputs [ 4 5 ]
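A small interpreter for SHDL lines like the ones above makes the format concrete. It rests on one assumption the slide does not state. A gate's `table` is taken to list its outputs for input combinations 00, 01, 10, 11 in that order, with the first listed input as the high-order bit. Under that reading, gate 8 is NOR and gate 9 is XOR of wires 4 and 5. The parser and its function names are hypothetical, not Fairplay's code.

```python
# The SHDL lines from the slide (wires 0-3: Bob's bits, wires 4-7: Alice's bits).
SHDL_EXAMPLE = [
    "0 input//output$input.bob$0",
    "1 input//output$input.bob$1",
    "2 input//output$input.bob$2",
    "3 input//output$input.bob$3",
    "4 input//output$input.alice$0",
    "5 input//output$input.alice$1",
    "6 input//output$input.alice$2",
    "7 input//output$input.alice$3",
    "8 gate arity 2 table [ 1 0 0 0 ] inputs [ 4 5 ]",
    "9 gate arity 2 table [ 0 1 1 0 ] inputs [ 4 5 ]",
]

def run_shdl(lines, input_bits):
    """Evaluate SHDL lines in order; input_bits maps input wire id -> bit."""
    wires = {}
    for line in lines:
        tok = line.split()
        wire = int(tok[0])
        if tok[1].startswith("input"):
            wires[wire] = input_bits[wire]
        elif tok[1] == "gate":
            arity = int(tok[3])
            t = tok.index("table") + 2            # skip "table" and "["
            table = [int(v) for v in tok[t:t + 2 ** arity]]
            s = tok.index("inputs") + 2           # skip "inputs" and "["
            idx = 0
            for w in tok[s:s + arity]:
                idx = 2 * idx + wires[int(w)]     # first input = high-order bit (assumed)
            wires[wire] = table[idx]
    return wires
```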

23. k-th-ranked element (e.g., median)
Inputs:
- Alice: $S_A$; Bob: $S_B$.
- Large sets of unique items ($\in D$).
Output:
- $x \in S_A \cup S_B$ such that x has k-1 elements smaller than it.
The rank k:
- could depend on the size of the input datasets;
- median: $k = (|S_A| + |S_B|)/2$.
Motivation:
- basic statistical analysis of distributed data;
- e.g., a histogram of salaries in CS departments.
The problem: generic constructions using circuits [Yao …] yield an overhead that is at least linear in k.

24. An (insecure) two-party median protocol
(Figure: $S_A$ split at its median $m_A$ into $L_A$ and $R_A$; $S_B$ split at its median $m_B$ into $L_B$ and $R_B$; here $m_A < m_B$.)
If $m_A < m_B$, then $L_A$ lies below the median and $R_B$ lies above the median. After removing both, the new median is the same as the original median. Recursion: log n rounds are needed (assume each set contains $n = 2^i$ items).

25. A secure two-party median protocol
- A finds its median $m_A$; B finds its median $m_B$.
- The parties run a secure comparison (e.g., a small circuit): is $m_A < m_B$?
  - YES: A deletes its elements $\le m_A$; B deletes its elements $> m_B$.
  - NO: A deletes its elements $> m_A$; B deletes its elements $\le m_B$.
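The recursion can be simulated in a few lines. This sketch assumes both sets hold $n = 2^i$ unique items, and it takes the median to be the n-th smallest of the 2n items. `secure_less_than` is a hypothetical stand-in for the secure comparison and is computed in the clear. The code therefore shows the protocol's logic and round structure, not its security.

```python
def secure_less_than(a, b):
    # Stand-in for the secure comparison (e.g. a small Yao circuit); in the clear here.
    return a < b

def two_party_median(sa, sb):
    """Median (n-th smallest of 2n) of two sets of n = 2^i unique items each."""
    a, b = sorted(sa), sorted(sb)
    assert len(a) == len(b) >= 1 and (len(a) & (len(a) - 1)) == 0
    while len(a) > 1:
        half = len(a) // 2
        m_a, m_b = a[half - 1], b[half - 1]   # each party's own (lower) median
        if secure_less_than(m_a, m_b):
            a, b = a[half:], b[:half]         # A drops items <= m_a, B drops items > m_b
        else:
            a, b = a[:half], b[half:]         # A drops items > m_a, B drops items <= m_b
    return min(a[0], b[0])                    # one last comparison of the remaining items
```

Each pass halves both sets with a single comparison, so there are $\log_2 n$ comparisons plus a final one. For example, `two_party_median([1, 5, 9, 12, 20, 33, 41, 50], [2, 3, 4, 30, 31, 32, 60, 70])` returns 20, the 8th smallest of the 16 items.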

26. An example
(Figure: a run of the protocol on sets A and B; successive comparisons give $m_A > m_B$, $m_A < m_B$, $m_A < m_B$, $m_A > m_B$, $m_A < m_B$, after which the median is found.)

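27. Proof of security
(Figure: the run from the previous example, shown next to the median.)
The idea: $m_A < m_B$ holds exactly when $m_A$ is smaller than the median. Each party can therefore reproduce every comparison outcome from its own input and the final output, so a simulator given only the median can generate the transcript.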

28. Arbitrary input size, arbitrary k
(Figure: $S_A$ and $S_B$, each cut down to k items and padded with +∞ (++) and -∞ (--) values up to size $2^i$.)
Each party keeps only its k smallest items. Now compute the median of two sets of size k; the size should be a power of 2. The median of the new inputs equals the k-th element of the original inputs.
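The reduction from the k-th element to a median can be sketched as follows. Two parts are assumptions. The padding rule has A add -∞ values and B add +∞ values, $2^i - k$ of each. This is one choice that puts the k-th element in the median position, and the slide's exact rule is not recoverable from the figure. The median is also taken in the clear by sorting, as a stand-in for the secure median protocol.

```python
INF = float("inf")

def kth_via_median(sa, sb, k):
    """k-th smallest of S_A and S_B (unique items, 1 <= k <= |S_A| + |S_B|)."""
    # Each party keeps its k smallest items, padding with +inf if it has fewer.
    a = sorted(sa)[:k] + [INF] * max(0, k - len(sa))
    b = sorted(sb)[:k] + [INF] * max(0, k - len(sb))
    # Grow both sets to size 2^i >= k: d values of -inf for A, d of +inf for B.
    size = 1 << (k - 1).bit_length()
    d = size - k
    a = [-INF] * d + a
    b = b + [INF] * d
    # The d extra -inf items shift rank k to rank k + d = size, the (lower) median.
    return sorted(a + b)[size - 1]
```

The k-th smallest element of the union is always among the k smallest items of each set, which is why truncating to k items loses nothing.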

29. Hiding the size of the inputs
One can search for the k-th element without revealing the size of the input sets. However, k = n/2 (the median) reveals the input size.
Solution: let $S = 2^i$ be a bound on the input size. Each party pads its set from $|S_A|$ (respectively $|S_B|$) up to S with -∞ (--) and +∞ (++) values. The median of the new datasets is the same as the median of the original datasets.

30. Privacy-preserving data mining
(Figure: parties $P_1$ and $P_2$ holding huge confidential databases D1 and D2.)
Wish to "mine" $D1 \cup D2$ without revealing more information.
Examples:
- medical databases protected by law;
- competing businesses;
- government agencies (privacy, "need to know").

31. The classification problem
Example records, with attributes Age > 30, Sex, time insured, Claim > $500, and class "Did fraud occur?":
- C1: Yes, M, t ∈ [0,9] years, No
- C2: No, F, t ∈ [10,19] years, Yes
- …
- Cn: Yes, F, t ∈ [20,29] years, No
Goal: based on the available data, design an algorithm to classify new data.

32. Classification using decision trees
(Figure: a decision tree whose root tests Time insured, with branches [0,9] years, [10,19] years and > 20 years; further nodes test Age > 30 and Claim > $500, and the leaves are Yes/No fraud labels.)
ID3: choose the attribute A that minimizes the conditional entropy of the class given A.
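The sketch below computes the ID3 selection rule in the clear on a single, non-private table. The records and field names are hypothetical. Entropy is measured in bits here; choosing natural logarithms instead would rescale every value equally and select the same attribute.

```python
import math
from collections import Counter, defaultdict

def conditional_entropy(rows, attr, label):
    """H(label | attr) in bits, from a list of dict records."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[attr]].append(r[label])
    n = len(rows)
    h = 0.0
    for labels in groups.values():
        m = len(labels)
        h_group = -sum(c / m * math.log2(c / m) for c in Counter(labels).values())
        h += m / n * h_group            # weight each branch by its share of records
    return h

def best_attribute(rows, attrs, label):
    """ID3 step: the attribute minimizing the conditional entropy of the class."""
    return min(attrs, key=lambda a: conditional_entropy(rows, a, label))

# Hypothetical records: "claim" determines fraud exactly, "sex" is uninformative.
rows = [
    {"sex": "M", "claim": "Yes", "fraud": "Yes"},
    {"sex": "F", "claim": "Yes", "fraud": "Yes"},
    {"sex": "M", "claim": "No", "fraud": "No"},
    {"sex": "F", "claim": "No", "fraud": "No"},
]
```

Here `best_attribute(rows, ["sex", "claim"], "fraud")` returns `"claim"` (conditional entropy 0), while `"sex"` leaves 1 bit of uncertainty.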

33. Privacy-preserving ID3
Scenario: the inputs are private information of $P_1$ and $P_2$.
Main technical problem: comparing entropies while preserving privacy (entropy $= -\sum x \log x$).
Efficiency:
- most computation is done independently by the parties;
- the overhead of cryptographic operations depends only on the size of the decision tree (not on the input size).
Basic task: compute $x \log x$, where $x = x_1 + x_2$ is, e.g., the total number of customers with (age > 30) and (fraud = yes).

34. Privacy-preserving ID3
Computing $x \log x$:
- $x = x_1 + x_2$, where $x_1$ and $x_2$ are known to $P_1$ and $P_2$ respectively (computed independently from their databases).
- We might as well compute $x \ln x$, or $\ln x$.
- First run a protocol to compute random shares $y_1 + y_2 = \ln x$.
$\ln x$ is a real number, while cryptography works over finite fields, so numerical analysis is needed.

35. Cryptographic tools
- Secure Function Evaluation (SFE) [Yao].
- Oblivious Polynomial Evaluation [NP]: the sender's input is a polynomial $Q(\cdot)$ and the receiver's input is x; the receiver outputs $Q(x)$ and nothing else, and the sender learns nothing. Implementation: two passes, O(degree) (or $O(\log|F|)$) exponentiations.

36. Computing random shares of $\ln x = \ln(x_1 + x_2)$
Use a Taylor approximation for $\ln x$:
- $x = x_1 + x_2 = 2^n(1+\varepsilon)$, where $-\tfrac12 < \varepsilon < \tfrac12$.
- $\ln x = \ln(2^n(1+\varepsilon)) = \ln 2^n + \ln(1+\varepsilon) \approx \ln 2^n + \sum_{i=1}^{k} \frac{(-1)^{i-1}\varepsilon^i}{i} = \ln 2^n + T(\varepsilon)$.
- $T(\varepsilon)$ is a polynomial of degree k. The error is exponentially small in k.
We only know how to work over finite fields: compute $c \cdot \ln x$, where c compensates for fractions, and work in a field F with |F| sufficiently large.
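The approximation can be checked numerically in floating point. Because $|\varepsilon| \le \tfrac12$, the neglected tail is at most $\sum_{i>k} 2^{-i}/(k+1) = 2^{-k}/(k+1)$, which is the sense in which the error is exponentially small in k. The sketch finds n by rounding $\log_2 x$, which is one way (an assumption) to get $|\varepsilon| \le \tfrac12$.

```python
import math

def taylor_ln1p(eps, k):
    """T(eps) = sum_{i=1..k} (-1)^(i-1) eps^i / i, approximating ln(1 + eps)."""
    return sum((-1) ** (i - 1) * eps ** i / i for i in range(1, k + 1))

def approx_ln(x, k):
    """ln x = n ln 2 + ln(1 + eps), with x = 2^n (1 + eps) and |eps| <= 1/2."""
    n = round(math.log2(x))
    eps = x / 2 ** n - 1
    return n * math.log(2) + taylor_ln1p(eps, k)
```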

37. $\ln(x_1+x_2)$ protocol
Step 1 of the protocol:
- Find n and $\varepsilon$.
- Apply Yao's protocol to the following small circuit:
  - Input: $x_1$ and $x_2$.
  - Output (random shares): random $a_1$ and $a_2$ such that $a_1 + a_2 = x - 2^n = \varepsilon \cdot 2^n$; random $b_1$ and $b_2$ such that $b_1 + b_2 = \ln 2^n$.
  - Operation: the circuit finds the $2^n$ closest to $x_1 + x_2$ and computes $\varepsilon \cdot 2^n = x_1 + x_2 - 2^n$.
- Recall: $x = x_1 + x_2 = 2^n + \varepsilon \cdot 2^n$, so $\ln x = \ln(2^n(1+\varepsilon)) = \ln 2^n + \ln(1+\varepsilon)$.

38. $\ln(x_1+x_2)$ protocol (cont.)
Step 2 of the protocol: compute random shares of $T(\varepsilon)$ (the Taylor approximation).
- $P_1$ chooses a random $w_1 \in F$ and defines a polynomial $Q(z)$ such that $w_1 + Q(a_2) = T(\varepsilon)$ (recall $a_1 + a_2 = \varepsilon \cdot 2^n$). Namely, $Q(z) = T((a_1 + z)/2^n) - w_1$.
- The parties run an oblivious polynomial evaluation in which $P_2$ computes $w_2 = Q(a_2) = T(\varepsilon) - w_1$.
- Now the parties have random $w_1$ and $w_2$ such that $w_1 + w_2 = T(\varepsilon) \approx \ln(1+\varepsilon)$.
- Hence $(b_1 + w_1) + (b_2 + w_2) \approx \ln 2^n + \ln(1+\varepsilon) = \ln x$.
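The two steps can be simulated end to end over a prime field, with the fractions cleared by a scale factor c, as the lecture suggests. Several parts of this sketch are assumptions:
- The field is $F = \mathbb{Z}_P$ with $P = 2^{521}-1$.
- The parameters are a public bound $x < 2^N$ with $N = 32$ and Taylor degree $K = 12$.
- The scale factor is $c = \mathrm{lcm}(1..K) \cdot 2^{NK}$.
- The circuit outputs shares of $\varepsilon \cdot 2^N$ rather than $\varepsilon \cdot 2^n$, so that $P_1$ can write down Q without knowing n.

The Yao circuit and the OPE are replaced by in-the-clear stand-ins. The sketch therefore checks the arithmetic, not the security.

```python
import math
import secrets

# Assumed parameters for this sketch (not from the lecture):
P = 2**521 - 1                  # prime modulus of the field F (a Mersenne prime)
N = 32                          # public bound: inputs satisfy x < 2^N
K = 12                          # degree k of the Taylor polynomial T
L = math.lcm(*range(1, K + 1))  # clears the 1/i denominators of T
C = L * 2 ** (N * K)            # scale factor c, making c*T(eps) an integer

def share(v: int):
    """Split v into random additive shares mod P."""
    r = secrets.randbelow(P)
    return r, (v - r) % P

def step1_circuit(x1: int, x2: int):
    """Stand-in for the small Yao circuit of step 1 (evaluated in the clear here)."""
    x = x1 + x2
    n = round(math.log2(x))             # 2^n near x, so |eps| <= 1/2
    a = (x - 2 ** n) * 2 ** (N - n)     # eps * 2^N, an integer (possibly negative)
    b = round(C * n * math.log(2))      # c * ln(2^n), rounded
    return share(a), share(b)

def make_q(a1: int, w1: int):
    """P1's degree-K polynomial Q(z) = c*T((a1 + z) / 2^N) - w1 over F."""
    def q(z: int) -> int:
        u = (a1 + z) % P                # equals eps * 2^N mod P when z = a2
        total = sum((-1) ** (i - 1) * (L // i) * 2 ** (N * (K - i)) * pow(u, i, P)
                    for i in range(1, K + 1))
        return (total - w1) % P
    return q

def ideal_ope(q, z: int) -> int:
    """Ideal OPE: the receiver learns q(z) and nothing else (no real crypto here)."""
    return q(z)

def ln_shares(x1: int, x2: int):
    (a1, a2), (b1, b2) = step1_circuit(x1, x2)  # P1 gets a1, b1; P2 gets a2, b2
    w1 = secrets.randbelow(P)                    # P1's random share of c*T(eps)
    w2 = ideal_ope(make_q(a1, w1), a2)           # P2 learns c*T(eps) - w1
    return (b1 + w1) % P, (b2 + w2) % P

def reconstruct(s1: int, s2: int) -> float:
    """Add the shares, map back to a signed integer, and divide out c."""
    v = (s1 + s2) % P
    if v > P // 2:
        v -= P
    return v / C
```

Each individual share is uniformly random in F. Only their sum, divided by c, approximates $\ln x$, to within the Taylor error of about $2^{-12}/13$ for these parameters.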

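39. Computing $x \ln x$
Tool: Multiply($c_1, c_2$)
- Input: $c_1$ (held by $P_1$) and $c_2$ (held by $P_2$).
- Output: $d_1, d_2$ such that $d_1 + d_2 = c_1 \cdot c_2$.
- How? An OPE of $Q(z) = c_1 \cdot z - d_1$, where $P_1$ chooses $d_1$ at random and $P_2$ evaluates Q at $c_2$.
Actual task: $x \ln x$.
- Input: $x_1 + x_2 = x$ and $c_1 + c_2 = \ln x$.
- Output: shares of $x \ln x = (x_1 + x_2)(c_1 + c_2)$.
- Run Multiply($x_1, c_2$) and Multiply($c_1, x_2$); each party computes its local term ($x_1 c_1$ or $x_2 c_2$) by itself.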
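Multiply and its use for $x \ln x$ fit in a few lines over a prime field. The modulus $2^{127}-1$ is an assumption of this sketch. `ideal_ope` is a hypothetical in-the-clear stand-in for a real oblivious polynomial evaluation. The field values stand for the fixed-point encodings of the real numbers.

```python
import secrets

P = 2**127 - 1   # assumed prime field for the shares

def ideal_ope(q, z):
    # Stand-in for OPE: the receiver learns q(z) only (no real crypto here).
    return q(z)

def multiply(c1, c2):
    """P1 holds c1, P2 holds c2; returns shares (d1 for P1, d2 for P2) of c1*c2 mod P."""
    d1 = secrets.randbelow(P)              # P1's random output share
    q = lambda z: (c1 * z - d1) % P        # P1's polynomial Q(z) = c1*z - d1
    d2 = ideal_ope(q, c2)                  # P2 evaluates Q at its input c2
    return d1, d2

def x_ln_x_shares(x1, c1, x2, c2):
    """P1 holds (x1, c1), P2 holds (x2, c2); returns shares of (x1+x2)*(c1+c2)."""
    e1, e2 = multiply(x1, c2)              # shares of the cross term x1*c2
    f1, f2 = multiply(c1, x2)              # shares of the cross term c1*x2
    s1 = (x1 * c1 + e1 + f1) % P           # P1: local term plus its shares
    s2 = (x2 * c2 + e2 + f2) % P           # P2: local term plus its shares
    return s1, s2
```

For instance, with $x_1, c_1, x_2, c_2 = 5, 7, 11, 13$ the two shares add up (mod P) to $(5+11)(7+13) = 320$.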

40. The rest of the work
- The parties compute shares of $\ln x$.
- Then they compute shares of $x \ln x$.
- Each party computes a share of the entropy by summing its shares of $x \ln x$ ($H(X) = -\sum x \ln x$).
- A small circuit finds the attribute giving the minimal conditional entropy.
- The attribute is assigned to the node.
- The databases are divided according to the value of this attribute.
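Summing shares works because the (scaled) conditional entropy is a signed sum of $x \ln x$ terms. If $x_j$ counts records with attribute value $a_j$, $x_{j,c}$ counts those that also have class c, and |T| is the total number of records, then $|T| \cdot H(C \mid A) = \sum_j x_j \ln x_j - \sum_j \sum_c x_{j,c} \ln x_{j,c}$. Each party can therefore add and subtract its shares locally. In this sketch a trusted `share` call is a hypothetical stand-in for the $\ln x$ and $x \ln x$ sub-protocols. The prime modulus and the fixed-point scale $2^{20}$ are assumptions.

```python
import math
import secrets

P = 2**127 - 1    # assumed prime field
SCALE = 2**20     # assumed fixed-point scale for x ln x

def share(v):
    """Stand-in for the sub-protocols: random additive shares of v mod P."""
    r = secrets.randbelow(P)
    return r, (v - r) % P

def xlnx(v):
    return round(SCALE * v * math.log(v)) if v > 0 else 0

def entropy_shares(counts):
    """counts[j][c] = records with attribute value j and class c.
    Each party combines its shares locally into a share of |T| * H(C|A)."""
    s1 = s2 = 0
    for row in counts:
        a1, a2 = share(xlnx(sum(row)))        # + x_j ln x_j
        s1, s2 = (s1 + a1) % P, (s2 + a2) % P
        for x in row:
            b1, b2 = share(xlnx(x))           # - x_{j,c} ln x_{j,c}
            s1, s2 = (s1 - b1) % P, (s2 - b2) % P
    return s1, s2

def reveal(s1, s2):
    v = (s1 + s2) % P
    return (v - P if v > P // 2 else v) / SCALE
```

For `counts = [[3, 1], [0, 4]]` the revealed value is about $4\ln 4 - 3\ln 3 \approx 2.249$. That is 8 records times a conditional entropy of about 0.281 nats. These values are what the final comparison circuit would compare across attributes.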

41. Efficiency
$\ln x$ protocol:
- secure computation of a small circuit;
- one oblivious polynomial evaluation.
ID3 for a database with 1,000,000 transactions, 15 attributes, 10 values per attribute, and 4 class values:
- communication per node takes seconds (T1);
- computation per node takes minutes (P3).

42. Contributions
- Cryptographic protocols where the bulk of the operations is done independently.
- Data mining:
  - a rigorous model for secure data mining;
  - efficient, secure protocols for specific problems (median, ID3).
- Cryptography:
  - sub-linear complexity: secure computation for large data sets;
  - efficient protocols for complex known algorithms;
  - secure computation of logarithms (a real function, requiring numerical analysis).
Drawbacks:
- privacy-preserving solutions are less efficient;
- it is hard to find efficient private solutions for all interesting functions;
- security against malicious parties.

43. References
Lecture notes and overview papers:
- B. Pinkas, Cryptographic Techniques for Privacy-Preserving Data Mining, SIGKDD Explorations, January 2003. http://www.pinkas.net/PAPERS/sigkdd.pdf
- R. Cramer, Introduction to Secure Computation, 2000. http://homepages.cwi.nl/~cramer/papers/CRAMER_revised.ps
- I. Damgård, Theory and practice of multiparty computation, 8th EWSCS. http://www.cs.ioc.ee/yik/schools/win2003/damgard.php
Research papers:
- G. Aggarwal, N. Mishra and B. Pinkas, Secure Computation of the K'th-ranked Element, Eurocrypt 2004. http://www.pinkas.net/PAPERS/ANP04.pdf
- Y. Lindell and B. Pinkas, Privacy Preserving Data Mining, Journal of Cryptology, Vol. 15, No. 3, 2002. http://www.pinkas.net/PAPERS/id3-final.pdf

