How to compile searching software so that it is impossible to reverse-engineer. (Private Keyword Search on Streaming Data)

1 How to compile searching software so that it is impossible to reverse-engineer (Private Keyword Search on Streaming Data). Rafail Ostrovsky, William Skeith, UCLA (patent pending). http://www.cs.ucla.edu/~rafail/

2 MOTIVATION: Problem 1. Each hour, we wish to find out whether any of hundreds of passenger lists contains a name from a "Possible Terrorists" list and, if so, that passenger's itinerary. The "Possible Terrorists" list is classified and must not be revealed to airports. Tantalizing question: can the airports help (and do all the search work) if they are not allowed to see the "possible terrorist" list? PROBLEM 1: Is it possible to design mobile software that can be transmitted to all airports (potentially revealing this software to the adversary through leaks) so that it collects ONLY the information needed, without revealing what it is collecting at each node? Non-triviality requirement: it must send back only the needed information, not everything! [Figure: mobile code (with state) visits the passenger lists of Airport 1, Airport 2, and Airport 3.]

3 MOTIVATION: Problem 2. Looking for malicious insiders' and/or terrorists' communication: (I) First, we must identify some "signature" criteria (rules) for suspicious behavior; typically, this is done by analysts. (II) Second, we must detect which nodes/stations transmit these signatures. Here, we want to tackle part (II). PROBLEM 2: Is it possible to design software that can capture all messages (and network locations) that match a secret/classified set of "rules"? Key challenge: the software must not reveal the secret "rules". Non-triviality requirement: the software must send back only the locations and messages that match the given "rules", not everything it sees. [Figure: public networks.]

4 Current Practice Continuously transfer all data to a secure environment. After the data is transferred, filter it in the classified environment, keeping only a small fraction of the documents.

5 [Figure: multiple document streams D(i,j) all flow into the filter and storage of the classified environment.] Filter rules are written by an analyst and are classified! Current practice: the amount of data that must be transferred to the classified environment is enormous!

6 Drawbacks: communication; processing; cost and timeliness.

7 How to improve performance? Distribute the work to many locations on a network, where you decide "on the fly" which data is useful. Seemingly an ideal solution, but… Major problem: it is not clear how to maintain security, which is the focus of this technology.

8 Our Architecture Search software that has a set of "rules" to choose which documents and/or packets to keep and which to toss; small storage that collects the selected documents and/or packets; various data streams consisting of flows of documents/packets. Our "compiler" outputs straight-line executable code (with program state) and a decryption key D. The straight-line executable code does not reveal the search "rules". A small, fixed-size program state (encrypted in a special way that our code modifies for each document processed) holds the documents/packets that match the secret "rules"; decrypting with D recovers them. Punch line: we can send the executable code publicly (it won't reveal its secrets!)

9 [Figure: on the low (unclassified) network, each document stream D(i,j) passes through a filter whose storage accumulates encryptions such as E(D(1,2)), E(D(1,3)), E(D(2,2)) of the matching documents; only these small encrypted buffers cross to the HIGH (classified) network, where they are decrypted to recover D(1,2), D(1,3), D(2,2).]

10 Example Filters: Look for all documents that contain special classified keywords (or strings or data items, and/or do not contain some other data), selected by an analyst. Privacy: we must hide what rules are used to create the filter, and the output must be encrypted.

11 What do we want? [Figure: a document stream D(1,j) passes through the filter, whose storage holds E(D(1,2)) and E(D(1,3)).] Conundrum: the compiled filter code is not allowed to have ANY branches (i.e., any "if-then-else" executables). Only straight-line code is allowed! Two requirements: correctness: only matching documents are saved, nothing else; efficiency: decoding is proportional to the length of the buffer, not the size of the entire stream.

12 Simplifying Assumptions for this Talk All keywords come from some poly-size dictionary. Truncate documents beyond a certain length.

13 Sneak peek: the compiled code Suppose we are looking for all documents that contain some secret word from the Webster dictionary. Here is how it looks to the adversary: for each document, execute the same code as follows:

14 [Figure: the dictionary w1, w2, …, wn is stored alongside one ciphertext E(*) per word; a small output buffer holds pairs of ciphertexts.] Look up the encryptions of all words appearing in the document and multiply them together. Take this value and apply a fixed formula to it to get the value g.

15 How should a solution look?

16 [Figure: the desired output buffer, where matching documents #1, #2, and #3 each occupy their own slot and non-matching documents leave no trace.]

17 How do we accomplish this?

18 Reminder: PKE Key-generation(1^k) → (PK, SK). E(PK, m, r) → c. D(c, SK) → m. We will use PKE with additional properties.

19 Several Solutions based on Homomorphic Public-Key Encryption For this talk: Paillier encryption. Properties: E(x) is probabilistic; in particular, it can encrypt a single bit in many different ways, such that any instance of E(0) and any instance of E(1) cannot be distinguished. Homomorphic: E(x)*E(y) = E(x+y).
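The homomorphic property can be checked with a toy implementation. Below is a minimal sketch of Paillier encryption in Python, with g = n + 1 and deliberately tiny primes chosen only for illustration; real deployments use primes of 1024 bits or more.

```python
import math
import random

def keygen(p, q):
    # Toy parameters; p and q must be distinct primes.
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)            # valid because we fix g = n + 1
    return (n,), (n, lam, mu)

def encrypt(pk, m):
    (n,) = pk
    n2 = n * n
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:      # r must be a unit mod n
        r = random.randrange(1, n)
    return pow(n + 1, m, n2) * pow(r, n, n2) % n2

def decrypt(sk, c):
    n, lam, mu = sk
    n2 = n * n
    # L(u) = (u - 1) // n, then scale by mu = lam^-1 mod n
    return (pow(c, lam, n2) - 1) // n * mu % n

pk, sk = keygen(17, 19)
cx, cy = encrypt(pk, 5), encrypt(pk, 7)
# Multiplying ciphertexts adds plaintexts: E(5) * E(7) decrypts to 12.
assert decrypt(sk, cx * cy % (pk[0] ** 2)) == 12
```

Note that each call to `encrypt` draws fresh randomness r, which is what makes any two encryptions of the same bit indistinguishable.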

20 Using Paillier Encryption E(x)E(y) = E(x+y). Important to note: E(0)^c = E(0)*…*E(0) = E(0+0+…+0) = E(0), and E(1)^c = E(1)*…*E(1) = E(1+1+…+1) = E(c). Assume we can somehow compute an encrypted value v, where we don't know what v stands for, but v = E(0) for "uninteresting" documents and v = E(1) for "interesting" documents. What is v^c? It is either E(0) or E(c), and we don't know which one it is.
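In Paillier terms, where a ciphertext has the form E(m, r) = g^m r^n mod n^2, the v^c trick is just exponentiation acting on the hidden plaintext bit b:

```latex
v^{c} \;=\; \left(g^{b}\, r^{n}\right)^{c} \;=\; g^{bc}\,\left(r^{c}\right)^{n} \;=\; E\!\left(bc,\; r^{c}\right) \pmod{n^{2}}
```

So for b = 0 the result is an encryption of 0, and for b = 1 it is an encryption of c, and the party performing the exponentiation cannot tell which case occurred.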

21 [Figure: dictionary entries w1, …, wn, each paired with a ciphertext: E(1) for secret keywords and E(0) for all other words; the output buffer accumulates the pair (g, g^D).] g ← E(0) * E(1) * E(0) * … g = E(0) if there are no matching words; g = E(c) if there are c matching words. g^D = E(0) if there are no matching words; g^D = E(c*D) if there are c matching words. Thus: if we keep g = E(c) and g^D = E(c*D), we can calculate D exactly.
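This slide can be sketched end to end in Python. The dictionary, the keyword list, and all names below are hypothetical illustrations, not taken from the talk; the Paillier parameters are toy-sized.

```python
import math
import random

# Toy Paillier with g = N + 1; real deployments use large primes.
P, Q = 17, 19
N, N2 = P * Q, (P * Q) ** 2
LAM = math.lcm(P - 1, Q - 1)
MU = pow(LAM, -1, N)

def enc(m):
    r = random.randrange(1, N)
    while math.gcd(r, N) != 1:
        r = random.randrange(1, N)
    return pow(N + 1, m, N2) * pow(r, N, N2) % N2

def dec(c):
    return (pow(c, LAM, N2) - 1) // N * MU % N

dictionary = ["alice", "bob", "carol", "dave"]
secret_keywords = {"bob", "dave"}   # hypothetical classified list

# Compiled filter state: one ciphertext per dictionary word,
# E(1) for keywords and E(0) otherwise -- indistinguishable to
# anyone who inspects the compiled code.
table = {w: enc(1 if w in secret_keywords else 0) for w in dictionary}

def filter_document(doc):
    # Straight-line over the secrets: membership in the *public*
    # dictionary is checked, but nothing branches on the keywords.
    g = enc(0)
    for w in set(doc.split()):
        if w in table:
            g = g * table[w] % N2
    return g

g = filter_document("bob met dave at noon")
assert dec(g) == 2   # c = number of matching secret keywords
```

The full scheme additionally stores g^D = E(c·D), so the classified side can recover the document itself as dec(g^D) / dec(g); the sketch above only recovers the match count c.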

22 This is matching document #1 This is matching document #3 This is matching document #2 Here’s another matching document Collisions cause two problems: 1.Good documents are destroyed 2. Non-existent documents could be fabricated

23 We’ll make use of two combinatorial lemmas… We’ll make use of two combinatorial lemmas…

24

25 Combinatorial Lemma 1 Claim: the color-survival game succeeds with all but negligible probability.

26 How to detect collisions? Idea: append a highly structured (yet random) short combinatorial object to the message, with the property that if 2 or more of them "collide", the combinatorial property is destroyed. ⇒ we can always detect collisions!

27
100|001|100|010|010|100|001|010|010
010|001|010|001|100|001|100|001|010
010|100|100|100|010|001|010|001|010
=
100|100|010|111|100|100|111|010|010
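The idea behind these strings can be simulated. Under the assumption (illustrative, not spelled out on the slide) that each document appends K blocks, each a random one-hot triple, and that colliding buffer writes add tags coordinate-wise, a collision almost always produces a block that is no longer one-hot:

```python
import random

K = 20   # number of 3-bit blocks appended per document

def tag():
    # One random one-hot block of length 3, repeated K times.
    return [random.choice([(1, 0, 0), (0, 1, 0), (0, 0, 1)])
            for _ in range(K)]

def collision_detected(tags):
    # The buffer cell holds the coordinate-wise sum of all tags
    # written into it. A lone document leaves every block one-hot;
    # two documents that ever disagree in a block leave a block
    # with two or more nonzero cells.
    summed = [tuple(map(sum, zip(*blocks))) for blocks in zip(*tags)]
    return any(sum(1 for x in b if x) >= 2 for b in summed)

assert not collision_detected([tag()])        # a single write looks clean
hits = sum(collision_detected([tag(), tag()]) for _ in range(1000))
# Two colliding documents escape detection only if they agree in
# all K blocks, probability 3**-K -- negligible for K = 20.
assert hits == 1000
```

This matches the flavor of Lemma 2: the undetected-collision probability falls exponentially in the number of appended blocks.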

28 Combinatorial Lemma 2 Claim: collisions are detected with probability > 1 - exp(-k/3)

29 We do the same for all documents!

30 [Figure: the dictionary w1, …, wn with one ciphertext E(*) per word; a small output buffer accumulates the triple (g, g^D, f(g)).] For every document in the stream do the same: look up the encryptions of all words appearing in the document and multiply them together (= g). Compute g^D and f(g). Multiply (g, g^D, f(g)) into γ randomly chosen buffer locations.
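The buffer-write step can be sketched as follows. This is a simplified model, not the talk's construction: each ciphertext is abstracted by its plaintext (Paillier multiplication adds plaintexts), and the buffer size and γ are made-up illustrative values.

```python
import random

BUFFER_SIZE, GAMMA = 16, 3   # gamma = copies written per document

buffer = [0] * BUFFER_SIZE   # each cell models a ciphertext E(0)

def process(doc_value, matches):
    # The compiled code is branch-free with respect to the secret:
    # it always writes gamma cells, multiplying in E(doc_value) for
    # a match and E(0) for a non-match.
    v = doc_value if matches else 0
    for i in random.sample(range(BUFFER_SIZE), GAMMA):
        buffer[i] += v       # models buffer[i] = buffer[i] * E(v) mod n^2

process(42, True)            # a matching document lands in 3 cells
process(99, False)           # a non-match leaves every plaintext unchanged
assert sum(buffer) == 3 * 42
```

Writing γ copies is what makes the birthday-style collision analysis of the previous slides work: with a buffer moderately larger than the number of matching documents, each match survives in at least one uncollided cell with high probability, and the appended combinatorial tags flag the cells where collisions did occur.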

31 Extensions (1) Can execute more sophisticated rules: "OR" of keywords; catch documents where some words must not be present; catch documents where certain words must be "close" in the text; …many others, depending on the application.

32 Extensions (2) Can do even more: detect overflow; in case of an overflow of matching documents, collect a "sample"; dynamically change the "rules" on a public web page. Can act as an ultimate corporate security tool! …

33 Conclusions We introduced private searching on streaming data. More generally: "smart" encryption. Practical, deployable solutions. Have your cake and eat it too: ensure that only "useful" documents are collected. A new gadget in your quiver of technologies! THANK YOU!

