
1 Private Keyword Search on Streaming Data. Rafail Ostrovsky and William Skeith, UCLA (patent pending). http://www.cs.ucla.edu/~rafail/

2 Motivating Example. The intelligence community collects data from multiple sources that might potentially be “useful” for future analysis: network traffic, chat rooms, web sites, etc. However, what is “useful” is often classified.

3 Current Practice. Continuously transfer all data to a secure environment. After the data is transferred, filter it in the classified environment and keep only a small fraction of the documents.

4 [Diagram: several document streams (··· → D(1,3) → D(1,2) → D(1,1), ··· → D(2,3) → D(2,2) → D(2,1), ··· → D(3,3) → D(3,2) → D(3,1)) all flow into the classified environment, where a filter selects which documents go to storage.] Filter rules are written by an analyst and are classified!

5 Current Practice. Drawbacks: communication and processing.

6 How to improve performance? Distribute the work to many locations on a network. A seemingly ideal solution, but there is a major problem: it is not clear how to maintain privacy, which is the focus of this talk.

7 [Diagram: the same document streams are now filtered at their sources; each filter stores only encrypted matches such as E(D(1,2)), E(D(1,3)), E(D(2,2)), which are sent to the classified environment, decrypted, and stored as D(1,2), D(1,3), D(2,2).]

8 Example Filter. Look for all documents that contain special classified keywords selected by an analyst, perhaps an alias of a dangerous criminal. Privacy: we must hide which words are used to create the filter, and the output must be encrypted.

9 More generally: we define the notion of Public Key Program Obfuscation: an encrypted version of a program that performs the same functionality as the un-obfuscated program, but produces encrypted output and is impossible to reverse engineer. A little more formally:

10 Public Key Program Obfuscation

11 Privacy
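The informal description on slide 9 can be written out a little more precisely. What follows is only a sketch in notation of my own choosing (not necessarily the slides' exact formulation): a public key program obfuscator for a class of programs C is a triple (KeyGen, Obf, Dec), where (pk, sk) ← KeyGen(1^k) and the obfuscated program runs in the clear.

```latex
% Sketch only; notation is mine, not necessarily the slides' exact formulation.
% Correctness: the encrypted output decrypts to what the original program outputs.
\[
\forall P \in \mathcal{C},\ \forall x:\qquad
  \mathsf{Dec}\bigl(sk,\ \widetilde{P}(x)\bigr) \;=\; P(x),
  \qquad\text{where } (pk,sk)\leftarrow\mathsf{KeyGen}(1^{k}),\ \
  \widetilde{P}\leftarrow\mathsf{Obf}(pk,P).
\]
% Privacy: obfuscations of any two same-size programs are indistinguishable,
% so the obfuscated filter reveals nothing about the keywords inside it.
\[
\forall P_0, P_1 \in \mathcal{C} \text{ of equal size}:\qquad
  \bigl(pk,\ \mathsf{Obf}(pk,P_0)\bigr) \;\approx_c\; \bigl(pk,\ \mathsf{Obf}(pk,P_1)\bigr).
\]
```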

12 Related Notions. PIR (Private Information Retrieval) [CGKS], [KO], [CMS], … Keyword PIR [KO], [CGN], [FIPR]. Program Obfuscation [BGIRSVY], …: there the output is identical to that of the un-obfuscated program, whereas in our case it is encrypted. Public Key Program Obfuscation: a more general notion than PIR, with lots of applications.

13 What we want. [Diagram: a single document stream ··· → D(1,3) → D(1,2) → D(1,1) flows through a Filter directly into Storage.]

14 [Diagram: a stream of documents, some matching (matching documents #1, #2, #3) and some non-matching.]

15 How to accomplish this?

16 Several Solutions Based on Homomorphic Encryption. For this talk: Paillier encryption. Properties: plaintext set is Z_n; ciphertext set is Z*_{n^2}; homomorphic, i.e., E(x)·E(y) = E(x+y).
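To make these properties concrete, here is a toy Paillier sketch in Python (tiny hard-coded primes, no padding, not secure, and not the paper's implementation), just to show E(x)·E(y) = E(x+y) and the derived E(x)^k = E(k·x):

```python
# Toy Paillier, for illustration only: tiny primes, no padding, NOT secure.
import math
import random

def paillier_keygen():
    p, q = 293, 433                        # hard-coded small primes (demo only)
    n = p * q
    lam = math.lcm(p - 1, q - 1)           # lambda(n)
    g = n + 1                              # the standard simple choice of g
    mu = pow((pow(g, lam, n * n) - 1) // n, -1, n)   # mu = L(g^lam mod n^2)^-1
    return (n, g), (lam, mu)

def paillier_enc(pk, m):
    n, g = pk
    r = random.randrange(1, n)             # fresh randomness for each ciphertext
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)

def paillier_dec(pk, sk, c):
    n, _ = pk
    lam, mu = sk
    return ((pow(c, lam, n * n) - 1) // n * mu) % n   # L(c^lam mod n^2) * mu

pk, sk = paillier_keygen()
n2 = pk[0] ** 2
cx, cy = paillier_enc(pk, 7), paillier_enc(pk, 35)
assert paillier_dec(pk, sk, (cx * cy) % n2) == 42     # E(x)*E(y) = E(x+y)
assert paillier_dec(pk, sk, pow(cx, 6, n2)) == 42     # E(x)^k    = E(k*x)
```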

17 Simplifying Assumptions for this Talk. All keywords come from some poly-size dictionary. Truncate documents beyond a certain length.

18 [Diagram: the dictionary w_1, w_2, …, w_t, with each word tagged by a ciphertext: E(1) on the classified keywords (e.g., w_2, w_5, w_{t-2}) and E(0) on the other words (e.g., w_3, w_{t-1}). For an incoming document D, the ciphertexts of its dictionary words are multiplied together, combined with an encoding of D itself (the slide shows a pair (g, g^D)), and the product is multiplied into the Output Buffer.]
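One way to read this slide as code: a minimal sketch, reusing the toy paillier_keygen/paillier_enc/paillier_dec helpers from the previous block. Everything is simplified for illustration: documents are small integer IDs so they fit in the toy modulus, each document is written into a single random buffer slot, and buffer collisions are ignored here (the following slides deal with them). The names build_filter, process_document, and read_buffer are mine, not the paper's.

```python
# Streaming filter sketch on top of the toy Paillier helpers above.
import random

DICTIONARY = ["alice", "bob", "charlie", "dave", "eve"]

def build_filter(pk, keywords):
    # One ciphertext per dictionary word: E(1) under a keyword, E(0) elsewhere.
    # Ciphertexts of 0 and 1 look alike, so the filter hides the keywords.
    return {w: paillier_enc(pk, 1 if w in keywords else 0) for w in DICTIONARY}

def process_document(pk, filt, words, doc_id, buffer):
    n2 = pk[0] ** 2
    c_count = 1                                   # trivial encryption of 0
    for w in set(words) & set(DICTIONARY):
        c_count = (c_count * filt[w]) % n2        # E(c), c = #keywords in doc
    c_doc = pow(c_count, doc_id, n2)              # E(c * doc_id)
    j = random.randrange(len(buffer))             # pick a random buffer slot
    bc, bd = buffer[j]
    buffer[j] = ((bc * c_count) % n2, (bd * c_doc) % n2)   # add into the slot

def read_buffer(pk, sk, buffer):
    docs = []
    for bc, bd in buffer:
        c, cd = paillier_dec(pk, sk, bc), paillier_dec(pk, sk, bd)
        if c != 0:                    # some matching document landed here
            docs.append(cd // c)      # recover doc_id = (c * doc_id) / c
    return docs

pk, sk = paillier_keygen()
filt = build_filter(pk, keywords={"eve"})         # the classified keyword
buf = [(1, 1)] * 8                                # buffer of trivial E(0) pairs
process_document(pk, filt, ["meeting", "with", "eve"], doc_id=1234, buffer=buf)
process_document(pk, filt, ["lunch", "with", "bob"],   doc_id=5678, buffer=buf)
print(read_buffer(pk, sk, buf))                   # -> [1234]
```

The point of the construction is that the filter is just a table of ciphertexts and the buffer holds only ciphertexts, so both can live and run outside the classified environment; only the holder of sk learns which documents matched.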

19 [Diagram: matching documents #1, #2, #3 and another matching document are written into random positions of the output buffer; some of them collide.] Collisions cause two problems: 1. Good documents are destroyed. 2. Non-existent documents could be fabricated.
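To get a feel for how often this happens, here is a quick balls-in-bins simulation. It assumes, purely for illustration, that each matching document goes into a single uniformly random slot of a buffer with twice as many slots as matching documents; the paper's actual construction differs (each document is written into several random slots and the buffer is sized so that at least one copy survives with high probability, which is where the combinatorial lemmas below come in).

```python
# Balls-in-bins: how many documents survive if each lands in one random slot?
import random
from collections import Counter

def simulate(m=50, buffer_slots=100, trials=2000):
    lost = 0
    for _ in range(trials):
        slots = Counter(random.randrange(buffer_slots) for _ in range(m))
        clean = sum(1 for hits in slots.values() if hits == 1)  # collision-free
        lost += m - clean
    print(f"{m} matching docs, {buffer_slots} slots: "
          f"on average {lost / trials:.1f} docs per run are lost to collisions")

simulate()
```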

20 We’ll make use of two combinatorial lemmas…

21

22 How to detect collisions? Append a highly structured (yet random) k-bit string to the message. The sum of two or more such strings will be another such string with only negligible probability in k. Specifically, partition the k bits into triples and set exactly one bit from each triple to 1.
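A small sketch of this tag, treating each of the k bit positions as its own counter so that the additive homomorphism corresponds to position-wise addition (the real scheme appends the k bits to the plaintext itself; this simplified model is only meant to show why a sum of tags stops looking like a valid tag):

```python
# Collision tag: k bits split into triples, exactly one 1 per triple.
import random

TRIPLES = 9                                      # k = 27 bits, as in the next slide

def make_tag():
    tag = [0] * (3 * TRIPLES)
    for t in range(TRIPLES):
        tag[3 * t + random.randrange(3)] = 1     # set exactly one bit per triple
    return tag

def is_valid(tag):
    # A valid tag has exactly one 1 (and nothing larger) in every triple.
    return all(sorted(tag[3 * t: 3 * t + 3]) == [0, 0, 1] for t in range(TRIPLES))

a, b = make_tag(), make_tag()
summed = [x + y for x, y in zip(a, b)]           # what a two-document collision yields
print(is_valid(a), is_valid(b), is_valid(summed))   # -> True True False
```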

23 Example with k = 27 (nine triples):
100|001|100|010|010|100|001|010|010
010|001|010|001|100|001|100|001|010
010|100|100|100|010|001|010|001|010
= 100|100|010|111|100|100|111|010|010
The bottom string (note the 111 triples) is not of the form “exactly one 1 per triple,” so the collision is detected.

24 Detecting Overflow (more than m matching documents). Double the buffer size from m to 2m. If m < #documents < 2m, output “overflow”. If #documents > 2m, then the expected number of collisions is large, so output “overflow” in this case as well. (Not yet in the eprint version; it will appear soon, along with some other extensions.)

25 More from the paper that we don’t have time to discuss… Reducing program size below dictionary size (using Φ-Hiding from [CMS]). Queries containing AND (using [BGN] machinery). Eliminating negligible error (using perfect hashing). A scheme based on arbitrary homomorphic encryption.

26 Conclusions. Private searching on streaming data. Public key program obfuscation, more general than PIR. Practical, efficient protocols. Many open problems.

27 Thanks For Listening!

