Private Keyword Search on Streaming Data Rafail Ostrovsky William Skeith UCLA (patent pending)



Motivating Example
The intelligence community collects data from multiple sources that might potentially be “useful” for future analysis:
  Network traffic
  Chat rooms
  Web sites, etc.
However, what is “useful” is often classified.

Current Practice
Continuously transfer all data to a secure environment.
After the data is transferred, filter it in the classified environment and keep only a small fraction of the documents.

[Diagram: three document streams, ··· → D(i,3) → D(i,2) → D(i,1) → for i = 1, 2, 3, are all transferred into the Classified Environment, which contains a Filter and Storage.]
Filter rules are written by an analyst and are classified!

Current Practice
Drawbacks:
  Communication
  Processing

How to improve performance?
Distribute work to many locations on a network.
Seemingly ideal solution, but…
Major problem: it is not clear how to maintain privacy, which is the focus of this talk.

[Diagram: each of the three document streams now passes through its own Filter with local Storage, outside the Classified Environment. The local stores hold only encrypted documents, E(D(1,2)), E(D(1,3)) and E(D(2,2)). The Classified Environment decrypts them and stores D(1,2), D(1,3) and D(2,2).]

Example Filter:
Look for all documents that contain special classified keywords, selected by an analyst.
  Perhaps an alias of a dangerous criminal
Privacy
  Must hide what words are used to create the filter
  Output must be encrypted

More generally:
We define the notion of Public Key Program Obfuscation
  Encrypted version of a program
  Performs same functionality as un-obfuscated program, but:
    Produces encrypted output
    Impossible to reverse engineer
A little more formally:

Public Key Program Obfuscation

Privacy

Related Notions
PIR (Private Information Retrieval) [CGKS],[KO],[CMS]…
Keyword PIR [KO],[CGN],[FIPR]
Program Obfuscation [BGIRSVY]…
  Here output is identical to un-obfuscated program, but in our case it is encrypted.
Public Key Program Obfuscation
  A more general notion than PIR, with lots of applications

What we want
[Diagram: a document stream ··· → D(1,3) → D(1,2) → D(1,1) → passes through a Filter into Storage.]

[Diagram: a stream of documents labeled “This is matching document #1”, “This is matching document #2” and “This is matching document #3”, interleaved with documents labeled “This is a Non-matching document”.]

How to accomplish this?

Several Solutions based on Homomorphic Encryptions
For this talk: Paillier Encryption
Properties:
  Plaintext set = $\mathbb{Z}_n$
  Ciphertext set = $\mathbb{Z}^*_{n^2}$
  Homomorphic, i.e., $E(x)E(y) = E(x+y)$
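
The homomorphic property can be checked with a toy Paillier implementation. This is a minimal sketch, not code from the talk: the function names are hypothetical, the generator is fixed to n + 1 (a common simplification), and the two Mersenne primes are far too structured for real use. It needs Python 3.8+ for the modular inverse via `pow`.

```python
import math
import secrets

def keygen(p=2**107 - 1, q=2**127 - 1):
    # Toy parameters: two known Mersenne primes. NOT secure, illustration only.
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because the generator is fixed to n + 1
    return n, (lam, mu)

def encrypt(n, m):
    # E(m) = (n+1)^m * r^n mod n^2, with a fresh random r for every call.
    # r is coprime to n except with negligible probability.
    n2 = n * n
    r = secrets.randbelow(n - 1) + 1
    return pow(n + 1, m, n2) * pow(r, n, n2) % n2

def decrypt(n, sk, c):
    # m = L(c^lam mod n^2) * mu mod n, where L(u) = (u - 1) / n.
    lam, mu = sk
    return (pow(c, lam, n * n) - 1) // n * mu % n

def add(n, c1, c2):
    # Multiplying ciphertexts adds the plaintexts modulo n.
    return c1 * c2 % (n * n)

n, sk = keygen()
cx, cy = encrypt(n, 20), encrypt(n, 22)
print(decrypt(n, sk, add(n, cx, cy)))
```

Raising a ciphertext to a power k multiplies the plaintext by k in the same way, since E(x)^k is E(x) multiplied by itself k times.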

Simplifying Assumptions for this Talk
All keywords come from some poly-size dictionary
Truncate documents beyond a certain length

[Diagram: a Dictionary of words w_1, w_2, …, w_t, each paired with a ciphertext, E(1) or E(0). For a document D, a pair (g, g^D) is formed and multiplied into the Output Buffer, whose entries start as E(0).]
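
The construction in the diagram can be sketched roughly as follows. This is an illustrative reading, not the authors' code: all function names, the number of buffer copies, the 28-byte document limit and the toy Paillier parameters are assumptions, and collision detection is left out, so slots hit by two matching documents decrypt to garbage.

```python
import math
import secrets

# Toy Paillier over two Mersenne primes (NOT secure; illustration only).
P, Q = 2**107 - 1, 2**127 - 1
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)
MU = pow(LAM, -1, N)

def enc(m):
    r = secrets.randbelow(N - 1) + 1
    return pow(N + 1, m, N2) * pow(r, N, N2) % N2

def dec(c):
    return (pow(c, LAM, N2) - 1) // N * MU % N

def build_filter(dictionary, keywords):
    # Analyst side: E(1) for each keyword, E(0) for every other dictionary word.
    return {w: enc(1 if w in keywords else 0) for w in dictionary}

def process(table, buffer, text, copies=3):
    # Public side: uses only ciphertexts and the public modulus.
    g = enc(0)
    for w in set(text.lower().split()):
        if w in table:
            g = g * table[w] % N2            # g = E(number of keywords in text)
    d = int.from_bytes(text.encode()[:28], "big")  # truncate long documents
    gd = pow(g, d, N2)                       # g^D = E(count * D)
    for _ in range(copies):
        i = secrets.randbelow(len(buffer))
        a, b = buffer[i]
        buffer[i] = (a * g % N2, b * gd % N2)

def read_buffer(buffer):
    # Analyst side: a slot with count c != 0 holds c * D; divide c out.
    found = set()
    for a, b in buffer:
        c = dec(a)
        if c:
            d = dec(b) * pow(c, -1, N) % N
            raw = d.to_bytes((d.bit_length() + 7) // 8, "big")
            found.add(raw.decode(errors="replace"))
    return found

table = build_filter(["alias", "bridge", "weather", "coffee"], {"alias"})
buffer = [(enc(0), enc(0)) for _ in range(8)]
for doc in ["weather is nice", "meet alias at noon", "coffee at the bridge"]:
    process(table, buffer, doc)
print(read_buffer(buffer))
```

Non-matching documents multiply encryptions of 0 into the buffer, so they leave the underlying plaintexts unchanged while still touching the buffer exactly as matching documents do.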

[Diagram: an output buffer containing “This is matching document #1”, “This is matching document #2”, “This is matching document #3” and “Here’s another matching document”, illustrating a collision.]
Collisions cause two problems:
1. Good documents are destroyed
2. Non-existent documents could be fabricated

We’ll make use of two combinatorial lemmas…

How to detect collisions?
Append a highly structured (yet random) k-bit string to the message.
The sum of two or more such strings will be another such string with negligible probability in k.
Specifically, partition the k bits into triples of bits, and set exactly one bit from each triple to 1.

[Example, partly truncated in the transcript:
  100|001|100|010|010|100|001|010|
  |001|010|001|100|001|100|001|
  |100|100|100|010|001|010|001|
= |100|010|111|100|100|111|010|010]
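
The triple structure can be explored with a short sketch. The slides do not spell out which sum is meant; this sketch assumes bitwise XOR taken triple by triple, which is an assumption and not necessarily the paper's definition. All names are hypothetical.

```python
import secrets
from functools import reduce

# The three valid triples: exactly one of the three bits is set.
UNIT_TRIPLES = (0b100, 0b010, 0b001)

def random_tag(num_triples):
    # A structured random string of k = 3 * num_triples bits.
    return [secrets.choice(UNIT_TRIPLES) for _ in range(num_triples)]

def combine(*tags):
    # Assumption: the "sum" of colliding tags is modeled as bitwise XOR.
    return [reduce(lambda a, b: a ^ b, triple) for triple in zip(*tags)]

def is_valid(tag):
    # A tag is well formed only if every triple has exactly one bit set.
    return all(t in UNIT_TRIPLES for t in tag)

def show(tag):
    return "|".join(format(t, "03b") for t in tag)

a, b, c = random_tag(8), random_tag(8), random_tag(8)
print(show(a), is_valid(a))
print(show(combine(a, b, c)), is_valid(combine(a, b, c)))
```

Under this XOR model, combining two tags never yields a valid tag, because each triple becomes either 000 or a pattern with two bits set. Combining three tags keeps a given triple valid with probability 7/9, so the whole string stays valid with probability (7/9)^(k/3), which shrinks exponentially in k.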

Detecting Overflow > m
Double buffer size from m to 2m
If m < #documents < 2m, output “overflow”
If #documents > 2m, then expected number of collisions is large, thus output “overflow” in this case as well.
Not yet in eprint version, will appear soon, as well as some other extensions.

More from the paper that we don’t have time to discuss…
Reducing program size below dictionary size (using Φ-Hiding from [CMS])
Queries containing AND (using [BGN] machinery)
Eliminating negligible error (using perfect hashing)
Scheme based on arbitrary homomorphic encryption

Conclusions
Private searching on streaming data
Public key program obfuscation, more general than PIR
Practical, efficient protocols
Many open problems

Thanks For Listening!