All Your Queries are Belong to Us: The Power of File-Injection Attacks on Searchable Encryption Yupeng Zhang, Jonathan Katz, Charalampos Papamanthou University of Maryland

Agenda: Background on Searchable Encryption. Attacks on Searchable Encryption. Experimental results. Extensions to conjunctive Searchable Encryption. Countermeasures. Conclusions.

Setting: a client outsources its files to a cloud storage system. Can the client keep the files private and still search them?

Searchable Encryption

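What is Searchable Encryption? The client stores its files, encrypted, on a server. To search, the client sends the server a query for a keyword, and the server returns the matching encrypted files.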

Products. Encrypted storage: Skyhigh Networks, CipherCloud. Encrypted emails.

Main Contributions. Existing Searchable Encryption schemes come with rigorous security proofs by allowing well-defined “leakage”. The practical meaning of this leakage is not well understood. We present attacks that exploit this leakage to break query privacy. We suggest reducing or eliminating this leakage instead of accepting it by default.

An Example of Searchable Encryption. [Figure: an inverted index mapping keywords k1, k2, k3 to files F1 to F6.]

[Figure: the same index; to search for k1, the client sends the server a token for k1.]

An Example of Searchable Encryption. [Figure: the index after a new file F7 is added; files F1 to F7.]

Leakage

Leakage of Searchable Encryption. [Figure: the token for k1 is deterministic; the server sees the file access pattern; and the server can search k1 on new files such as F7.]

Leakage of Searchable Encryption. Search pattern leakage: the server can tell when a query repeats. Access pattern leakage: the server can tell whether a file is returned. Both are leaked by all efficient searchable encryption schemes. No forward privacy: the server can search old tokens on new files. All SE schemes except [CM05, SPS14] lack forward privacy.
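The three kinds of leakage can be illustrated with a toy model. This is a minimal sketch in Python, not any particular SE scheme: it assumes tokens are HMAC-SHA256 of the keyword under a client key and that the server keeps a plain map from token to file ids; `token` and `ToyServer` are hypothetical names. Equal keywords give equal tokens (search pattern), the server sees which file ids match (access pattern), and an old token matches a newly added file (no forward privacy).

```python
import hashlib
import hmac
from collections import defaultdict


def token(key: bytes, keyword: str) -> bytes:
    """Deterministic search token: the same keyword always maps to the same token."""
    return hmac.new(key, keyword.encode(), hashlib.sha256).digest()


class ToyServer:
    """Server-side view of a toy index that maps search tokens to file ids.

    File contents are assumed to be encrypted separately and are not modeled.
    """

    def __init__(self) -> None:
        self.index = defaultdict(set)

    def add_file(self, file_id: int, tokens: set[bytes]) -> None:
        # The client computes one token per keyword of the new file.
        for t in tokens:
            self.index[t].add(file_id)

    def search(self, t: bytes) -> set[int]:
        # The server sees which file ids match: access pattern leakage.
        return set(self.index.get(t, set()))
```

Searching the same keyword twice produces byte-identical tokens, so the server links repeated queries; after a new file with that keyword is added, the old token returns it as well.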

What information does this leakage reveal?

Prior Attacks on Searchable Encryption. Islam et al. (IKK12) proposed a query recovery attack. Cash et al. (CGPR15) proposed another attack with a higher success probability. Assumption: the server knows all of the client’s files in plaintext. Our attacks: 1. work in the file-injection attack model (first proposed in CGPR15); 2. significantly improve the success probability; 3. eliminate or relax the file leakage assumption; 4. extend to conjunctive search.

Our Attacks

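Attack Model: File-injection Attack. [Figure: the server injects files into the client’s encrypted storage (files F1 to F4); when the client issues a search query for a keyword k, the server observes which files, including injected ones, are returned (e.g. F3 and F4).]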

Practicality of the Attack Model. Gmail: create fresh accounts → send emails to existing accounts → the emails go through!

Binary Search Attack. [Figure: keyword universe k0 to k7 and three injected files, File 1, File 2 and File 3, each containing half of the keywords; the search result reveals which injected files contain the queried keyword.] Only 14 injected files are needed for a universe of 10,000 keywords. The attack recovers all queries with probability 1. Files are injected before the queries are seen (non-adaptive). It uses only the file access pattern leakage. The keyword universe is defined by the server (small universe).
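A minimal sketch of the binary search attack in Python, assuming the server knows the keyword universe as an ordered list and that injected file i holds exactly the keywords whose index has bit i set (this bit assignment is our illustration of “half of the keywords per file”; the function names are hypothetical). The server only observes which injected files a token returns and reads the keyword’s index off as a bit string.

```python
import math


def make_injected_files(universe: list[str]) -> list[set[str]]:
    """Injected file i contains every keyword whose index in `universe` has bit i set."""
    n_files = max(1, math.ceil(math.log2(len(universe))))
    return [
        {kw for j, kw in enumerate(universe) if (j >> i) & 1}
        for i in range(n_files)
    ]


def recover_keyword(universe: list[str], returned: list[int]) -> str:
    """Rebuild the keyword index from the ids of injected files returned for a token."""
    j = sum(1 << i for i in returned)
    return universe[j]


# Simulated query: the server only sees which injected files match the token.
universe = [f"k{j}" for j in range(10_000)]
files = make_injected_files(universe)
query = "k4242"
returned = [i for i, f in enumerate(files) if query in f]
print(len(files), recover_keyword(universe, returned))  # 14 k4242
```

With 10,000 keywords this produces ⌈log2 10,000⌉ = 14 files, the count on the slide, and most of them hold about 5,000 keywords, which is the length problem the threshold countermeasure targets.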

Limitation: the injected files are long (|K|/2 keywords each).

Threshold Countermeasure. Filter out all files that contain more than T keywords, or index only the T most frequent keywords of a file that has more than T keywords. On the Enron data set (30,109 files, universe of 5,000 keywords), only 3% of files have more than T = 200 keywords. (Source: Enron dataset.)
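Both variants of the threshold countermeasure can be sketched from the client’s side. This is an illustrative sketch, assuming a file is given as its list of (already extracted) keywords; `keywords_to_index` and the `drop_long` switch are hypothetical names for “filter the file” versus “keep only its T most frequent keywords”.

```python
from collections import Counter


def keywords_to_index(words: list[str], T: int, drop_long: bool = False) -> set[str]:
    """Keywords the client indexes for one file under a threshold of T keywords.

    drop_long=True: a file with more than T distinct keywords is not indexed at all.
    drop_long=False: only its T most frequent keywords are indexed.
    """
    counts = Counter(words)
    if len(counts) <= T:
        return set(counts)
    if drop_long:
        return set()
    return {kw for kw, _ in counts.most_common(T)}
```

An injected binary-search file with 2,500 distinct keywords either disappears from the index or is cut down to 200 keywords, so most of the bits it was meant to reveal are lost.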

Modifying the Attack. Replace each injected file with |K|/2 keywords by |K|/2T files of T keywords each, and use hierarchical search (see our paper for details). This requires injecting 131 files for |K| = 5,000 and T = 200. [Figure: keyword universe k0 to k7; File 1 of the basic attack is split into two smaller files, File 1 and File 2.]
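The splitting step alone can be sketched as below; the hierarchical search and the exact total of 131 files are described in the paper and are not reproduced here. This assumes the server treats a token as matching the original long file whenever it matches any of that file’s pieces; both function names are hypothetical.

```python
def split_injected_file(keywords: set[str], T: int) -> list[set[str]]:
    """Split one long injected file into ceil(len(keywords) / T) files of at most T keywords."""
    ordered = sorted(keywords)  # any fixed order works
    return [set(ordered[i:i + T]) for i in range(0, len(ordered), T)]


def matches_original(returned_ids: set[int], piece_ids: list[int]) -> bool:
    """A token 'hits' the original long file iff it hits any of its pieces."""
    return any(p in returned_ids for p in piece_ids)
```

For |K| = 5,000 and T = 200, a file of |K|/2 = 2,500 keywords becomes ⌈12.5⌉ = 13 pieces, each short enough to pass the threshold filter.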

Advanced Attacks

Attacks with Partial File Leakage. The server learns a portion of the client’s files in plaintext (e.g. announcement and alert emails broadcast to many people). Goal: bypass the threshold filter on the length of the injected files. Approach: reduce the size of the universe searched for each token (the candidate universe).

Attacks to Recover 1 Token. [Figure: from the leaked files, the server estimates the frequency f*(k) of each keyword k1 to k5 in the universe; it observes the exact frequency f(t) of the token t; the keywords with f*(k) ≈ f(t) form the candidate universe, on which the binary search attack is run.]
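The two steps above, building the candidate universe and then running binary search over it, can be sketched as follows. This is a sketch under stated assumptions, not the authors’ pseudo code: `est_freq` holds f*(k) derived from the leaked files (e.g. counts rescaled to the whole collection), `delta` is a hypothetical tolerance, and numbering the candidates from 1 rather than 0 is our own choice so that a token outside the candidate universe matches no injected file and the failure is visible.

```python
import math
from typing import Optional


def candidate_universe(est_freq: dict[str, float], observed: float, delta: float) -> list[str]:
    """Keywords whose estimated frequency f*(k) lies within delta of the token's f(t)."""
    return sorted(k for k, f in est_freq.items() if abs(f - observed) <= delta)


def inject_for_candidates(candidates: list[str]) -> list[set[str]]:
    """Binary search files over the candidates, using codes 1..m instead of 0..m-1.

    A keyword outside the candidate universe then matches no injected file (code 0).
    """
    n_files = max(1, math.ceil(math.log2(len(candidates) + 1)))
    return [
        {kw for code, kw in enumerate(candidates, start=1) if (code >> i) & 1}
        for i in range(n_files)
    ]


def recover(candidates: list[str], returned: list[int]) -> Optional[str]:
    """Decode the returned injected files; None means the attack detectably failed."""
    code = sum(1 << i for i in returned)
    if code == 0 or code > len(candidates):
        return None
    return candidates[code - 1]
```

Because the files are injected after the token is observed, this only works against schemes without forward privacy or when the token is searched again.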

Pseudo Code

Differences from Our Prior Attacks. 1. The attack is adaptive. 2. It applies to SE schemes with no forward privacy, or when a token is searched twice. 3. The server does not always succeed, but it can determine whether the attack fails.

Attacks to Recover Multiple Tokens. 1. Recover several keyword/token pairs as ground truth. 2. For each remaining token t' and each keyword k': if the joint frequencies satisfy f*(k, k') ≈ f(t, t') for all pairs (k, t) in the ground truth, put k' into the candidate universe of t'. 3. Run the search on the candidate universe.
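Step 2 can be sketched with co-occurrence counts. In this illustration, joint frequency is taken as the number of files matching both items; `kw_files` (keyword to leaked file ids) and `tok_files` (token to returned file ids) are hypothetical inputs, and with partial leakage the keyword-side counts would need rescaling, which `delta` is assumed to absorb.

```python
def joint(files_of: dict[str, set[int]], a: str, b: str) -> int:
    """Joint frequency: number of files that match both a and b."""
    return len(files_of[a] & files_of[b])


def candidates_for_token(
    t_new: str,
    keywords: list[str],
    ground_truth: list[tuple[str, str]],
    kw_files: dict[str, set[int]],
    tok_files: dict[str, set[int]],
    delta: int,
) -> list[str]:
    """Keep keyword k_new only if f*(k, k_new) is within delta of f(t, t_new)
    for every recovered pair (k, t) in the ground truth."""
    return [
        k_new
        for k_new in keywords
        if all(
            abs(joint(kw_files, k, k_new) - joint(tok_files, t, t_new)) <= delta
            for k, t in ground_truth
        )
    ]
```

Each recovered pair adds one more constraint, so the candidate universe for the remaining tokens shrinks as the ground truth grows, and the binary search over it needs fewer and shorter injected files.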

Experiments

Experimental Methodology. Enron data set with 30,109 emails. Stem the words in the emails (remove -able, -ing, etc.). Remove stop words (“to”, “you”, etc.). Extract keywords (77,000 in total). Choose the 5,000 with the highest frequency as the universe. Leaked files are chosen uniformly from all files. Queries are chosen uniformly from all keywords. For each sample of leaked files, run the attack on 100 sets of queries (each set containing 1 or 100 queries); repeat for 100 different samples and report the average.
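The keyword-extraction steps can be sketched as a small pipeline. This is only an approximation of the methodology: the stop-word list is a tiny illustrative subset, the suffix stripper is a naive stand-in for a real stemmer, and “frequency” is assumed to mean the number of emails that contain a keyword; all names are hypothetical.

```python
import re
from collections import Counter

STOP_WORDS = {"to", "you", "the", "a", "and", "of", "is", "in"}  # illustrative subset
SUFFIXES = ("able", "ing", "ed")  # naive stand-in for a real stemmer


def stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suf in SUFFIXES:
        if word.endswith(suf) and len(word) - len(suf) >= 3:
            return word[: -len(suf)]
    return word


def file_keywords(text: str) -> set[str]:
    """Lowercase, tokenize, drop stop words, and stem."""
    words = re.findall(r"[a-z]+", text.lower())
    return {stem(w) for w in words if w not in STOP_WORDS}


def keyword_universe(emails: list[str], size: int) -> list[str]:
    """Top-`size` keywords by the number of emails containing them."""
    doc_freq = Counter()
    for text in emails:
        doc_freq.update(file_keywords(text))
    return [kw for kw, _ in doc_freq.most_common(size)]
```

Applied to the Enron emails with `size=5000`, this yields a 5,000-keyword universe in the spirit of the setup above, though the exact keywords depend on the stemmer and stop-word list used.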

Experimental Results: Recover 1 Query. Universe size 5,000, T = 200, 9 injected files. [Plot: results under different attack models.]

Experimental Results: Recover 100 Queries. Universe size 5,000, T = 200, at most 40 injected files.

Insights. Prior attacks find the best match between keywords and tokens. For example, if 100% of the files are leaked and only one keyword k satisfies f*(k) = f(t), then t must be the token for k. This uniqueness of frequencies is distorted when fewer files are leaked. Our attacks instead rule out bad matches and search over the remaining ones.

Extensions to Conjunctive SE

Conjunctive search: search for files that contain all d keywords k1, k2, …, kd. Ideal leakage: only the intersection of their search results is leaked. (No existing scheme achieves ideal leakage.)

Attack Algorithm. Inject n files, each containing L keywords chosen randomly and independently from the universe. Find the injected files that are in the search result of a conjunctive query, and take the intersection of their keywords. [Figure: injected files {k3, k4, k6, k9, k10, …}, {k2, k3, k5, k9, k11, …}, {k1, k4, k6, k9, …}, {k1, k4, k8, k12, k19, …}, …; if the first and third are returned, the query is {k3, k4, k6, k9, k10, …} ∩ {k1, k4, k6, k9, …} = {k4, k6, k9}.]
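A small simulation of this attack, assuming ideal conjunctive leakage (the server sees exactly which injected files contain all query keywords). The function names and the parameter choices in the usage below (100 keywords, n = 2,000, L = 50) are hypothetical and only for illustration.

```python
import random
from typing import Optional


def inject_random_files(universe: list[str], n: int, L: int, rng: random.Random) -> list[set[str]]:
    """n injected files, each with L keywords drawn at random from the universe."""
    return [set(rng.sample(universe, L)) for _ in range(n)]


def conjunctive_search(files: list[set[str]], query: set[str]) -> list[int]:
    """What the server observes: indices of injected files containing every query keyword."""
    return [i for i, f in enumerate(files) if query <= f]


def recover_conjunctive(files: list[set[str]], returned: list[int]) -> Optional[set[str]]:
    """Intersect the keyword sets of the returned injected files."""
    if not returned:
        return None
    return set.intersection(*(files[i] for i in returned))


rng = random.Random(1)
universe = [f"k{i}" for i in range(100)]
files = inject_random_files(universe, 2000, 50, rng)
recovered = recover_conjunctive(files, conjunctive_search(files, {"k4", "k9"}))
```

The intersection always contains the query keywords; extra keywords survive only if they happen to appear in every returned file, which becomes unlikely as more files are returned.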

Proof Sketch

Other Attacks. Two other attacks (see our paper): a non-adaptive attack that recovers 2-keyword queries with probability 1, and an adaptive attack that recovers d-keyword queries with probability 1 using n = d log|K| injected files.

Countermeasures

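Semantic Filter. The attacker has a lot of freedom in composing injected files: 1. an arbitrary set of half of the keywords (in the binary search attack); 2. arbitrary order; 3. arbitrary number of occurrences; 4. arbitrary word forms (-able, -ing, etc.); 5. stop words (“to”, “you”, etc.). Filtering injected files by their semantics therefore does not work!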

Padding. Pad the inverted index so that multiple tokens have the same frequency. [Figure: inverted index for keywords k1, k2, k3 over files F1 to F6, with padding entries added.]
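One simple way to pad an inverted index is sketched below: each keyword’s result list is rounded up to the next multiple of a bucket size using randomly chosen files that do not contain the keyword. The bucket rule, the function names and the `beta` helper are our assumptions; `beta` follows one reading of the β used in the padding experiments (total padded entries divided by total original entries).

```python
import math
import random


def pad_index(index: dict[str, set[int]], all_files: set[int], bucket: int,
              rng: random.Random) -> dict[str, set[int]]:
    """Pad each keyword's result list up to the next multiple of `bucket`
    with files that do not contain the keyword."""
    padded = {}
    for kw, files in index.items():
        target = math.ceil(len(files) / bucket) * bucket
        others = sorted(all_files - files)  # sorted so the draw is reproducible
        extra = rng.sample(others, min(target - len(files), len(others)))
        padded[kw] = files | set(extra)
    return padded


def beta(index: dict[str, set[int]], padded: dict[str, set[int]]) -> float:
    """Padded entries divided by original entries, summed over all keywords."""
    added = sum(len(padded[k]) - len(index[k]) for k in index)
    return added / sum(len(index[k]) for k in index)
```

Padding only adds files to a result list, so an injected file that does contain the keyword is still returned; the binary search attack is misled only when an injected file without the keyword is drawn as padding.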

Padding does not stop the binary search attack: the attack fails only when an injected file is selected as padding for a token. Nor does it stop the advanced attacks: frequencies that are close before padding remain close after padding.

Padding: Experiments. [Plots: attacking 1 token; attacking 100 tokens.] β = (average number of padded files) / (original number of files in the search result).

Padding. [Plots for β = 0, 0.2, 0.4 and 0.6.]

Potential Countermeasures. File length padding: partially works. 1. Storage overhead, e.g. 1000x on the Enron data set. 2. In the dynamic case, timing is still an issue. Batched updates: partially works. 1. With 1 injected file per batch, attacks succeed with some probability. 2. By repeating 1 injected file many times, attacks succeed with good probability.

Conclusions. File-injection attacks are devastating for query privacy in SE. Do existing SE schemes strike a satisfactory tradeoff between efficiency and leakage? Future research: reduce or eliminate access pattern leakage; explore new directions such as multi-server schemes; forward privacy.

Thank you for listening!