
All Your Queries are Belong to Us: The Power of File-Injection Attacks on Searchable Encryption Yupeng Zhang, Jonathan Katz, Charalampos Papamanthou University.


1 All Your Queries are Belong to Us: The Power of File-Injection Attacks on Searchable Encryption Yupeng Zhang, Jonathan Katz, Charalampos Papamanthou University of Maryland

2 Agenda Background on Searchable Encryption. Attacks on Searchable Encryption. Experimental results. Extensions to conjunctive Searchable Encryption. Countermeasures. Conclusions.

3 Email system client Privacy? Search?

4 Searchable Encryption

5 What is Searchable Encryption? [Figure: a client sends a search query for a keyword to a server.]

6 Products Encrypted storage: Skyhigh Networks, CipherCloud. Encrypted emails.

7 Main Contributions Existing Searchable Encryption schemes give rigorous security proofs by allowing well-defined “leakage”. The practical meaning of this leakage is not well understood. We present attacks that use this leakage to break query privacy. We suggest reducing or eliminating this leakage instead of accepting it by default.

8 An Example of Searchable Encryption [Figure: an inverted index over keywords k1, k2, k3 and files F1 to F6; the extracted file-identifier lists read 1 4 2, 3 6 4 2 and 5 1.]

9 [Figure: the same index over k1, k2, k3 and F1 to F6, with a search token for k1.]

10 An Example of Searchable Encryption [Figure: the same index after a new file F7 is added; identifier 7 appears in the index.]

11 Leakage

12 Leakage of Searchable Encryption [Figure: the example index with files F1 to F7 and a token for k1.] Annotations: tokens are deterministic! File access patterns! Can search k1 on new files!

13 Leakage of Searchable Encryption Search pattern leakage: the server can tell when a query repeats. Access pattern leakage: the server can tell which files are returned. Both are leaked by all efficient searchable encryption schemes. No forward privacy: old tokens can be searched on new files. No SE schemes except [CM05, SPS14] have forward privacy.
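
The three kinds of leakage listed on this slide can be illustrated with a toy index. This is only a sketch under stated assumptions: `make_token` and `ToyServer` are hypothetical names, HMAC-SHA256 stands in for whatever token function a real scheme uses, and the index is deliberately simplistic and not any published construction.

```python
import hashlib
import hmac
from collections import defaultdict


def make_token(key: bytes, keyword: str) -> str:
    # Deterministic: the same keyword always produces the same token.
    return hmac.new(key, keyword.encode(), hashlib.sha256).hexdigest()


class ToyServer:
    """Hypothetical server: stores token -> file ids and never sees keywords."""

    def __init__(self):
        self.index = defaultdict(set)
        self.seen_tokens = []

    def add_file(self, file_id, tokens):
        for token in tokens:
            self.index[token].add(file_id)

    def search(self, token):
        self.seen_tokens.append(token)             # search pattern leakage
        return sorted(self.index.get(token, ()))   # access pattern leakage


key = b"client-secret"  # assumed client key, for illustration only
server = ToyServer()
server.add_file(1, [make_token(key, w) for w in ("budget", "meeting")])
server.add_file(2, [make_token(key, "meeting")])

token = make_token(key, "meeting")
first = server.search(token)
server.add_file(7, [make_token(key, "meeting")])
second = server.search(token)  # the old token also matches the new file
```

Here the server can see that the two searches used the same token, which files each search returned, and that a token issued before file 7 existed still matches it (no forward privacy).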

14 What information does this leakage reveal?

15 Prior Attacks on Searchable Encryption Islam et al. (IKK12) proposed a query recovery attack. Cash et al. (CGPR15) proposed another attack with higher success probability. Assumption in these attacks: the server knows all the client’s files in plaintext. Our attacks: 1. Work in the file-injection attack model (first proposed in CGPR15). 2. Significantly improve the success probability. 3. Eliminate or relax the file leakage assumption. 4. Extend to conjunctive search.

16 Our Attacks

17 Attack Model: File-injection Attack [Figure: client, server, files F1 to F4 and a search query for keyword k.]

18 Practicality of the Attack Model Gmail: → create fresh accounts → send emails to existing accounts → emails go through!

19 Binary Search Attack [Figure: a universe of keywords k0 to k7 and three injected files, File 1 to File 3, each containing half of the keywords; example search result: 0 1 0.] Only inject 14 files for a universe of 10,000 keywords. Can recover all queries with probability 1. Inject before seeing the queries (non-adaptive). Only use file access pattern leakage. Universe defined by the server (small universe).
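
The binary search attack on this slide can be sketched as follows. This is a minimal illustration, assuming injected file i contains exactly the keywords whose index has bit i set; the bit ordering and the function names are assumptions, not taken from the slides.

```python
def build_injected_files(universe):
    """Injected file i holds every keyword whose index has bit i set."""
    n_bits = max(1, (len(universe) - 1).bit_length())
    return [
        {kw for idx, kw in enumerate(universe) if (idx >> i) & 1}
        for i in range(n_bits)
    ]


def simulate_search(injected_files, keyword):
    """Access pattern the server observes: which injected files are returned."""
    return [keyword in f for f in injected_files]


def recover_keyword(universe, returned):
    """Read the returned/not-returned pattern as the bits of the keyword index."""
    idx = sum(1 << i for i, hit in enumerate(returned) if hit)
    return universe[idx]


universe = [f"k{i}" for i in range(10_000)]
files = build_injected_files(universe)
pattern = simulate_search(files, "k4242")
guess = recover_keyword(universe, pattern)
```

With 10,000 keywords this builds 14 files, matching the count on the slide, and the files are fixed before any query is seen, so the attack is non-adaptive.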

20 Limitation Long injected files (|K|/2 keywords each).

21 Threshold Countermeasure Filter all files that contain more than T keywords. Alternatively, index only the T most frequent keywords in a file that has more than T keywords. Enron data set: 30,109 files, universe of 5,000 keywords. Only 3% of files have more than T=200 keywords. Source: Enron email dataset, https://www.cs.cmu.edu/~./enron/. Accessed: 2015-12-14.
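
The index-only-T-keywords variant of the threshold countermeasure might look like the sketch below. It assumes "most frequent" means most frequent within the file itself, which the slide does not spell out; `keywords_to_index` is a hypothetical helper name.

```python
from collections import Counter


def keywords_to_index(words, universe, threshold):
    """Return the keywords of one file that the client would index."""
    counts = Counter(w for w in words if w in universe)
    if len(counts) <= threshold:
        return set(counts)
    # Too many distinct keywords: keep only the `threshold` most frequent ones.
    return {w for w, _ in counts.most_common(threshold)}


words = ["a", "a", "b", "c", "c", "c"]
kept = keywords_to_index(words, {"a", "b", "c"}, threshold=2)
```

A long injected file loses most of its keywords under this rule, which is why the plain binary search attack no longer works against it.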

22 Modifying the Attack Use |K|/(2T) files of T keywords each to replace 1 file with |K|/2 keywords. Hierarchical search (see our paper for details). Inject 131 files for |K|=5,000 and T=200. [Figure: universe k0 to k7; the long injected File 1 is split into two shorter files, File 1 and File 2.]
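
The first idea on this slide, replacing each long file by several files of at most T keywords, can be sketched as below. This is the plain splitting step only, not the hierarchical search from the paper, and all function names are hypothetical. A bit of the keyword index is taken to be 1 if any chunk of the corresponding long file is returned.

```python
def split_file(keywords, threshold):
    """Split one long injected file into chunks of at most `threshold` keywords."""
    ordered = sorted(keywords)
    return [set(ordered[i:i + threshold]) for i in range(0, len(ordered), threshold)]


def build_split_files(universe, threshold):
    """One group of chunks per bit of the keyword index."""
    n_bits = max(1, (len(universe) - 1).bit_length())
    groups = []
    for i in range(n_bits):
        long_file = {kw for idx, kw in enumerate(universe) if (idx >> i) & 1}
        groups.append(split_file(long_file, threshold))
    return groups


def recover_keyword(universe, groups, is_returned):
    """is_returned(chunk) -> bool models the access pattern for one injected file."""
    idx = 0
    for i, chunks in enumerate(groups):
        if any(is_returned(chunk) for chunk in chunks):
            idx |= 1 << i
    return universe[idx]


universe = [f"k{i}" for i in range(64)]
groups = build_split_files(universe, threshold=5)
guess = recover_keyword(universe, groups, lambda chunk: "k37" in chunk)
```

This direct split is simpler but injects more files than the hierarchical search the slide refers to, which is not reproduced here.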

23 Advanced Attacks

24 Attacks with Partial File Leakage The server learns a portion of the client’s files in plaintext (e.g. announcement and alert emails broadcast to many people). Goal: bypass the threshold filter on the length of the injected files. Approach: reduce the size of the universe searched for tokens (the candidate universe).

25 Attacks to Recover 1 Token [Figure: the universe of keywords k1 to k5 with estimated frequencies f*(k1) to f*(k5), and a token t with exact frequency f(t).] Candidate universe: keywords k with f*(k) ≈ f(t). Run the binary search attack on the candidate universe.
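
Selecting the candidate universe can be sketched as follows. The estimator (count in the leaked sample divided by the leaked fraction) and the absolute tolerance are assumptions made for illustration; the slides only state the condition f*(k) ≈ f(t).

```python
def estimate_frequencies(leaked_files, universe, leak_fraction):
    """f*(k): count in the leaked sample, scaled up to the full collection."""
    return {
        k: sum(1 for f in leaked_files if k in f) / leak_fraction
        for k in universe
    }


def candidate_universe(est_freq, token_freq, tolerance):
    """Keywords whose estimated frequency is close to the token's exact one."""
    return sorted(k for k, f in est_freq.items() if abs(f - token_freq) <= tolerance)


leaked = [{"a", "b"}, {"a"}, {"c"}]          # assumed 50% sample of the files
est = estimate_frequencies(leaked, ["a", "b", "c"], leak_fraction=0.5)
candidates = candidate_universe(est, token_freq=4, tolerance=1)
```

The binary search attack is then run over `candidates` only, so the injected files can stay below the threshold T when the candidate universe is small enough.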

26 Pseudo Code

27 Difference from Our Prior Attacks 1. Adaptive. 2. Applies to SE schemes with no forward privacy, or when a token is searched twice. 3. The server does not always succeed, but can determine whether an attack has failed.

28 Attacks to Recover Multiple Tokens 1. Recover several keyword/token pairs as ground truth. 2. For a remaining token t' and each keyword k': if f*(k, k') ≈ f(t, t') for all pairs (k, t) in the ground truth, put k' into the candidate universe. 3. Search.
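
Step 2 on this slide, filtering by joint frequencies, might be sketched like this. The scaling of the leaked-sample count and the absolute tolerance are illustrative assumptions, and the function names are hypothetical.

```python
def joint_count(files, a, b):
    """Number of files that contain both keywords."""
    return sum(1 for f in files if a in f and b in f)


def candidates_for_token(universe, ground_truth, leaked_files, leak_fraction,
                         joint_token_freq, tolerance):
    """
    ground_truth: list of (keyword k, token t) pairs already recovered.
    joint_token_freq: dict t -> f(t, t'), the exact number of files matched
    by both the known token t and the unknown token t'.
    """
    known = {k for k, _ in ground_truth}
    out = []
    for k2 in universe:
        if k2 in known:
            continue  # tokens are deterministic, so t' is not a known keyword
        if all(
            abs(joint_count(leaked_files, k, k2) / leak_fraction
                - joint_token_freq[t]) <= tolerance
            for k, t in ground_truth
        ):
            out.append(k2)
    return out


files = [{"a", "x"}, {"a", "x"}, {"a", "y"}, {"b", "y"}]
result = candidates_for_token(
    ["a", "b", "x", "y"], [("a", "ta")], files, 1.0, {"ta": 2}, tolerance=0.5
)
```

Each extra ground-truth pair adds another constraint, so the candidate universe for t' tends to shrink as more tokens are recovered.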

29 Experiments

30 Experimental Methodology Enron data set with 30,109 emails. Stem words in the emails (remove -able, -ing etc.). Remove stop words (“to”, “you” etc.). Extract keywords (in total 77,000). Choose the top 5,000 with highest frequency as the universe. Leaked files are uniformly chosen from all files. Queries are uniformly chosen from all keywords. Run the attack on 100 sets of queries (1 or 100 queries each) on 1 sample of leaked files; repeat for 100 different samples and report the average.

31 Experimental Results: Recover 1 Query U = 5,000, T = 200, number of injected files = 9. [Plot not recovered.] Annotation: different attack models!

32 Experimental Results: Recover 100 Queries U = 5,000, T = 200, number of injected files <= 40

33 Insights Prior attacks: find the best match between keywords and tokens. E.g. if 100% of the files are leaked and only 1 keyword k satisfies f*(k) = f(t), then t must be the token for k. This uniqueness of frequencies is distorted when fewer files are leaked. Our attacks: rule out bad matches, then search among the remaining ones.

34 Extensions to Conjunctive SE

35 Search files with d keywords k1, k2, …, kd. Ideal leakage: only leak the intersection of their search results. (No existing scheme achieves ideal leakage.)

36 Attack Algorithm Inject n files, each containing L keywords chosen randomly and independently from the universe. Find the injected files that are in the search result of a conjunctive query. Take the intersection of their keywords. [Figure: injected files {k3, k4, k6, k9, k10, …}, {k2, k3, k5, k9, k11, …}, {k1, k4, k6, k9, …}, {k1, k4, k8, k12, k19, …}, …; the search returns {k3, k4, k6, k9, k10, …} and {k1, k4, k6, k9, …}, whose intersection is {k4, k6, k9}.]
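
The three steps of this attack can be simulated as below. This is a sketch under assumptions: each injected file draws L distinct keywords uniformly with `random.Random.sample`, the seed and parameters are arbitrary, and the search is modelled with ideal leakage (only the files matching all query keywords are revealed).

```python
import random


def inject_random_files(universe, n_files, file_len, seed=0):
    """Each injected file holds `file_len` keywords drawn at random."""
    rng = random.Random(seed)
    return [set(rng.sample(universe, file_len)) for _ in range(n_files)]


def simulate_conjunctive_search(injected_files, query):
    """Indices of injected files that contain every keyword of the query."""
    return [i for i, f in enumerate(injected_files) if query <= f]


def recover_conjunction(injected_files, returned_ids):
    """Intersect the keywords of all returned injected files."""
    hits = [injected_files[i] for i in returned_ids]
    if not hits:
        return None  # no injected file matched; nothing can be concluded
    return set.intersection(*hits)


universe = [f"k{i}" for i in range(20)]
files = inject_random_files(universe, n_files=200, file_len=10, seed=1)
query = {"k3", "k7"}
guess = recover_conjunction(files, simulate_conjunctive_search(files, query))
```

The recovered set always contains the queried keywords whenever at least one injected file is returned; extra keywords survive only if they happen to appear in every returned file, which becomes less likely as more files are returned.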

37 Proof Sketch

38 Other Attacks Two other attacks (refer to our paper): A non-adaptive attack for 2-keyword queries with probability 1. An adaptive attack for d-keyword queries with probability 1, n = d log|K|.

39 Countermeasures

40 Semantic Filter 1. Arbitrary set of half of the keywords (in the binary search attack). 2. Arbitrary order. 3. Arbitrary number of occurrences. 4. Arbitrary form (-able, -ing etc.). 5. Stop words (“to”, “you” etc.). Does not work!

41 Padding Pad the inverted index s.t. multiple tokens have the same frequency. [Figure: the example index over k1, k2, k3 and files F1 to F6 with extra padded file identifiers added.]

42 Padding Does not affect the binary search attack: it fails only when an injected file is selected as padding for a token. Does not affect the advanced attacks: close frequencies are still close after padding.

43 Padding: Experiments [Plots: attacking 1 token; attacking 100 tokens.] β: average number of padded files / the original number of files in the search result.

44 Padding [Plot panels: β = 0, β = 0.2, β = 0.4, β = 0.6.]

45 Potential Countermeasures File length padding. Partially works. 1. Storage overhead, e.g. 1000x overhead in the Enron data set. 2. Dynamic case: timing. Batched updates. Partially works. 1. With 1 injected file per batch, attacks succeed with some probability. 2. Repeating 1 injected file many times, attacks succeed with good probability.

46 Conclusions File-injection attacks are devastating for query privacy in SE. Do existing SE schemes offer a satisfactory tradeoff between efficiency and leakage? Future research: reduce or eliminate access pattern leakage; explore new directions such as multi-server schemes; forward privacy.

47 Thank you for listening!

