Efficient Information Retrieval for Ranked Queries in Cost-Effective Cloud Environments Presenter: Qin Liu a,b Joint work with Chiu C. Tan b, Jie Wu b,


1 Efficient Information Retrieval for Ranked Queries in Cost-Effective Cloud Environments
Presenter: Qin Liu (a, b)
Joint work with Chiu C. Tan (b), Jie Wu (b), and Guojun Wang (a)
IEEE INFOCOM 2012, March, Orlando, USA
(a) Central South University, China
(b) Temple University, USA

2 Introduction

3 Cloud Computing Model
o Cloud computing, as a new commercial paradigm, enables users to outsource data to a cloud
o Data is described by a set of keywords
o Users retrieve files with a set of keywords
o The cloud will learn the user's search pattern and access pattern
[Figure: Bob queries the cloud with keywords A, B. The cloud stores F1: {A, B}, F2: {B, D}, F3: {C, D} and returns F1 and F2.]

4 Private search (Ostrovsky et al., CRYPTO 2005)
o Given a public dictionary that contains all keywords, Bob's query is an encrypted vector of 0s and 1s with one entry per dictionary keyword, e.g., [1] [1] [0] [0] …
o Homomorphic encryption:
  o E(x)*E(y) = E(x+y)
  o E(x)^y = E(x*y)
o The cloud returns a compressed version of all files
o Key trick: map unmatched files to 0
  o E(0)*E(0) = E(0+0) = E(0)
  o E(0)^F3 = E(0*F3) = E(0)
  o E(F2)*E(0) = E(F2)
[Figure: Bob sends the encrypted query to the cloud, which stores F1: {A, B}, F2: {B, D}, F3: {C, D}. Buffer positions are labeled survival, collision (NA), and unmatched (0).]
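
The two homomorphic identities on this slide can be checked with a toy additively homomorphic cryptosystem. The sketch below uses a textbook Paillier construction with tiny hard-coded primes. The slide does not name the cryptosystem, so Paillier, the primes, the file values, and the function names `keygen`, `encrypt`, and `decrypt` are all assumptions made for illustration, and the parameters are far too small to be secure.

```python
import math
import random

def keygen(p=1009, q=1013):
    """Toy Paillier key pair from two small primes (illustration only, not secure)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because the generator is fixed to g = n + 1
    return n, (lam, mu)

def encrypt(n, m):
    """E(m) = (n+1)^m * s^n mod n^2 for a random s coprime to n."""
    n2 = n * n
    s = random.randrange(1, n)
    while math.gcd(s, n) != 1:
        s = random.randrange(1, n)
    return pow(n + 1, m, n2) * pow(s, n, n2) % n2

def decrypt(n, priv, c):
    """Recover m from a ciphertext c using the private pair (lam, mu)."""
    lam, mu = priv
    u = pow(c, lam, n * n)
    return (u - 1) // n * mu % n

n, priv = keygen()
n2 = n * n
F2, F3 = 42, 77  # hypothetical file contents encoded as small integers

# E(x)*E(y) = E(x+y): multiplying ciphertexts adds plaintexts.
sum_plain = decrypt(n, priv, encrypt(n, 3) * encrypt(n, 4) % n2)
# E(x)^y = E(x*y): exponentiating a ciphertext multiplies the plaintext.
prod_plain = decrypt(n, priv, pow(encrypt(n, 5), 6, n2))
# Key trick: an unmatched file is mapped to an encryption of 0.
unmatched = decrypt(n, priv, pow(encrypt(n, 0), F3, n2))
# A surviving file is unchanged when combined with an encryption of 0.
survivor = decrypt(n, priv, encrypt(n, F2) * encrypt(n, 0) % n2)
```

Because encryption is randomized, two encryptions of the same bit look different, which is what keeps the cloud from reading the query vector directly.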

5 Problem: Cost Grows Linearly
o Processing each query is expensive. Given n users, the cloud needs to execute n queries
  o Performance bottleneck
o The cloud will return all matched files, even if a user is interested in only a small percentage of them
  o Wasted bandwidth

6 Our Solution: EIRQ Scheme
o EIRQ: Efficient Information retrieval for Ranked Query
o A trusted proxy server (ADL) is introduced between the users and the cloud
  o Aggregates user queries
  o Distributes search results
o Supports ranked queries
[Figure: users connect to the ADL, which connects to the cloud.]

7 Rank queries
o Queries are classified into ranks
o ADL constructs a mask matrix
o Cloud filters a certain percentage of matched files
  o Rank-0 query: 100% of matched files returned
  o Rank-1 query: 50% of matched files returned
o Challenges: the cloud
  o Cannot know which files are filtered/returned
  o Cannot know each query's rank
[Figure: Alice sends {A, B} with rank 0 and Bob sends {A, C} with rank 1 to the ADL, which sends a mask matrix to the cloud. The cloud stores F1: {A, B}, F2: {B, D}, F3: {C, D}. F3 is filtered with probability 50%.]

8 Scheme Description

9 Intuition of EIRQ
o Key techniques:
  o Construct a mask matrix to protect query ranks
  o Filter files without knowing which files are filtered
o Workflow between user, ADL, and cloud:
  o Step 1 (QueryGen): the user sends keywords and a rank to the ADL
  o Step 2 (Matrix Construct): the ADL sends a mask matrix to the cloud
  o Step 3 (FileFilter): the cloud returns a buffer
  o Step 4 (File Recovery): a certain percentage of the files matching the user's keywords is recovered

10 Goal
o Queries are classified into ranks 0, 1, …, r-1
o A rank-i query retrieves a fraction (1 - i/r) of the matched files
  o Files that match rank-0 queries: will not be filtered
  o Files that match rank-1 queries: filtered with probability 1/r
  o Files that match rank-i queries: filtered with probability i/r
o The cloud
  o Cannot know which files are filtered/returned
  o Cannot know each query's rank
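
As a small worked example of the goal above, the following sketch evaluates the returned fraction (1 - i/r) and the filter probability i/r for every rank. The function names `returned_fraction` and `filter_probability` are illustrative choices, not names from the slides.

```python
from fractions import Fraction

def returned_fraction(i, r):
    """Fraction of matched files a rank-i query retrieves when there are r ranks."""
    if not 0 <= i < r:
        raise ValueError("rank must satisfy 0 <= i < r")
    return 1 - Fraction(i, r)

def filter_probability(i, r):
    """Probability that a file matching a rank-i query is filtered out."""
    return 1 - returned_fraction(i, r)

# Four ranks (r = 4): rank 0 keeps everything, rank 3 keeps a quarter.
fractions_r4 = [returned_fraction(i, 4) for i in range(4)]
```

With r = 4 this yields 1, 3/4, 1/2, and 1/4, which correspond to the 100%, 75%, 50%, and 25% targets listed on slide 15.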

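11 Construct Mask Matrix
o ADL constructs a mask matrix that is encrypted with its public key, and sends it to the cloud
o The matrix has one row per keyword (A, B, C, D in the example) and one column per rank (number of ranks r = 2 in the example)
o For a keyword:
  o The number of 1s is determined by the rank i of the query it appears in: r-i
    o If the keyword appears in several queries, the high rank takes over (rank 0 is the highest)
  o The ratio of 1s to r determines the probability that a file containing it is returned: (r-i)/r
    o If a file contains several queried keywords, the high ratio takes over
[Figure: Alice sends {A, B} with rank 0 and Bob sends {A, C} with rank 1 to the ADL. The ADL sends the encrypted matrix with rows A, B, C, D and entries [1] [0] …… to the cloud.]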
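
The construction rule above can be sketched in plaintext as follows. This is a minimal illustration under two stated assumptions: the bits are left unencrypted (the ADL would encrypt every entry under its public key before sending), and the r-i ones of a row are placed in the leading columns, which the slide does not specify. The name `build_mask_matrix` is hypothetical.

```python
def build_mask_matrix(dictionary, queries, r):
    """Return {keyword: row of r bits} for a list of (keywords, rank) queries.

    Plaintext sketch: a real ADL would encrypt each bit before sending it.
    """
    best_rank = {}
    for keywords, rank in queries:
        for kw in keywords:
            # High rank takes over: rank 0 is the highest, so keep the minimum.
            best_rank[kw] = min(rank, best_rank.get(kw, rank))

    matrix = {}
    for kw in dictionary:
        # Keywords that nobody queried get an all-zero row.
        ones = r - best_rank[kw] if kw in best_rank else 0
        # Assumption: the 1s occupy the leading columns of the row.
        matrix[kw] = [1] * ones + [0] * (r - ones)
    return matrix

# Example from the slide: Alice {A, B} with rank 0, Bob {A, C} with rank 1, r = 2.
mask = build_mask_matrix(
    ["A", "B", "C", "D"],
    [({"A", "B"}, 0), ({"A", "C"}, 1)],
    r=2,
)
```

Keyword A appears in both queries and inherits rank 0, so its row is all 1s; C appears only in the rank-1 query and gets a single 1; D is not queried and gets none.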

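12 Filter Files
o The cloud chooses a random column of the mask matrix for each file
o A file that matches a rank-i query is filtered with probability i/r
o In the example, F1 and F2 will be returned; F3 will be filtered with probability 50%
o For F3, each outcome has probability 50%:
  o Filtered: E(0)*E(0) = E(0), then E(0)^F3 = E(0)
  o Returned: E(1)^F3 = E(F3)
[Figure: the cloud stores F1: {A, B}, F2: {B, D}, F3: {C, D}, applies the mask matrix with rows A, B, C, D and entries [1] [0] ……, and sends a buffer to the ADL.]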
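
The filtering step can be traced in plaintext with the slide's three files. In this sketch the mask values are an assumption: they follow the rule "r-i ones per keyword" for the queries {A, B} with rank 0 and {A, C} with rank 1, with the 1s placed in the leading columns. The cloud would work on encrypted entries and pick one random column per file; here every column is enumerated instead so that the probabilities are exact. The names `filter_value` and `return_probability` are hypothetical.

```python
from fractions import Fraction

# Assumed plaintext mask for r = 2 (encrypted in the actual scheme).
MASK = {"A": [1, 1], "B": [1, 1], "C": [1, 0], "D": [0, 0]}
FILES = {"F1": {"A", "B"}, "F2": {"B", "D"}, "F3": {"C", "D"}}

def filter_value(keywords, column, mask):
    """Plaintext counterpart of multiplying the encrypted entries of one column.

    E(a)*E(b) = E(a+b), so the cloud ends up with an encryption of this sum.
    """
    return sum(mask[kw][column] for kw in keywords)

def return_probability(keywords, mask, r):
    """Share of columns for which the file survives (sum greater than 0)."""
    hits = sum(1 for col in range(r) if filter_value(keywords, col, mask) > 0)
    return Fraction(hits, r)

probabilities = {
    name: return_probability(kws, MASK, 2) for name, kws in FILES.items()
}
```

F1 and F2 survive for either column, while F3 survives only when the first column is chosen, which reproduces the 50% filter probability stated on the slide.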

13 Evaluation

14 Setup
o Our simulations are conducted with MATLAB R2010a, running on a local machine with an Intel Core 2 Duo E GHz CPU and 8 GB RAM. We summarize the parameters in Table.

15 Percentage of Returned Files
o Queries are classified into ranks 0 to 3
o Expected percentages:
  o Rank-0: 100%
  o Rank-1: 75%
  o Rank-2: 50%
  o Rank-3: 25%
o Our results:
  o Rank-0: 100%
  o Rank-1: 75%
  o Rank-2: 52%
  o Rank-3: 29%

16 Computation Cost
o ADL: s s
o EIRQ: s s

17 Communication Cost
o EIRQ works better when there are only a few users
o 5 users in each rank, 4 common keywords
  o EIRQ: 439 KB buffer
  o ADL: 834 KB buffer

18 Conclusion
1 An ADL is introduced to avoid a performance bottleneck at the cloud
2 The EIRQ scheme allows queries with a higher rank to retrieve a higher percentage of matched files
3 Our solution protects access pattern, search pattern, and rank privacy from the cloud

19 Thank you!

20 Background
o Ostrovsky Scheme
o Adversary Model
o System Model

21 System model
o Users in the organization send queries to the ADL
o The ADL aggregates user queries and queries the cloud with a combined query
o The cloud returns the files matching the combined query to the ADL
o The ADL distributes results to each user
[Figure: the users and the ADL sit inside the organization; the ADL communicates with the cloud.]

22 Adversary Model
o The ADL is assumed to be trusted by all users
o The cloud is the only adversary
  o Honest but curious
  o Obeys our schemes, but still wants to learn additional information
o Our goal is to protect the following from the cloud:
  o Access pattern
  o Search pattern
  o Rank privacy: hiding the rank of each user query

23 Ostrovsky Scheme (CRYPTO 2005)
o Alice's keywords: A, B
o Alice's query is a string of 0s and 1s, one per keyword of the public dictionary: [1], [1], [0], [0], [0]
  o Encrypted using homomorphic encryption
o Let E() be encryption
  o E(x)*E(y) = E(x+y)
  o E(x)^y = E(x*y)
[Figure: Alice sends the query to the cloud, which stores F1: {A, B}, F2: {B}, F3: {C}.]

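24 Ostrovsky Scheme (CRYPTO 2005)
o Alice's query: [1], [1], [0], [0], [0]
o For each file, the cloud multiplies the query entries that correspond to the file's keywords:
  o F1: {A, B} gives [1]*[1] = [2]
  o F2: {B} gives [1]
  o F3: {C} gives [0]
o The cloud raises each result to the file content and stores the pair in Alice's buffer:
  o [2]^F1 gives [2, 2*F1]
  o [1]^F2 gives [1, 1*F2]
  o [0]^F3 gives [0, 0]
o The magic is that the unmatched file F3 is processed to 0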

25 Ostrovsky Scheme (CRYPTO 2005)
o Alice decrypts the buffer returned by the cloud: [2, 2*F1], [1, 1*F2], [0, 0]
  o F2 is obtained directly
  o F1 is obtained by dividing 2*F1 by 2
o The buffer size depends only on the number of matched files
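
The three Ostrovsky slides can be followed end to end with plain integers standing in for ciphertexts. In this sketch the five-entry dictionary and the integer file contents are hypothetical (the slides do not show them), encryption is omitted, and buffer collisions are not modeled: each file simply gets its own buffer entry. The names `process` and `recover` are illustrative.

```python
# Hypothetical dictionary: the slides show five query entries but not the keywords.
DICTIONARY = ["A", "B", "C", "D", "E"]
QUERY = [1, 1, 0, 0, 0]  # Alice's keywords: A, B

# File contents are made-up integers standing in for the real files.
FILES = {
    "F1": ({"A", "B"}, 1111),
    "F2": ({"B"}, 2222),
    "F3": ({"C"}, 3333),
}

def process(keywords, content):
    """Cloud side: produce the pair [c, c*F] for one file.

    With encryption, c comes from multiplying E(bit) entries and c*F from
    raising E(c) to the power F; plain integers are used here instead.
    """
    c = sum(bit for kw, bit in zip(DICTIONARY, QUERY) if kw in keywords)
    return (c, c * content)

def recover(entry):
    """Alice's side: divide c*F by c, or report no match when c is 0."""
    c, scaled = entry
    return scaled // c if c else None

buffer = {name: process(kws, content) for name, (kws, content) in FILES.items()}
recovered = {name: recover(entry) for name, entry in buffer.items()}
```

F1 matches two keywords, so its entry is (2, 2*F1) and needs the division by 2; F2 matches one and is read off directly; F3 matches none and collapses to (0, 0).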

26 Cloud Security
o The cloud may leak user privacy
o Searchable encryption
  o Will not reveal what the users are searching for (search pattern)
  o Will reveal whether two users are interested in the same files (access pattern)
[Figure: the cloud stores F1: {A, B}, F2: {B}, F3: {C}. Alice and Bob query it: the query {A, B} returns F1 and F2, and the query {A, C} returns F1 and F3.]

27 Construction of EIRQ
o Step 1. Each user runs the QueryGen algorithm to send keywords and query rank to the ADL
o Ranks 0~2:
  o Rank 0: 100%
  o Rank 1: 50%
  o Rank 2: 0%
[Figure: Alice and Bob send the queries (A, B, Rank 1) and (B, C, Rank 1) to the ADL. The cloud stores File 1: {A, B}, File 2: {B}, File 3: {C}.]
