Secure and efficient data sharing on encrypted cloud relational databases.

1 Secure and efficient data sharing on encrypted cloud relational databases

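2 Introduction
(Relational) cloud databases are popular: a user stores data with a service provider (SP) and gets data items back on request.
Example table stored on the cloud:
Item_ID   Cost   Wholesale_price
1076      10     20
3308      15     50
Getting back one data item returns, e.g., (Item_ID 1076, Cost 10, Wholesale_price 20).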

3 Encryption for security
Due to security concerns, data is encrypted before it is stored on the cloud. The key is kept by the user, not by the SP!
Table as stored at the SP (ciphertext):
Item_ID   Cost     Wholesale_price
Egask5    A42fgs   2S46Dg
asD3j6    4139AS   sDd3fj2
The user gets back a data item and decrypts it locally, e.g., (Item_ID 1076, Cost 10, Wholesale_price 20).

4 The problem of data sharing
Parties: Alice (data owner), SP, Bob. Alice's data is stored at the SP in encrypted form:
Item_ID   Cost     Wholesale_price
Egask5    A42fgs   2S46Dg
asD3j6    4139AS   sDd3fj2
Alice: "Bob is my business partner. I want to let him know the wholesale price of some of my selected products."
Requirements:
1. Shared data should be revealed only to Bob (not to the SP).
2. Unshared data should remain unknown to both Bob and the SP.
3. The cost to Alice should be low, while the cost to Bob and the SP should be affordable.

5 Applications of data sharing
1. Alice is a company user of the SP. She hires Bob, a data analytics expert, to perform analysis, so she has to share some of her data with him.
2. Alice and Bob are business partners. They share some data for mutual advantage, e.g., more market information.

6 Naïve solution of data sharing (e.g., CryptDB, TrustedDB)
Encryption: use an existing general encryption function, e.g., RSA with padding, to encrypt all data: c = E(p, k)
– Ciphertext: c
– Plaintext: p
– (Public) key: k
– Encryption function: E
The whole table is stored as ciphertext, e.g., the row (Egask5, A42fgs, 2S46Dg).

7 Naïve solution of data sharing (cont.)
Alice wants to share the Wholesale_price of item "Egask5" with Bob:
– Alice sends Bob a copy of the key
– On request, the SP sends Bob the shared item (Wholesale_price = 2S46Dg)
– Access control is enforced to prevent Bob from seeing unauthorized items
This solution is not secure! Bob now holds the key to all of Alice's data, so if Bob and the SP collude, every item (e.g., the plain rows (1076, 10, 20) and (3308, 15, 50)) can be decrypted.

8 Another naïve solution
– Alice downloads the items to be shared (e.g., Wholesale_price = 2S46Dg) and decrypts them (Wholesale_price = 20)
– Alice sends Bob the plain data
– Bob either stores the data on his own or inserts it into the cloud as new tuples
Drawback: high processing cost to Alice

9 Problem definition
Data: relational data
– A table R contains T, a set of tuples, and C, a set of columns (attributes)
– Each tuple t has exactly m values (NULL is also a value)
Format of data for sharing:
– C_S: a subset of C
– T_S: a subset of T
– Just like the result of a query
Example:
A    B    C
a1   b1   c1
a2   b2   c2
T = {t_1, t_2}, C = {"A", "B", "C"}, t_1 = {a1, b1, c1}, t_2 = {a2, b2, c2}
T_S = {t_2}, C_S = {"B", "C"}, so the shared part of t_2 is {b2, c2}:
B    C
b2   c2
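
The sharing format above behaves like a projection and selection over R. The following is a minimal sketch in Python; the dictionary layout and the helper name shared_view are assumptions made for illustration, not part of the slides.

```python
# Table R from the example: tuple identifier -> {column name -> value}.
R = {
    "t1": {"A": "a1", "B": "b1", "C": "c1"},
    "t2": {"A": "a2", "B": "b2", "C": "c2"},
}

def shared_view(table, T_S, C_S):
    """Return the cells of `table` restricted to tuples T_S and columns C_S."""
    return {t: {c: table[t][c] for c in C_S} for t in T_S}

# T_S = {t2}, C_S = {"B", "C"} as in the example.
view = shared_view(R, T_S=["t2"], C_S=["B", "C"])
```

Here view contains only the cells b2 and c2, which is exactly the data Bob is supposed to learn.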

10 Models
3 parties: Alice, Bob, SP
– Relationship: refer to the introduction
Attack model
– Bob and the SP are semi-honest and colluding
– Bob and the SP function as normal
– An attacker observes everything seen by Bob and the SP
Requirement:
– The attacker cannot learn any plain data of Alice except the data shared with Bob

11 Solution framework
The solution includes:
– An encryption method (KeyGen, Enc, Dec)
– A sharing method (Share, SDec)
Steps among Alice, Bob and the SP:
1. k = KeyGen()
2. c = Enc(p, k) to store data; p = Dec(c, k) to read it back
3. H = Share(C_S, T_S, k)
4. p = SDec(c, H)
Encrypted table at the SP:
A      B      C
c_a1   c_b1   c_c1
c_a2   c_b2   c_c2

12 Our solution: relational-based encryption (RBE)
Problem of using general encryption, e.g., RSA:
– The same key is required to decrypt all encrypted values
– To let Bob decrypt one particular data item, the decryption key must be sent to Bob
– Bob is then overpowered: he can decrypt any data encrypted by Alice

13 Relational-based encryption (RBE)
Idea: how about encrypting each individual data item with a unique value key?
Plain values      Value key table        Encrypted values
A    B    C       A      B      C        A      B      C
a1   b1   c1      k_a1   k_b1   k_c1     c_a1   c_b1   c_c1
a2   b2   c2      k_a2   k_b2   k_c2     c_a2   c_b2   c_c2
To share b1:
– Give k_b1 to Bob
– Bob can decrypt only c_b1; the other values are safe because Bob does not have the other value keys
However, Alice has to remember all value keys, which is a high storage cost.

14 Key abstraction
– Each cell can be located by a column identifier and a row identifier
– Each tuple has a tuple secret rid; each column has a column secret cid
– Use a one-way hash function h: k = h(rid, cid)
– Storage cost at Alice: O(mn) => O(m+n), where m is the number of columns and n is the number of tuples
Value key table:
       A      B      C
t_1    k_a1   k_b1   k_c1
t_2    k_a2   k_b2   k_c2
Examples: (t_1, A) -> k_a1; (t_2, C) -> k_c2
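
The key abstraction can be sketched as follows. This is an illustration only: SHA-256 stands in for the one-way hash h, and the 16-byte secrets, the length prefix and the name value_key are assumptions that the slides do not specify.

```python
import hashlib
import secrets

def value_key(rid: bytes, cid: bytes, length: int = 16) -> bytes:
    """Derive the per-cell value key k = h(rid, cid) with a one-way hash."""
    # Length-prefix rid so that different (rid, cid) pairs cannot produce
    # the same concatenated hash input.
    data = len(rid).to_bytes(4, "big") + rid + cid
    return hashlib.sha256(data).digest()[:length]

# Alice keeps n tuple secrets and m column secrets: O(m + n) values.
rids = {t: secrets.token_bytes(16) for t in ("t1", "t2")}
cids = {c: secrets.token_bytes(16) for c in ("A", "B", "C")}

# Any of the m * n value keys is recomputed on demand and never stored.
k_a1 = value_key(rids["t1"], cids["A"])
k_c2 = value_key(rids["t2"], cids["C"])
```

With 2 tuples and 3 columns, Alice stores 5 secrets instead of 6 value keys; the gap grows quickly as the table gets larger.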

15 Towards O(1) storage cost at Alice
Use an existing encryption function
– E: encryption function
– D: decryption function
Tuple secrets and column secrets are encrypted and stored at the SP:
            A          B          C
            E(cid_A)   E(cid_B)   E(cid_C)
E(rid_1)    c_a1       c_b1       c_c1
E(rid_2)    c_a2       c_b2       c_c2

16 Encryption/decryption process
Alice first gets back E(cid) and E(rid) of the value to be encrypted/decrypted
– Decrypt them to get cid and rid
– Compute the value key of the cell and encrypt/decrypt the cell
Although the encryption/decryption cost may seem higher now, RBE is more efficient for relational data (more on this after the math details).

17 Details in math
KeyGen
– Just the same key generation as the underlying encryption scheme
Enc
– Tuple t_i =
– Obtain cid and rid
– c_i = p_i XOR h(rid XOR cid), where h is a one-way hash

18 Details in math
Dec
– Encrypted tuple t'_i =
– Obtain cid and rid
– p_i = c_i XOR h(rid XOR cid)

19 Correctness of encryption
p_i = c_i XOR h(rid XOR cid) --- (1)
c_i = p_i XOR h(rid XOR cid) --- (2)
Substituting (2) into the RHS of (1):
c_i XOR h(rid XOR cid) = p_i XOR h(rid XOR cid) XOR h(rid XOR cid) = p_i
The last step holds because x XOR x = 0 for any x.
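
The round trip of Enc and Dec can be checked with a small sketch. It assumes that rid and cid are byte strings of equal length and uses SHAKE-256 as a stand-in for the one-way hash h so that the pad can match the plaintext length; neither choice comes from the slides.

```python
import hashlib
import secrets

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("inputs must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))

def pad(rid: bytes, cid: bytes, length: int) -> bytes:
    """h(rid XOR cid), stretched to `length` bytes (assumed instantiation of h)."""
    return hashlib.shake_256(xor_bytes(rid, cid)).digest(length)

def enc(p: bytes, rid: bytes, cid: bytes) -> bytes:
    """c_i = p_i XOR h(rid XOR cid)."""
    return xor_bytes(p, pad(rid, cid, len(p)))

def dec(c: bytes, rid: bytes, cid: bytes) -> bytes:
    """p_i = c_i XOR h(rid XOR cid)."""
    return xor_bytes(c, pad(rid, cid, len(c)))

rid, cid = secrets.token_bytes(16), secrets.token_bytes(16)
plain = b"20"
cipher = enc(plain, rid, cid)
recovered = dec(cipher, rid, cid)
```

Encryption and decryption are the same XOR with the same pad, which is why applying the pad twice returns the plaintext.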

20 Security
Encrypted data is stored in the cloud. Is it safe?
c_i = p_i XOR h(rid XOR cid)
One-time pad: p XOR k
– Note: the same key cannot be used to encrypt two or more data items!
– A one-time pad is perfectly secure: it is not breakable unless the key is leaked
One-way hash function: not reversible
– Knowing the hash value does not reveal the input to the hash (rid XOR cid), an important feature for guarding against CPA-style attacks
Overall: as secure as the hash function
– There are many highly secure one-way hash functions, including the encryption functions of different encryption schemes

21 Security (cont.)
On the other hand, cid and rid can be derived from CN (column name) and E(rid, k).
If they are treated as encrypted values of the underlying encryption function (E, D), the security is the same as that of the underlying scheme.

22 Efficiency
Decrypting a query result with n tuples and m columns:
– Traditional method, e.g., RSA: mn decryptions
– Our scheme: m+n decryptions, mn hashes, 2mn XOR operations
Cost of decryption >> hash >> XOR
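
To make the operation counts concrete, the sketch below tabulates them for a given result size. The function name decryption_costs and the sample size (n = 1000, m = 10) are chosen only for illustration.

```python
def decryption_costs(n: int, m: int) -> dict:
    """Operation counts for decrypting a result with n tuples and m columns."""
    return {
        "traditional": {"decryptions": m * n},
        "rbe": {"decryptions": m + n, "hashes": m * n, "xors": 2 * m * n},
    }

costs = decryption_costs(n=1000, m=10)
```

For this sample size the traditional method needs 10,000 decryptions, while the scheme above needs 1,010 decryptions plus 10,000 hashes and 20,000 XOR operations. Since a decryption is far more expensive than a hash or an XOR, the second profile is the cheaper one under the cost ordering stated above.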

23 Data sharing
Input: T_S, C_S
– Alice sends the rid of each tuple in T_S to Bob
– Alice sends the cid of each column in C_S to Bob
H = (H_T, H_C) = Share(T_S, C_S, k)
– H_T = {rid | rid is the tuple secret of t, t in T_S}
– H_C = {cid | cid is the column secret of c, c in C_S}
Decryption: SDec(c, H)
– Find the corresponding rid and cid of c
– p_i = c_i XOR h(rid XOR cid)
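
A minimal sketch of Share and SDec on the two-tuple example table follows. SHAKE-256 as the hash h, the 16-byte secrets and the dictionary layout are assumptions made for illustration; the function names mirror the slide but the signatures are simplified.

```python
import hashlib
import secrets

def pad(rid: bytes, cid: bytes, length: int) -> bytes:
    """h(rid XOR cid); SHAKE-256 stands in for the one-way hash h (assumption)."""
    mixed = bytes(a ^ b for a, b in zip(rid, cid))
    return hashlib.shake_256(mixed).digest(length)

def xor_with_pad(data: bytes, rid: bytes, cid: bytes) -> bytes:
    """Enc and Dec are the same XOR with the cell's pad."""
    return bytes(a ^ b for a, b in zip(data, pad(rid, cid, len(data))))

def share(rids, cids, T_S, C_S):
    """H = (H_T, H_C): the secrets of the shared tuples and columns."""
    H_T = {t: rids[t] for t in T_S}
    H_C = {c: cids[c] for c in C_S}
    return H_T, H_C

def sdec(cipher: bytes, t: str, c: str, H):
    """Bob decrypts cell (t, c); returns None when a needed secret is missing."""
    H_T, H_C = H
    if t not in H_T or c not in H_C:
        return None
    return xor_with_pad(cipher, H_T[t], H_C[c])

# Alice's side: encrypt the example table cell by cell.
R = {"t1": {"A": b"a1", "B": b"b1", "C": b"c1"},
     "t2": {"A": b"a2", "B": b"b2", "C": b"c2"}}
rids = {t: secrets.token_bytes(16) for t in R}
cids = {c: secrets.token_bytes(16) for c in ("A", "B", "C")}
encrypted = {t: {c: xor_with_pad(v, rids[t], cids[c]) for c, v in row.items()}
             for t, row in R.items()}

# Alice shares columns B and C of tuple t2 with Bob.
H = share(rids, cids, T_S=["t2"], C_S=["B", "C"])
shared_b2 = sdec(encrypted["t2"]["B"], "t2", "B", H)
```

Bob recovers b2 and c2. For the cell (t1, B) he knows the cid but not the rid, and for (t2, A) the rid but not the cid, so in both cases he cannot form the hash input.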

24 Security
Sharing reveals some values of cid and rid.
– Cells related to none of the revealed secrets: secure, of course
– Cells whose cid is known but whose rid is not: are they secure?

25 Secure
c_i = p_i XOR h(rid XOR cid)
– Note: the above already assumes that Bob and the SP are colluding; otherwise, Bob has no access to the encrypted values of other data
– The hash input is unknown because rid or cid is unknown
– The hash value is then unknown as well

26 Problem of multiple sharing
– User collusion
– A user retrieves different shared versions at different times (1st sharing, 2nd sharing)
Additional information can be observed by combining both sharing instances.

27 Advanced solution: elliptic curve cryptography (ECC)
Introduction to ECC: operations are defined on a finite set of 2D points
Example curves:
– y^2 mod p = (x^3 - x) mod p
– y^2 mod p = (x^3 - x + 1) mod p
p: system parameter

28 Operations on ECC
– "Addition" of points
– Scalar multiplication: kP = P + P + … + P
[Figure: a curve with the points P, -2P and 2P marked]

29 Operations on ECC
Order of curve
– Number of points on the curve
– Let n be the order of the curve: (n+1)P = P for all P
Curve with prime order, i.e., n is prime
– There is an integer k s.t. kP = Q for any points P, Q (P != 0)
Elliptic curve discrete logarithm problem (ECDLP)
– Given P, Q, it is hard to find k s.t. kP = Q
Pairing function e:
– e(aP, bQ) = e(P, Q)^{ab}
– Security: Bilinear Diffie-Hellman (BDH) assumption: given P, aP, bP, cP, it is hard to find e(P, P)^{abc}
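
Point addition, scalar multiplication and the order property (n+1)P = P can be tried on a toy curve. The sketch uses y^2 = x^3 - x + 1 over a deliberately tiny field; the modulus 97 and the base point (0, 1) are hypothetical choices for illustration and offer no security. Pairings are not covered here.

```python
P_MOD = 97          # toy field size (assumption, far too small for real use)
A, B = -1, 1        # curve y^2 = x^3 + A*x + B, i.e. y^2 = x^3 - x + 1
INFINITY = None     # the identity element ("point at infinity")

def on_curve(pt) -> bool:
    if pt is INFINITY:
        return True
    x, y = pt
    return (y * y - (x * x * x + A * x + B)) % P_MOD == 0

def add(p1, p2):
    """Group law on the curve ("addition" of two points)."""
    if p1 is INFINITY:
        return p2
    if p2 is INFINITY:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return INFINITY                      # P + (-P) = identity
    if p1 == p2:                             # tangent slope for doubling
        s = (3 * x1 * x1 + A) * pow((2 * y1) % P_MOD, -1, P_MOD) % P_MOD
    else:                                    # chord slope for distinct points
        s = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (s * s - x1 - x2) % P_MOD
    y3 = (s * (x1 - x3) - y1) % P_MOD
    return (x3, y3)

def mul(k: int, pt):
    """Scalar multiplication kP by double-and-add."""
    result, addend = INFINITY, pt
    while k > 0:
        if k & 1:
            result = add(result, addend)
        addend = add(addend, addend)
        k >>= 1
    return result

def curve_order() -> int:
    """Count the points on the curve by brute force, including infinity."""
    count = 1
    for x in range(P_MOD):
        rhs = (x * x * x + A * x + B) % P_MOD
        count += sum(1 for y in range(P_MOD) if (y * y) % P_MOD == rhs)
    return count

G = (0, 1)          # lies on the curve because 1^2 = 0 - 0 + 1
n = curve_order()
```

Multiplying any point by the curve order n gives the identity, so mul(n + 1, G) returns G again. This uses the modular inverse form pow(x, -1, m), which needs Python 3.8 or later.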

30 Improvement over our sharing scheme
Recall:
– Encryption: c_i = p_i XOR h(rid, cid)
– Decryption: p_i = c_i XOR h(rid, cid)
– Share: return all concerned rid and cid
Improvement:
– Define h(rid, cid) = h_2(e(rid P, cid Q))
– P, Q are private (it is also fine if they are public)

31 Sharing protocol
Share
– Alice generates a random r
– Return {(r^{-1} * rid) P} and {(r * cid) Q}

32 Bob's decryption protocol
SDec
– Bob has X = (r^{-1} * rid) P and Y = (r * cid) Q
– Bob computes
  g(X, Y) = h_2(e(X, Y))
          = h_2(e((r^{-1} * rid) P, (r * cid) Q))
          = h_2(e(rid P, cid Q)^{r^{-1} * r})   (by bilinearity)
          = h_2(e(rid P, cid Q))
Recall: h(rid, cid) = h_2(e(rid P, cid Q))
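
The cancellation of the blinding factor r can be simulated with a toy model. This is not a real pairing: each point aP is represented by its scalar a modulo a small prime q, which makes discrete logarithms trivial, and e is modelled as g^(ab) mod p. The parameters q = 1019, p = 2039, g = 4 and the function names are assumptions for illustration only; a real deployment would need a pairing library.

```python
import hashlib
import secrets

Q_ORDER = 1019      # toy prime group order (assumption)
P_MOD = 2039        # prime with P_MOD = 2 * Q_ORDER + 1
G = 4               # generates the subgroup of order Q_ORDER modulo P_MOD

def pairing(x: int, y: int) -> int:
    """Toy stand-in for e(xP, yQ) = e(P, Q)^(x*y); points are their scalars."""
    return pow(G, (x * y) % Q_ORDER, P_MOD)

def h2(value: int) -> bytes:
    """Hash of the pairing output (SHA-256 is an assumed choice for h_2)."""
    return hashlib.sha256(value.to_bytes(2, "big")).digest()

def share_blinded(rid: int, cid: int):
    """Alice picks a random r and returns X = (r^-1 * rid)P, Y = (r * cid)Q."""
    r = secrets.randbelow(Q_ORDER - 1) + 1           # r in 1..q-1, invertible
    x = (pow(r, -1, Q_ORDER) * rid) % Q_ORDER
    y = (r * cid) % Q_ORDER
    return x, y

rid, cid = 123, 456
X, Y = share_blinded(rid, cid)
bob_key = h2(pairing(X, Y))             # what Bob computes in SDec
alice_key = h2(pairing(rid, cid))       # h(rid, cid) used at encryption time
```

Every call to share_blinded draws a fresh r, so Bob sees different X and Y in each sharing instance, yet the derived value key is always the same as the one Alice used.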

33 Security in multiple sharing
Focus on columns; the case for rows is similar.
– 1st sharing: r_1 cid_A Q, r_1 cid_B Q, r_1 cid_C Q
– 2nd sharing: r_2 cid_B Q, r_2 cid_C Q
The values of rid and cid are contained in different sharing instances. Is it a concern?

34 Question: is it secure?
If we can find e(rid_2 P, cid_A P)…, we can solve the BDH problem (let Q = P for now)
– Given P, aP, bP, cP, find e(P, P)^{abc}
In our case
– a = cid_A
– b = r_2^{-1} * rid_2
– c = r_2
– Generate random unrelated parameters rid_1, cid_B, r_1
Shared values, expressed with the BDH inputs A = aP, B = bP, C = cP:
– r_1 cid_A P = r_1 A
– r_1 cid_B P and r_1^{-1} * rid_1 P: computed directly from the random parameters
– r_2^{-1} * rid_2 P = B
– r_2 cid_B P = cid_B C
Any combination of values of a, b, c can be expressed in this way.
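
As a worked check of the substitution on this slide (a derivation added for clarity, using only the definitions of a, b and c given above), the product abc reduces to rid_2 * cid_A, so the BDH target equals the pairing value the attacker would need:

```latex
\begin{aligned}
abc &= \mathit{cid}_A \cdot \left(r_2^{-1}\,\mathit{rid}_2\right) \cdot r_2
     = \mathit{rid}_2 \cdot \mathit{cid}_A, \\
e(P, P)^{abc} &= e(P, P)^{\mathit{rid}_2\,\mathit{cid}_A}
     = e(\mathit{rid}_2 P,\ \mathit{cid}_A P).
\end{aligned}
```

Hence an attacker who could compute e(rid_2 P, cid_A P) from the shared values would also solve the BDH instance.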

35 Security in multiple rows, columns?
[Figure: the reduction of slide 34 extended with further rows and columns, e.g., r_1^{-1} * rid_i P, and r_2 cid_C P = cid_C C where C = cP]
Our security proof is for the general case.

36 Selecting tuples for sharing
How a user defines what data to share with a particular party is a fundamental problem.
– Selecting tuples by the user's free choice requires at least linear cost (in the number of tuples)
– Another option: define the shared data by a query

37 Pre-computation for sharing by query
Parties: Alice, SP, Bob (the SP holds the shared DB)
– Alice issues a query Q to define the data to be shared with Bob
– Alice prepares index-like pre-computed information and gives it to the SP
– R is related to the query answer and the index
– A hint H is generated based on R
– Bob can observe the shared data with the hint and the index at the SP

38 Solution framework
The solution includes:
– An encryption method (KeyGen, Enc, Dec, BuildTree)
– A sharing method (SQuery, Share, SDec)
Steps among Alice, Bob and the SP:
1. k = KeyGen()
2. c = Enc(p, k); p = Dec(c, k)
3. Δ = BuildTree()
4. Φ = SQuery(q)
5. H = Share(C_S, Φ, k)
6. p = SDec(c, Δ, H)
Encrypted table at the SP:
A      B      C
c_a1   c_b1   c_c1
c_a2   c_b2   c_c2

39 Extending basic scheme
Encrypted tuple secrets are kept in a tree over the tuples t_1 to t_8 (leaf level). Nodes shown in the figure:
– Node above t_1, t_2: E_i(rid_1, k_12), E_i(rid_2, k_12), E_s(k_12)
– Node above it: E_i(k_12, k_14), E_i(k_34, k_14), E_s(k_14)
– Root: E_i(k_14, k_18), E_i(k_58, k_18), E_s(k_18)
Keys for E_s are kept at Alice only.

40 Computing the answer of a query
SQuery(q) is evaluated over the tree of slide 39.
[Figure: the same tree, with the nodes that answer the query marked "Answers" and returned to Alice]

41 Share(C_S, Φ, k)
Φ = {E_s(k_14)}
H = (H_T, H_C) = Share(C_S, Φ, k)
– H_T = {k_14}
– H_C = {cid | cid is the column secret of c, c in C_S}

42 Computing the answer of a query
Bob's knowledge: k_14
[Figure: the tree of slide 39, annotated with what Bob can derive]
– With k_14, Bob decrypts E_i(k_12, k_14) and E_i(k_34, k_14) to get k_12 and k_34
– With k_12, Bob decrypts E_i(rid_1, k_12) and E_i(rid_2, k_12) to get rid_1 and rid_2; in the same way he obtains the tuple secrets of t_1 to t_4
– The other parts of the tree (e.g., k_58 and k_18) remain unknown
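
Bob's traversal from k_14 down to the tuple secrets can be sketched as below. The cipher E_i is modelled by a toy XOR keystream, the node names k56 and k78 are filled in by analogy with the visible part of the tree, and E_s (readable by Alice only) is omitted; all of these are assumptions for illustration.

```python
import hashlib
import secrets

def E_i(secret: bytes, key: bytes, label: str) -> bytes:
    """Toy symmetric cipher: XOR with a SHAKE-256 keystream bound to `label`."""
    stream = hashlib.shake_256(key + label.encode()).digest(len(secret))
    return bytes(a ^ b for a, b in zip(secret, stream))

D_i = E_i  # XOR with the same keystream decrypts

# Tree shape: each node key protects the secrets of its two children.
CHILDREN = {
    "k18": ["k14", "k58"],
    "k14": ["k12", "k34"],
    "k58": ["k56", "k78"],
    "k12": ["rid1", "rid2"],
    "k34": ["rid3", "rid4"],
    "k56": ["rid5", "rid6"],
    "k78": ["rid7", "rid8"],
}

names = set(CHILDREN) | {c for kids in CHILDREN.values() for c in kids}
secret = {name: secrets.token_bytes(16) for name in names}

# What the SP stores: E_i(child secret, parent key) for every edge.
stored_at_sp = {
    parent: {child: E_i(secret[child], secret[parent], child) for child in kids}
    for parent, kids in CHILDREN.items()
}

def unlock(node: str, key: bytes) -> dict:
    """Recover every tuple secret below `node`, given that node's key."""
    found = {}
    for child, blob in stored_at_sp[node].items():
        value = D_i(blob, key, child)
        if child in stored_at_sp:        # inner node: keep descending
            found.update(unlock(child, value))
        else:                            # leaf: a tuple secret
            found[child] = value
    return found

bob_view = unlock("k14", secret["k14"])  # Bob was given k_14 only
```

Bob obtains rid1 to rid4 and nothing else: reaching rid5 to rid8 would require k58 or k18, which are encrypted under keys he does not hold.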

43 Advantage of using index
– Without an index, the cost to Alice must be at least linear in the number of tuples in the sharing domain
– With the index, it is linear in the number of nodes returned from the tree, which is usually much smaller

44 Indexing scheme for multi-sharing scenario
Use a different function to generate the value key.
[Figure: hash tree over the tuples t_1 to t_8. Leaf level: h_1 to h_8; next level: h_12, h_34, h_56, h_78; then h_14, h_58; root: h_18]
For t_1: c_i = p_i XOR h_1(h_12(h_14(h_18(cid))))

45 Computing the answer of a query
SQuery(q): Φ = {h_14 ∘ h_18}
[Figure: the hash tree of slide 44, with the answer nodes marked "Answers"]

46 Share(C_S, Φ, k)
Φ = {h_14 ∘ h_18}
H = Share(C_S, Φ, k)
– H = {h_14(h_18(cid)) | cid is the column secret of c, c in C_S}

47 Computing the answer of a query
Bob's knowledge
– x = h_14(h_18(cid))
– Value key of t_1 = h_1(h_12(x))
[Figure: the hash tree of slide 44]
– One-way hash: Bob can't go up the tree
– Bob can't see other tuples
– x is specific to this column, not another column
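
The hash-chain value keys can be sketched as follows. The slides do not say how the functions h_1, h_12, ... are instantiated; here each one is modelled as SHA-256 with a distinct public label, and only cid is secret. That modelling, and the helper names, are assumptions for illustration.

```python
import hashlib
import secrets

def h(label: str, x: bytes) -> bytes:
    """Domain-separated one-way hash standing in for h_1, h_12, h_14, ..."""
    return hashlib.sha256(label.encode() + b"|" + x).digest()

def path(i: int) -> list:
    """Labels from the root down to tuple t_i in the 8-leaf tree."""
    half = "h14" if i <= 4 else "h58"
    lo = i if i % 2 == 1 else i - 1
    return ["h18", half, f"h{lo}{lo + 1}", f"h{i}"]

def value_key(i: int, cid: bytes) -> bytes:
    """Alice's value key, e.g. h_1(h_12(h_14(h_18(cid)))) for t_1."""
    x = cid
    for label in path(i):
        x = h(label, x)
    return x

def bob_value_key(i: int, x: bytes) -> bytes:
    """Bob continues the chain from x = h_14(h_18(cid)) for t_1 to t_4."""
    for label in path(i)[2:]:
        x = h(label, x)
    return x

cid = secrets.token_bytes(16)
hint = h("h14", h("h18", cid))          # the element of H for this column
bob_keys = {i: bob_value_key(i, hint) for i in range(1, 5)}
```

Bob's keys for t_1 to t_4 match Alice's. For t_5 to t_8 he would need h_58(h_18(cid)), which requires inverting h_14, and for another column he would need a hint computed from that column's cid.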

48 Developed schemes
Scheme   Secure against user-SP collusion?   Secure in multiple sharing?   Cost
Basic    Yes                                 Partial                       O(m+n), very low
Multi    Yes                                 Yes                           O(m+n), low
With pre-computation:
Scheme   Alice's cost   Secure in multiple sharing?
Basic    O(m + u)       Partial
Multi    O(mu)          Yes
u: number of nodes; m: number of columns; n: number of tuples

49 Related work
Privacy-preserving data integration, e.g., DMKD 04
– A user issues a query that is answered by an untrusted platform across multiple data sources
– Different model
Access control by ABE (attribute-based encryption), e.g., ASIACCS 10
– Each data item is associated with an access structure, and each user is associated with certain access attributes. Only a user whose access attributes satisfy the access structure of the data can decrypt it.

50 Access control
Example: a file requires "IT staff" OR ("Marketing" AND "Manager")
– Alan is - OK
– Betty is - OK
– Cathy is - No
Features
– Attribute revocation and ciphertext revocation: the SP takes almost all of the workload
– Attribute revocation: user permission changes, e.g., Betty becomes
– Ciphertext revocation: file permission changes
Drawback in our case: requires a pre-defined set of access attributes
– Ad hoc sharing instances? Need to add a new attribute, say "ABC company", which requires re-encryption of the entire database by the data owner
– Side note: this method is attracting a good amount of attention in the crypto area.
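
The access structure in the example can be read as a boolean predicate over a user's attribute set. The sketch below only evaluates that predicate in the clear; it does not perform any attribute-based encryption, and the attribute sets used are hypothetical rather than those of the users named on the slide.

```python
def satisfies(attributes: set) -> bool:
    """Policy from the example: "IT staff" OR ("Marketing" AND "Manager")."""
    return "IT staff" in attributes or {"Marketing", "Manager"} <= attributes

# Hypothetical attribute sets, for illustration only.
it_user = {"IT staff"}
marketing_manager = {"Marketing", "Manager"}
marketing_only = {"Marketing"}

results = [satisfies(u) for u in (it_user, marketing_manager, marketing_only)]
```

In ABE the policy is enforced cryptographically: a user whose attributes fail the predicate cannot decrypt, rather than being refused by a server-side check.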

51 Backup

52 [Figure: the key tree of slide 39, with the tuple secrets written as ϒ_1 and ϒ_2 instead of rid_1 and rid_2]

53 [Figure: the hash tree of slide 44. Leaf level: h_1 to h_8 over t_1 to t_8; next level: h_12, h_34, h_56, h_78; then h_14, h_58; root: h_18]

