Secure Data Outsourcing


1 Secure Data Outsourcing

2 Outline: motivation; background knowledge; research issues; summary
Problem description; review of cryptographic primitives; research issues; summary

3 Motivation: cost of maintaining large data
Maintenance costs 4-5 times the cost of data acquisition, and DBAs are paid well → more and more service providers. Low cost: cloud computing. Maintain one database for one user → multiple users. Examples: Alentus.com, Datapipe.com, Discountasp.net. Concerns about data security and privacy: untrusted service provider.

4 Untrusted server. Lazy: incentives to perform less.
Curious: incentives to acquire information. Malicious: denial of service, incorrect results. Possibly compromised.

5 Challenges: data confidentiality; access privacy; query assurance
Data confidentiality: data need to be encrypted (?); query on protected data? (mapping, indexing). Access privacy: SQL query; access pattern (access to index/data). Query assurance: correct, complete, fresh.

6 Why is it hard? Arbitrary expressivity; cost
Arbitrary expressivity: SQL statements; often restricted to certain types of query for simplicity (e.g. range query, kNN query). Cost: communication; computation (server side vs. client side).

7 Data confidentiality
Bucketization method (crypto-index); order preserving encryption; perturbations

8 Bucketization method Hacigumus (SIGMOD02)

9 Main steps
Partition sensitive attributes (order preserving: supports comparison; random: query rewriting becomes hard). Build index on the partitions. Rewrite queries to target partitions: 'john doe' → 105, SELECT * FROM T' WHERE name = 105. Execute queries and return results. Prune/post-process results on client.
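The steps on this slide can be sketched roughly as follows. This is a minimal illustration, not the construction from Hacigumus (SIGMOD02): the `salary` attribute, the bucket boundaries, the bucket IDs, and the `toy_encrypt`/`toy_decrypt` helpers are all hypothetical, and the "encryption" is a reversible placeholder with no security value.

```python
import bisect
import json

# Hypothetical order-preserving partition of a numeric attribute.
# Bucket i covers [BOUNDS[i], BOUNDS[i + 1]); the IDs are opaque labels.
BOUNDS = [0, 25, 50, 75, 100]
BUCKET_IDS = [105, 17, 62, 33]


def toy_encrypt(row):
    """Placeholder for real encryption (NOT secure): reversed JSON string."""
    return json.dumps(row)[::-1]


def toy_decrypt(blob):
    return json.loads(blob[::-1])


def bucket_of(value):
    """Map a plaintext value to the ID of the bucket that contains it."""
    if not BOUNDS[0] <= value < BOUNDS[-1]:
        raise ValueError("value outside the partitioned domain")
    return BUCKET_IDS[bisect.bisect_right(BOUNDS, value) - 1]


def outsource(rows):
    """Owner side: store (encrypted row, bucket ID) pairs on the server."""
    return [(toy_encrypt(r), bucket_of(r["salary"])) for r in rows]


def rewrite_range(lo, hi):
    """Client side: rewrite lo <= salary <= hi into a set of bucket IDs."""
    first = max(bisect.bisect_right(BOUNDS, lo) - 1, 0)
    last = min(bisect.bisect_right(BOUNDS, hi) - 1, len(BUCKET_IDS) - 1)
    return set(BUCKET_IDS[first:last + 1])


def server_select(table, ids):
    """Server side: filter on the bucket index only; rows stay opaque."""
    return [enc for enc, b in table if b in ids]


def client_postprocess(enc_rows, lo, hi):
    """Client side: decrypt and prune the false positives."""
    rows = [toy_decrypt(e) for e in enc_rows]
    return [r for r in rows if lo <= r["salary"] <= hi]
```

With rows whose salaries are 10, 30, 45, 60, 70 and 90, the query 30 <= salary <= 60 is rewritten to two bucket IDs, the server returns four candidates, and the client prunes the extra one. Wider buckets would return more candidates for the same query.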

10 Trade-off between confidentiality and overhead
Larger partition → increased privacy → increased overhead

11 Order preserving encryption
Agrawal 2004, Boldyreva 2009. The data set is securely transformed so that the order is preserved but the distribution and domain are changed. Benefit: indexing/searching on OPE-encrypted data. Weakness: once the original distribution is known, OPE is broken.
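To make the indexing benefit concrete, here is a toy sketch of an order-preserving mapping and a range search that runs directly on ciphertexts. It is an assumption-laden illustration (a random strictly increasing lookup table over a small integer domain), not the Agrawal 2004 or Boldyreva 2009 construction; `make_ope_table` and `server_range` are hypothetical names.

```python
import bisect
import random


def make_ope_table(domain, seed=0):
    """Toy order-preserving mapping: a random strictly increasing function
    from a small integer domain to ciphertext integers."""
    rng = random.Random(seed)
    table, c = {}, 0
    for v in sorted(domain):
        c += rng.randint(1, 1000)  # positive gap keeps order, hides spacing
        table[v] = c
    return table


def server_range(sorted_cipher, c_lo, c_hi):
    """Server side: binary search on ciphertexts, no decryption needed."""
    i = bisect.bisect_left(sorted_cipher, c_lo)
    j = bisect.bisect_right(sorted_cipher, c_hi)
    return sorted_cipher[i:j]


# Client encrypts data and query bounds with the same secret table.
enc = make_ope_table(range(100))
cipher = sorted(enc[v] for v in [3, 42, 57, 8, 91])
hits = server_range(cipher, enc[5], enc[60])
dec = {c: v for v, c in enc.items()}
plain_hits = [dec[c] for c in hits]  # values 8, 42, 57
```

Because the mapping is monotone, any ordinary sorted index on the server keeps working on the encrypted column.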

12 Not attribute-wise order preserving
Order preserving encryption (OPE, Agrawal et al. 2004) is not resilient to distribution-based attacks. [Figure: original X_i distribution (known); transformed X_i' distribution; OPE; bucket-based estimation]
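One simple way to picture a distribution-based attack is rank matching: since OPE keeps the order, the rank of a ciphertext equals the rank of its plaintext, and an attacker who knows the original distribution can read off the matching quantile. The sketch below is an illustrative assumption, not the bucket-based estimation shown on the slide; `rank_matching_attack` is a hypothetical name and the cubic function merely stands in for an OPE.

```python
import random


def rank_matching_attack(ciphertexts, known_sample):
    """Estimate each plaintext by mapping the rank of its ciphertext to the
    same quantile of a sample drawn from the known original distribution."""
    ref = sorted(known_sample)
    n = len(ciphertexts)
    order = sorted(range(n), key=lambda i: ciphertexts[i])
    guess = [None] * n
    for rank, idx in enumerate(order):
        q = (rank + 0.5) / n                 # mid-rank quantile in (0, 1)
        guess[idx] = ref[int(q * len(ref))]  # same quantile in the reference
    return guess


def toy_ope(x):
    """Stand-in for OPE: strictly increasing, changes domain and shape."""
    return x ** 3 + 7 * x


rng = random.Random(1)
plain = [rng.gauss(50, 10) for _ in range(1000)]
cipher = [toy_ope(x) for x in plain]
# The attacker never sees `plain`, only an independent sample of the
# same distribution.
sample = [rng.gauss(50, 10) for _ in range(1000)]
guess = rank_matching_attack(cipher, sample)
mean_abs_error = sum(abs(g - p) for g, p in zip(guess, plain)) / len(plain)
```

The estimation error shrinks as the attacker's sample grows, which is why OPE is considered broken once the original distribution is known.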

13 Perturbation based methods
Multiplicative perturbations; RASP perturbation for query services (range query, kNN query) (Xu 2014)

14 RASP framework for confidential query services in the cloud
[Figure: the data owner transforms data D into D' = F(D) and outsources D' to the honest-but-curious cloud. Authorized users on the trusted client transform a query q into q' = Q(q); the cloud evaluates H(q', D') and returns result R'; the client recovers R = G(R').]

15 RASP perturbation: k-dimensional numeric data with n records, represented as a k × n matrix; x denotes a record

16 Properties: not an OPE; preserves convexity of the dataset
Convex dataset in R^k → another convex dataset in R^(k+2). Good for range queries: each range query in R^k → hyperplane-based query → range query in R^(k+2).

17 RASP properties: convexity preserving
The queried range (hypercube) is convex. RASP transforms the range into another convex set (polyhedron). Half space: w^T x <= a, bounded by the hyperplane w^T x = a. The intersection of convex sets is also convex.

18 Illustration of convexity preserving
[Figure panels: original space; perturbed space; OPE space, where X_i < a → E(X_i) < E(a)]

19 Secure query transformation
A naïve solution, based on the convexity preserving property. Problems: (1) A^(-1) can be probed; (2) if a is known, the whole dimension i is breached.

20 Secure query transformation
Enhanced solution: X_(k+2) is always positive, so (X_i - a) <= 0 ⟺ (X_i - a) X_(k+2) <= 0. Correspondingly, in the encrypted space, y^T Θ y <= 0. Problems addressed: (1) A^(-1) cannot be derived from Θ; (2) (X_i - a) X_(k+2) <= 0 contains the random component X_(k+2), which protects the condition (X_i - a) <= 0.
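A small exact-arithmetic sketch may help check the enhanced condition. It assumes the perturbation has the form y = A (x_1..x_k, 1, v)^T with a secret invertible (k+2) × (k+2) matrix A and a fresh positive random v per record, as in the RASP paper (Xu 2014), and it builds the query matrix as Θ = (A^(-1))^T u w^T A^(-1), where u^T z = z_i - a and w^T z = z_(k+2). All function names and the integer key range are hypothetical, and this toy omits everything else a real scheme needs.

```python
from fractions import Fraction
import random


def inverse(m):
    """Gauss-Jordan inverse over exact fractions; ValueError if singular."""
    n = len(m)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("singular matrix")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def keygen(k, rng):
    """Secret key: a random invertible (k+2) x (k+2) integer matrix A."""
    while True:
        a = [[rng.randint(-9, 9) for _ in range(k + 2)] for _ in range(k + 2)]
        try:
            return a, inverse(a)
        except ValueError:
            continue


def perturb(x, a, rng):
    """y = A (x_1..x_k, 1, v)^T with a fresh positive random v per record."""
    return matvec(a, list(x) + [1, rng.randint(1, 100)])


def query_matrix(i, bound, a_inv, k):
    """Theta for the condition X_i <= bound (i is a 0-based dimension)."""
    n = k + 2
    u = [Fraction(0)] * n
    u[i], u[k] = Fraction(1), Fraction(-bound)  # u^T z = z_i - bound
    ua = [sum(u[r] * a_inv[r][c] for r in range(n)) for c in range(n)]  # u^T A^-1
    wa = list(a_inv[k + 1])                     # w^T A^-1 with w = e_(k+2)
    return [[ua[r] * wa[c] for c in range(n)] for r in range(n)]


def server_test(theta, y):
    """Server side: evaluate y^T Theta y <= 0 without A or the bound."""
    return sum(yi * ti for yi, ti in zip(y, matvec(theta, y))) <= 0
```

For a record x, the quadratic form equals (x_i - a) · v, so its sign matches the original condition while the value itself is scaled by the per-record noise v.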

21 Efficient two-stage query processing, illustrated
Stage 1: query the bounding box. Stage 2: filter out the junk records. [Figure: original space and transformed space.] A multidimensional tree index is built on the encrypted data (in the transformed space) on the server.

22 The client calculates the large bounding box;
Stage 1: the client calculates the large bounding box; the server uses the index to find the results. Stage 2: filter the initial results with the conditions y^T Θ_i y <= 0 for i = 1…2m. Note: the two-stage strategy works if the output of stage 1 is significantly smaller than the original database and fits into memory. Otherwise, use a linear scan with stage 2 filtering.
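The two stages can be sketched in two dimensions as follows. This is a simplified stand-in under stated assumptions: a plain 2 × 2 linear map `t` replaces the RASP perturbation, half-plane predicates replace the quadratic conditions, and a linear scan replaces the multidimensional tree index. All names are hypothetical.

```python
from itertools import product


def transform(p, t):
    """Apply a 2x2 linear map t to point p (stand-in for the perturbation)."""
    return (t[0][0] * p[0] + t[0][1] * p[1], t[1][0] * p[0] + t[1][1] * p[1])


def bounding_box(lo, hi, t):
    """Client: axis-aligned box enclosing the transformed query rectangle."""
    corners = [transform(c, t) for c in product(*zip(lo, hi))]
    xs, ys = zip(*corners)
    return (min(xs), min(ys)), (max(xs), max(ys))


def make_conditions(lo, hi, t):
    """Client: 2m exact conditions for m = 2 dimensions, each a predicate on
    a transformed point (half-planes here, quadratic forms in RASP)."""
    det = t[0][0] * t[1][1] - t[0][1] * t[1][0]
    inv = ((t[1][1] / det, -t[0][1] / det), (-t[1][0] / det, t[0][0] / det))
    conds = []
    for j in range(2):
        conds.append(lambda y, j=j: transform(y, inv)[j] >= lo[j])
        conds.append(lambda y, j=j: transform(y, inv)[j] <= hi[j])
    return conds


def two_stage(records, box, conds):
    """Server: stage 1 box query, then stage 2 exact filtering."""
    (x0, y0), (x1, y1) = box
    # Stage 1: bounding-box query (a linear scan stands in for the index).
    stage1 = [r for r in records if x0 <= r[0] <= x1 and y0 <= r[1] <= y1]
    # Stage 2: drop junk records inside the box but outside the region.
    return stage1, [r for r in stage1 if all(c(r) for c in conds)]


t = ((2, 1), (1, 1))
stored = [transform(p, t) for p in [(1, 1), (3, 0), (0, 3), (5, 5)]]
lo, hi = (0, 0), (2, 2)
stage1, result = two_stage(stored, bounding_box(lo, hi, t),
                           make_conditions(lo, hi, t))
```

In this example stage 1 keeps three of the four stored points and stage 2 keeps only the image of (1, 1); the gap between the two counts is the "junk" the bounding box lets through.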

23 Access pattern privacy
On database queries: the problem is the same as in PIR. Attackers may use the access pattern to breach data confidentiality. Each of the previous approaches should handle this problem!

24 PIR is impractical. Solutions based on private information retrieval (PIR) exist, but PIR is still impractical.

25 For the bucketization approach
Based on the architecture of Hacigumus (SIGMOD02): Hore VLDB04 (paper 138), for range queries. Privacy concern: reveals the distribution of values in each bucket. "Diffusion": split buckets and combine parts of different buckets. Trade-off: the server now needs to return more noisy results → larger result size.

26 For OPE: queries are protected (assuming the original distribution is unknown). The access pattern is not protected and may give some information to break the mapping (e.g., to estimate the original distribution); no study yet.

27 For RASP: queries are protected,
but privacy of the access pattern is not preserved.

28 Integrity checking: common methods
Checksum; hash functions (hash trees, hash chains)

29 Integrity guarantee: Merkle hash tree
Parent hash: H(H(x1)+H(x2)), where + is string concatenation. Can be stored with tree-like structures: index, XML.
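The parent rule H(H(x1)+H(x2)) extends to a whole tree as sketched below. The choice of SHA-256 and the handling of an odd number of nodes (duplicating the last one) are assumptions made for this illustration; the slide does not specify either.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(leaves):
    """Return all tree levels, leaf hashes first, root level last."""
    level = [h(x) for x in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:               # odd count: duplicate the last node
            level = level + [level[-1]]
            levels[-1] = level
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def prove(levels, index):
    """Sibling hashes from leaf to root, each tagged with the sibling's side."""
    path = []
    for level in levels[:-1]:
        sib = index ^ 1
        path.append((level[sib], sib < index))
        index //= 2
    return path


def verify(leaf, path, root):
    """Recompute the root from one leaf and its sibling path."""
    node = h(leaf)
    for sibling, sibling_is_left in path:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root


leaves = [b"x1", b"x2", b"x3", b"x4"]
levels = build_levels(leaves)
root = levels[-1][0]   # the owner signs or publishes this value
ok = verify(b"x3", prove(levels, 2), root)
```

A client that trusts only the root can check any returned record with a logarithmic number of sibling hashes.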

30 Hash chains

31 Applications: query correctness with Merkle trees

32 Using Merkle tree. Example: 5<=q<=10, GLB(q) = 4, LUB(q) = 11
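For the range 5<=q<=10, the two boundary values 4 and 11 are the nearest stored values just outside the range. A rough sketch of how a server might return them alongside the result, and what the client can check from them, is given below. The function names are hypothetical, and the Merkle proof that the returned values are consecutive leaves under the signed root is assumed to be verified separately.

```python
import bisect


def server_answer(sorted_values, lo, hi):
    """Return the matching values plus the two boundary neighbours
    (None when the range touches an end of the data)."""
    i = bisect.bisect_left(sorted_values, lo)
    j = bisect.bisect_right(sorted_values, hi)
    below = sorted_values[i - 1] if i > 0 else None
    above = sorted_values[j] if j < len(sorted_values) else None
    return below, sorted_values[i:j], above


def client_check(below, result, above, lo, hi):
    """Range logic only: results lie inside the range and the boundary
    values lie strictly outside it."""
    in_range = all(lo <= v <= hi for v in result)
    return (in_range
            and (below is None or below < lo)
            and (above is None or above > hi))


values = [1, 4, 5, 7, 10, 11, 15]
below, result, above = server_answer(values, 5, 10)
accepted = client_check(below, result, above, 5, 10)
```

The range check alone does not catch a server that drops a value from the middle of the result; that is what the proof of leaf adjacency in the Merkle tree is for.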

33 Operations, issues, related work
Operations: selections, projections, equijoins, set ops. Issues: works only on data with verification objects; query expressiveness; expensive. Related work: Pang et al. (ICDE04, SIGMOD05), using ElGamal function; Sion VLDB05: challenge token; F. Li SIGMOD06: freshness.

34 Trusted hardware

35 Possible benefits

36 Discussion Data confidentiality/access pattern
Strict cryptographic definition (keyword search) or relaxed definition (perturbation, bucketization, OPE, etc.)? It is very difficult to formulate and prove the security of non-traditional approaches. Do we need to reformulate the security model, and how?

