Published by Cindy Wynter. Modified over 2 years ago.

1
**Order Preserving Encryption for Numeric Data**

Rakesh Agrawal, Jerry Kiernan, Ramakrishnan Srikant, Yirong Xu

IBM Almaden Research Center

2
**Outline**

- Motivation and Introduction
- OPES encryption
- Modeling the distribution
- Experimental evaluation

3
**Motivation**

- Encryption is rapidly becoming a requirement in a myriad of business settings (e.g., health care, financial, retail, government), driven by legislation (e.g., SB 1386, HIPAA)
- Encrypting databases unleashes a host of problems:
  - Performance slowdown
  - Incompatibility with standard database features, e.g., comparison predicates and the use of indexes
  - Changes to applications for encryption: encryption functions now appear in queries

4
**Order Preserving Encryption Function**

E is an order preserving encryption function if, for any two plaintext values p1 and p2 with c1 = E(p1) and c2 = E(p2), p1 < p2 implies c1 < c2.
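The definition can be checked mechanically on a finite set of plaintexts. The following is a minimal sketch, not part of the slides; the helper name `is_order_preserving` and the two toy functions are illustrative assumptions.

```python
from itertools import combinations

def is_order_preserving(E, plaintexts):
    """Check p1 < p2 => E(p1) < E(p2) for every pair of distinct plaintexts."""
    ordered = sorted(set(plaintexts))
    # combinations() over a sorted list yields pairs (a, b) with a < b.
    return all(E(a) < E(b) for a, b in combinations(ordered, 2))

# Toy functions for illustration only; neither is the OPES scheme.
affine = lambda p: 3 * p + 7      # strictly increasing, so order preserving
square = lambda p: p * p          # not increasing on negative inputs

sample = [-2, 0, 1, 5, 9]
print(is_order_preserving(affine, sample))
print(is_order_preserving(square, sample))
```

The check is quadratic in the number of distinct values, which is fine for a spot check but not for a full column.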

5
**Threat Model**

- The storage system used by the DBMS is untrusted, i.e., vulnerable to compromise
- The DBMS software is trusted
- Ciphertext-only attack: the adversary has access to all (but only) encrypted values
- Guard against percentile exposure: an adversary should not be able to get even an estimate of true values

6
**Design Goals**

- Query results from OPES will be sound and complete
- Comparison operations will be performed without decrypting the operands
- Standard database indexes can be used over encrypted data
- Tolerate updates
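To make the second and third goals concrete, here is a small sketch of a range predicate evaluated on a sorted list of ciphertexts, which stands in for a database index. The function `E` is a toy strictly increasing map assumed for the demo, not the OPES algorithm, and the salary values are made up.

```python
from bisect import bisect_right

def E(p):
    # Toy stand-in for OPES: any strictly increasing function works for this demo.
    return 7 * p + 13

salaries = [28000, 35000, 120000, 99000, 150000]

# What the untrusted store would hold: ciphertexts only, kept in sorted order
# the way a B-tree index would keep them.
encrypted_index = sorted(E(s) for s in salaries)

def count_greater_than(threshold):
    # Only the query constant is encrypted; stored values are never decrypted.
    position = bisect_right(encrypted_index, E(threshold))
    return len(encrypted_index) - position

print(count_greater_than(100000))
```

Because `E` preserves order, the count obtained from the ciphertext index equals the count a plaintext scan would return, which is what "sound and complete" asks for.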

7
**Integration of Encryption and Query Processing**

- Users have a plaintext view of an encrypted database
- Plaintext queries are translated into equivalent queries over encrypted data
- Comparison operators are directly applied over encrypted columns
- Tables are encrypted using standard as well as order preserving encryption
- We hereafter strictly focus on the OPES algorithms

[Diagram, top to bottom: Queries (Select name from Emp where sal >), Translation layer (Select decrypt (“xsxx”) from “cwlxss” where “xescs” > OPESencrypt(100000)), DBMS, Encrypted data and metadata]

8
**Outline**

- Motivation and Introduction
- OPES encryption
- Modeling the distribution
- Experimental evaluation

9
**Approach**

- Plaintext data has unknown distribution
- User selects the target (ciphertext) distribution
- Ciphertext values exhibit the target distribution

10
**Effect of OPES Encryption on Plaintext Distributions**

[Figure: two panels (Input: Gaussian, Target: Zipf; Input: Uniform, Target: Zipf), each showing the Original, Encrypted, and Target distributions]

11
**OPES Key Generation**

[Diagram: a sample of source values from the plaintext distribution and a sample of target values from the ciphertext distribution feed OPES Key Generation, which outputs the OPES Key]

12
**OPES Keys**

[Diagram labels: Source to uniform (Source, Uniform); Target to uniform (Target, Uniform)]

13
**Two Step Encryption**

- Source (plaintext) to uniform
- Uniform to target (ciphertext)
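The two steps can be sketched with piecewise-linear maps built from samples. This is an illustrative simplification and not the OPES bucket model from the slides: the names `make_key`, `encrypt`, and `decrypt` are hypothetical, the knots are simply the sorted distinct sample values, and each sample must contain at least two distinct values.

```python
from bisect import bisect_right

def interp(x, xs, ys):
    """Piecewise-linear map through strictly increasing knots (xs, ys).
    Values outside the knots are extrapolated along the nearest segment."""
    i = min(max(bisect_right(xs, x) - 1, 0), len(xs) - 2)
    t = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + t * (ys[i + 1] - ys[i])

def make_key(source_sample, target_sample):
    """Hypothetical key: for each sample, distinct sorted values paired with
    evenly spaced positions in [0, 1] (the 'uniform' space)."""
    def knots(sample):
        xs = sorted(set(sample))
        us = [k / (len(xs) - 1) for k in range(len(xs))]
        return xs, us
    return knots(source_sample), knots(target_sample)

def encrypt(p, key):
    (sx, su), (tx, tu) = key
    u = interp(p, sx, su)        # step I: source (plaintext) to uniform
    return interp(u, tu, tx)     # step II: uniform to target (ciphertext)

def decrypt(c, key):
    (sx, su), (tx, tu) = key
    u = interp(c, tx, tu)        # undo step II: target to uniform
    return interp(u, su, sx)     # undo step I: uniform to source

key = make_key([10, 20, 30, 50], [1, 2, 4, 100])
print([round(encrypt(p, key), 3) for p in (10, 20, 30, 40, 50)])
```

Both maps are strictly increasing, so their composition preserves order, and decryption applies the two inverse maps in the opposite sequence.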

14
**OPES Encryption**

[Diagram: Step I (Source to Uniform) followed by Step II (Uniform to Target); Decrypt runs in the opposite direction]

15
**Outline**

- Motivation and Introduction
- OPES encryption
- Modeling the distribution
- Experimental evaluation

16
**Modeling the Distribution**

- Histograms (equi-depth, equi-width, wavelets)
  - Number of buckets required is unreasonably large
  - Overfitting the model
- Parametric
  - Poor estimation for irregular distributions
- Hybrid [Konig and Weikum 99]
  - Query result size estimation
  - Approach: partition the data into buckets and model the distribution within a bucket as a spline
  - Fixed number of buckets

17
**Our Approach**

- Hybrid [Konig and Weikum 99]
- Partition the data into buckets
- Model the distribution within each bucket as a linear spline
- The number of buckets is not fixed
- We use MDL to determine the number of bucket boundaries

18
**MDL**

The best model for encoding data minimizes the sum of the cost of:

- Describing the model
- Describing the data in terms of the model

19
**Model Costs**

- Data Cost
  - Using a mapping $M$ from $[p_l, p_h)$ to $[f_l, f_h)$, the cost of encoding $p_i$ is $C(p_i) = \log(f_i - E(i))$
  - $DC(p_l, p_h) = C(p_l) + C(p_{l+1}) + \dots + C(p_{h-1})$
- Incremental Model Cost
  - Fixed cost for each additional bucket
  - Boundary value
  - Boundary parameters: slope, scale factor

20
**Computing Boundaries**

- Growth phase
  - Start with $[p_l, p_h)$ and its $h-l-1$ sorted points $\{p_{l+1}, p_{l+2}, \dots, p_{h-1}\}$
  - Compute the spline for $[p_l, p_h)$
  - Compute $[f_l, f_h)$ using the spline
  - Find a further split point $p_s$, with $f_s$ having the maximum deviation from the expected value
- Prune phase
  - $LB(p_l, p_h) = DC(p_l, p_h) - DC(p_l, p_s) - DC(p_s, p_h) - IMC$
  - $GB(p_l, p_h) = LB(p_l, p_h) + GB(p_l, p_s) + GB(p_s, p_h)$
  - If $GB > 0$, the split at $p_s$ is retained
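The growth and prune phases can be sketched as one recursive function. This is a simplified illustration under several assumptions that are not in the slides: a straight line through the bucket end points stands in for the spline, the encoding cost is taken as `log2(1 + deviation)`, `IMC` is an arbitrary constant, and the names `deviations`, `data_cost`, and `find_boundaries` are hypothetical. The LB and GB formulas are applied as written on the slide.

```python
import math

IMC = 8.0  # assumed fixed cost of one extra bucket; the slides give no value

def deviations(points, lo, hi):
    """(deviation, index) for each interior point of the bucket points[lo..hi],
    measured against a straight line through the bucket's two end points."""
    span = hi - lo
    result = []
    for i in range(lo + 1, hi):
        expected = points[lo] + (points[hi] - points[lo]) * (i - lo) / span
        result.append((abs(points[i] - expected), i))
    return result

def data_cost(points, lo, hi):
    # Assumed encoding cost: log2(1 + deviation), so a perfect fit costs nothing.
    return sum(math.log2(1 + d) for d, _ in deviations(points, lo, hi))

def find_boundaries(points, lo, hi, kept):
    """Return GB for the bucket; append retained split indices to `kept`."""
    devs = deviations(points, lo, hi)
    if not devs:
        return 0.0
    worst, s = max(devs)          # growth: point with the maximum deviation
    if worst == 0:
        return 0.0                # the line already fits; nothing to split
    left, right = [], []
    gb_left = find_boundaries(points, lo, s, left)
    gb_right = find_boundaries(points, s, hi, right)
    lb = (data_cost(points, lo, hi) - data_cost(points, lo, s)
          - data_cost(points, s, hi) - IMC)
    gb = lb + gb_left + gb_right
    if gb > 0:                    # prune: keep the split only if it pays off
        kept.extend([s] + left + right)
    return gb

# Two regions with very different densities: spacing 1, then spacing 100.
points = list(range(50)) + [50 + 100 * k for k in range(50)]
boundaries = []
find_boundaries(points, 0, len(points) - 1, boundaries)
print(sorted(boundaries))
```

When a split is pruned, the splits found beneath it are discarded as well, since a child bucket cannot exist without its parent split.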

21
**Scaling**

The number of values in a bucket may be disproportionate to the size of the bucket.

[Diagram: points (x) in the Source and Uniform spaces across buckets b-1, b, and b+1]

22
**Updates**

- The scale factor ensures that each distinct plaintext value maps to a distinct ciphertext value
- Encrypted values need not be recomputed unless the distribution of plaintext values changes

23
**Quality of Encryption**

KS (Kolmogorov-Smirnov) Statistical Test

- Can we disprove, to a certain required level of significance, the null hypothesis that two data sets are drawn from the same distribution function?
- If not, then the ciphertext distribution cannot be distinguished from the specified target distribution
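For reference, a two-sample KS test can be sketched in a few lines of plain Python. The function names are illustrative, and the critical value uses the standard large-sample approximation, so this is a rough check and not necessarily the exact procedure used in the experiments.

```python
import math
from bisect import bisect_right

def ks_statistic(a, b):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    # Both empirical CDFs are step functions, so the largest gap occurs
    # at one of the observed values.
    for x in a + b:
        fa = bisect_right(a, x) / len(a)
        fb = bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d

def rejects_same_distribution(a, b, alpha=0.05):
    """True if the null hypothesis is rejected at level alpha
    (large-sample approximation of the critical value)."""
    n, m = len(a), len(b)
    critical = math.sqrt(-0.5 * math.log(alpha / 2)) * math.sqrt((n + m) / (n * m))
    return ks_statistic(a, b) > critical

ciphertexts = list(range(100))
target_sample = [x + 0.5 for x in range(100)]
print(rejects_same_distribution(ciphertexts, target_sample))
```

In the setting of the slide, one sample would be the ciphertext values and the other a sample drawn from the chosen target distribution; failing to reject means the two cannot be told apart at that significance level.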

24
**Duplicates**

- Assumptions
  - A large number of duplicates may leak information about the distribution of values
- Alternatively, map duplicates to distinct values
  - If $f = M(p)$ and $f' = M(p+1)$, then $p$ is mapped to a value in $[f, f')$
  - Equality is expressed as a range
  - Equi-joins can no longer be expressed
  - However, many numeric attributes (e.g., salary) may rarely be used in joins
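A short sketch of the alternative, assuming integer plaintexts and a toy mapping `M` that leaves a gap of 1000 between consecutive values. `M`, `encrypt_duplicate`, and `equality_range` are hypothetical names for illustration; the slides do not say how a value inside the range is chosen, so uniform random choice is an assumption.

```python
import random

def M(p):
    # Toy order-preserving map with room between consecutive integers.
    return 1000 * p

def encrypt_duplicate(p, rng):
    """Map p to some value in [M(p), M(p+1)), so equal plaintexts
    can receive different ciphertexts."""
    return rng.randrange(M(p), M(p + 1))

def equality_range(p):
    """The predicate 'value = p' becomes 'lo <= ciphertext < hi'."""
    return M(p), M(p + 1)

rng = random.Random(0)
fives = [encrypt_duplicate(5, rng) for _ in range(50)]
fours = [encrypt_duplicate(4, rng) for _ in range(50)]
lo, hi = equality_range(5)
print(all(lo <= c < hi for c in fives), max(fours) < min(fives))
```

Order is still preserved across distinct plaintexts because the ranges do not overlap, but two columns encrypted this way can no longer be joined on ciphertext equality.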

25
**Outline**

- Motivation and Introduction
- OPES encryption
- Modeling the distribution
- Experimental evaluation

26
**Experimental Evaluation**

- Percentile exposure
- Updatability
- Key size
- Time overhead

27
**Datasets**

- Census: UCI KDD archive, PUMS census data (30,000 records)
- Gaussian
- Zipf
- Uniform

Default: Source: Gaussian, Target: Zipf

28
**Percentile Exposure**

Average change in percentile, by source distribution and target distribution:

Census Gaussian 37 Zipf 7 Uniform 38 45 17 44

29
Time to Build the Model

30
Insertion Overhead

31
**Cost of Additional Insertion**

32
Retrieval Overhead

33
Retrieval Time

34
**Related Work**

- Polynomial functions
  - Ignores the distribution of plaintext/ciphertext values
- Database as a service
  - Requires post processing of query results
- Privacy homomorphisms
  - Comparison operations not investigated
- Keyword searches on encrypted data
  - Designed for keyword retrieval
  - Range queries not supported
- Smartcard-based schemes
  - Infeasible for large ranges
- Order-preserving hashing
  - Protecting the hash values from cryptanalysis is not a concern, nor is deciphering plaintext values from hash values
  - Designed for static collections

35
**Closing Remarks**

- Ensuring safety without impeding the flow of information is a hard problem
- Current choices
  - Plaintext database
  - Encrypted databases with loss of functionality or performance
- Our approach focused on the trade-off between security and efficiency
- We developed an algorithm which could easily be integrated with current systems

Protecting data without impeding the flow of information is an extremely hard problem. Today the choice is no encryption or, if you encrypt, your performance goes to hell. This is a first stab at the problem, focusing on OPES to balance the trade-off: increasing security without affecting efficiency. In this first stab, we wanted something that is easily integrated with existing systems. The challenge is to have a complete set of techniques for encrypting a database while still preserving the efficiency of operations.

36
Backup

37
**Encode**

$$\mathrm{Encode}(p) = z\,(s p^2 + p), \qquad p \in [0, p_h),\quad s = q/(2r),\quad z > 0$$

- The distribution has density function $qp + r$
- $p$ is the source (target) value
- $s$ is the quadratic coefficient
- $z$ is the scale factor

38
**Decode**

$$\mathrm{Decode}(f) = \frac{-z \pm \sqrt{z^2 + 4zsf}}{2zs}, \qquad f \in [0, f_h),\quad s = q/(2r),\quad z > 0$$

- $f$ is the flattened value
- $s$ is the quadratic coefficient
- $z$ is the scale factor
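Decode is the inverse of Encode(p) = z(sp² + p): solving zsp² + zp - f = 0 for p gives the quadratic-formula expression on this slide. A small round-trip sketch, assuming the root with the plus sign is the one that returns a non-negative p, and treating s = 0 as a separate linear case (both are assumptions not stated on the slides):

```python
import math

def encode(p, s, z):
    """Encode(p) = z * (s * p^2 + p)."""
    return z * (s * p * p + p)

def decode(f, s, z):
    """Inverse of encode; uses the '+' root of the quadratic formula."""
    if s == 0:
        return f / z  # the quadratic term vanishes, so the map is linear
    return (-z + math.sqrt(z * z + 4 * z * s * f)) / (2 * z * s)

# s = -0.001, z = 2: encode(100) = 2 * (-10 + 100) = 180, and decode(180) = 100.
print(encode(100, -0.001, 2.0), decode(180.0, -0.001, 2.0))
```

For s < 0 the map is only increasing while p < -1/(2s), so the bucket width has to stay below that bound for the round trip to be valid.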

39
**Order Preserving Encryption**

- Compute distinct attribute values in ascending order
- Ciphertext is the index value
- Effectively hides the distribution of plaintext values
- The key size is proportional to the number of distinct attribute values
- Any updates require recomputing the key and ciphertext values

Table columns: No, Name, Position, Salary, Location, …

| Ciphertext | Plaintext |
|------------|-----------|
| 1          | 28000     |
| 2          | 35000     |
| …          | …         |
| Cn         | Pn        |
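This index-based scheme is easy to sketch. The function name `build_key` and the extra salary values are illustrative assumptions; only 28000 and 35000 come from the slide.

```python
def build_key(values):
    """Map each distinct value to its 1-based rank in ascending order."""
    distinct = sorted(set(values))
    return {p: i for i, p in enumerate(distinct, start=1)}

salaries = [35000, 28000, 35000, 41000]
key = build_key(salaries)
print(key)

# Inserting a new value between existing ones shifts later ranks, which is why
# updates force the key and the stored ciphertexts to be recomputed.
updated_key = build_key(salaries + [30000])
print(updated_key)
```

The dictionary has one entry per distinct value, which matches the slide's point that the key size grows with the number of distinct attribute values.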

40
**Target Distribution Requirement**

- Why isn’t the source-to-uniform transformation sufficient for order preserving encryption?
- It is, but:
  - The target distribution may cause an adversary to make incorrect assumptions about the source distribution
  - The organization of the source distribution cannot be inferred from the target

41
**Quadratic Coefficient**

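[Diagram: points (x) along the value axis $v$, with positions $b_1$ and $b_2$ and index intervals $[i_1, j_1]$ and $[i_2, j_2]$]

$$q = \frac{\dfrac{j_2 - i_2}{v_{j_2} - v_{i_2}} - \dfrac{j_1 - i_1}{v_{j_1} - v_{i_1}}}{v_{b_1} - v_{b_2}}, \qquad s = \frac{q}{2\,\dfrac{j_1 - i_1}{v_{j_1} - v_{i_1}}}$$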

42
**Scale Factor Constraints**

- $\forall p \in [0, w):\ M(p+1) - M(p) \ge 2$
  - Ensures that there is a distinct mapped value for each input value
- $w_f = Kn$
  - The width of a bucket in the mapped space is a function of the number of elements $n$ in the bucket
  - $K$ is the minimum width needed across buckets

43
**Scale Factor**

The scale factor will stretch short buckets to the width of the largest bucket, further increasing the dimension of a bucket by a factor of the number of elements in the bucket.

$$z = \frac{Kn}{s w^2 + w}, \qquad K = \max_{i = 1, \dots, m} \left[ x\,(s w_i^2 + w_i) \right]$$

$$x = \begin{cases} 2, & s \ge 0 \\ \dfrac{2}{1 + s(2w - 1)}, & s < 0 \end{cases}$$
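The value of x can be read as the smallest scale factor for which M(p) = z(sp² + p) keeps consecutive inputs at least 2 apart within a bucket of width w. A quick numerical check of that reading follows; the function names `min_scale` and `M` and the chosen s and w values are illustrative assumptions.

```python
def min_scale(s, w):
    """x from the slide: 2 if s >= 0, else 2 / (1 + s * (2w - 1))."""
    return 2.0 if s >= 0 else 2.0 / (1 + s * (2 * w - 1))

def M(p, s, z):
    return z * (s * p * p + p)

def smallest_gap(s, w, z):
    """Smallest M(p+1) - M(p) over the integer inputs p = 0 .. w-1."""
    return min(M(p + 1, s, z) - M(p, s, z) for p in range(w))

for s in (0.0, 0.01, -0.001):
    w = 100
    x = min_scale(s, w)
    print(s, round(x, 4), round(smallest_gap(s, w, x), 4))
```

The gap equals z(s(2p + 1) + 1), so for s < 0 it is tightest at p = w - 1, which is where the 2w - 1 term in x comes from; for s ≥ 0 it is tightest at p = 0, where z = 2 is already enough.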

44
**Slope**

The values within a single bucket are unevenly distributed within the bucket.

[Diagram: buckets b-1 and b]
