Order Preserving Encryption for Numeric Data Rakesh Agrawal Jerry Kiernan Ramakrishnan Srikant Yirong Xu IBM Almaden Research Center.

Uploaded: 2017-12-14T20:43:39+00:00
Duration: 15 min 36 s
Channel: Cindy Wynter

Outline
Motivation and introduction; OPES encryption; modeling the distribution; experimental evaluation

Motivation
Encryption is rapidly becoming a requirement in a myriad of business settings (e.g., health care, financial, retail, government), driven by legislation (e.g., SB1386, HIPAA). Encrypting databases unleashes a host of problems: performance slowdown; incompatibility with standard database features, e.g., comparison predicates and the use of indexes; and changes to applications, because encryption functions now appear in queries.

Order Preserving Encryption Function
Let E be an order-preserving encryption function, let p1 and p2 be two plaintext values, and let c1 = E(p1) and c2 = E(p2). Then if p1 < p2, it follows that c1 < c2.
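The definition can be checked mechanically. This is an illustrative sketch only: `toy_encrypt` is a hypothetical affine map (not OPES and not secure) and `is_order_preserving` is a hypothetical checker that tests the implication p1 < p2 ⇒ E(p1) < E(p2) over all pairs of sample values.

```python
from itertools import combinations

def toy_encrypt(p: int) -> int:
    # Hypothetical strictly increasing map, for illustration only:
    # it preserves order but offers no real secrecy.
    return 7 * p + 1234

def is_order_preserving(encrypt, plaintexts) -> bool:
    """Check the definition: p1 < p2 implies encrypt(p1) < encrypt(p2)."""
    # combinations() of a sorted, de-duplicated list yields pairs with p1 < p2
    for p1, p2 in combinations(sorted(set(plaintexts)), 2):
        if not encrypt(p1) < encrypt(p2):
            return False
    return True

salaries = [28000, 35000, 100000, 42000, 35000]
print(is_order_preserving(toy_encrypt, salaries))   # True
print(is_order_preserving(lambda p: -p, salaries))  # False: order is reversed
```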

Threat Model
The storage system used by the DBMS is untrusted, i.e., vulnerable to compromise; the DBMS software is trusted. Ciphertext-only attack: the adversary has access to all (but only) the encrypted values. Guard against percentile exposure: an adversary should not be able to get even an estimate of the true values.

Design Goals
Query results from OPES will be sound and complete. Comparison operations will be performed without decrypting the operands. Standard database indexes can be used over encrypted data. Updates are tolerated.

Integration of Encryption and Query Processing
Users have a plaintext view of an encrypted database. A translation layer translates plaintext queries into equivalent queries over the encrypted data, and comparison operators are applied directly to encrypted columns. For example, Select name from Emp where sal > 100000 is translated into Select decrypt(“xsxx”) from “cwlxss” where “xescs” > OPESencrypt(100000). Tables in the DBMS are encrypted using standard as well as order-preserving encryption (encrypted data and metadata). We hereafter focus strictly on the OPES algorithms.
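The translation step can be sketched with Python's built-in `sqlite3` standing in for the DBMS. Assumptions: `ope_encrypt` is a hypothetical strictly increasing stand-in for OPESencrypt, and the table `emp` and column `sal_enc` are hypothetical; only the salary column is order-preserving encrypted here, while the name column (conventionally encrypted in the real scheme) is left in plaintext to keep the sketch short. The point is that the comparison and an ordinary index both work on ciphertexts unchanged.

```python
import sqlite3

def ope_encrypt(p: int) -> int:
    # Hypothetical stand-in for OPESencrypt: any strictly increasing map
    # preserves the result of ">" (a real OPES key is built from samples).
    return 3 * p + 17

emp = [("alice", 90000), ("bob", 120000), ("carol", 150000)]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE emp (name TEXT, sal_enc INTEGER)")
con.execute("CREATE INDEX emp_sal_enc ON emp (sal_enc)")  # ordinary B-tree index
con.executemany("INSERT INTO emp VALUES (?, ?)",
                [(name, ope_encrypt(sal)) for name, sal in emp])

# Plaintext query:  SELECT name FROM Emp WHERE sal > 100000
# Translated query: the constant is encrypted once, then compared directly.
rows = con.execute("SELECT name FROM emp WHERE sal_enc > ? ORDER BY name",
                   (ope_encrypt(100000),)).fetchall()
print([name for (name,) in rows])  # ['bob', 'carol']
```

Because the translated predicate is still a plain range comparison, the database's own optimizer can answer it with the index; no decryption happens inside the query.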

Approach
The plaintext data has an unknown distribution. The user selects the target (ciphertext) distribution, and the ciphertext values exhibit the target distribution.

Effect of OPES Encryption on Plaintext Distributions
[Figure: original, encrypted, and target distributions for two cases: Gaussian input with Zipf target, and uniform input with Zipf target.]

OPES Key Generation
A sample of source values from the plaintext distribution and a sample of target values from the ciphertext distribution are fed to OPES key generation, which outputs the OPES key.

OPES Keys
[Figure: the OPES key comprises a source-to-uniform mapping and a target-to-uniform mapping.]

Two Step Encryption
Step I maps the source (plaintext) to uniform; Step II maps uniform to the target (ciphertext).

OPES Encryption
[Figure: encryption maps the source to uniform (Step I) and uniform to the target (Step II); decryption applies the inverse mappings in reverse order.]
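A minimal sketch of the two-step idea under a deliberately simplified key: instead of the MDL-based bucket model with linear-spline densities, each mapping here is a piecewise-linear empirical CDF through sorted, de-duplicated sample values. All names (`to_uniform`, `from_uniform`, `make_key`, `encrypt`, `decrypt`) and the sample values are hypothetical. Step I maps a plaintext into [0, 1]; Step II maps that uniform value into the target domain; decryption inverts Step II and then Step I.

```python
from bisect import bisect_right

def to_uniform(x: float, knots: list[float]) -> float:
    """Piecewise-linear empirical CDF: maps [knots[0], knots[-1]] onto [0, 1]."""
    n = len(knots) - 1
    if x <= knots[0]:
        return 0.0
    if x >= knots[-1]:
        return 1.0
    i = bisect_right(knots, x) - 1           # knots[i] <= x < knots[i + 1]
    return (i + (x - knots[i]) / (knots[i + 1] - knots[i])) / n

def from_uniform(u: float, knots: list[float]) -> float:
    """Inverse of to_uniform for the same knots."""
    n = len(knots) - 1
    if u <= 0.0:
        return knots[0]
    whole, frac = divmod(u * n, 1.0)
    i = int(whole)
    if u >= 1.0 or i >= n:                   # also guards float rounding near 1
        return knots[-1]
    return knots[i] + frac * (knots[i + 1] - knots[i])

def make_key(source_sample, target_sample):
    # Distinct knots keep both maps strictly increasing (order preserving).
    return sorted(set(source_sample)), sorted(set(target_sample))

def encrypt(p, key):
    src, tgt = key
    return from_uniform(to_uniform(p, src), tgt)   # Step I, then Step II

def decrypt(c, key):
    src, tgt = key
    return from_uniform(to_uniform(c, tgt), src)   # undo Step II, then Step I

key = make_key([20000, 30000, 45000, 60000, 90000, 150000],
               [1000, 1001, 1003, 1010, 1050, 1200, 2000])
c = encrypt(35000, key)
print(c)                # about 1002.2
print(decrypt(c, key))  # about 35000.0
```

Values outside the sampled range are clamped in this sketch, so two such values would collide; a real key would have to cover the whole plaintext and ciphertext domains.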

Modeling the Distribution
Histograms (equi-depth, equi-width, wavelets): the number of buckets required is unreasonably large, and the model over-fits. Parametric models: poor estimation for irregular distributions. Hybrid [Konig and Weikum 99], developed for query result size estimation: partition the data into buckets and model the distribution within a bucket as a spline, using a fixed number of buckets.

Our Approach
Like [Konig and Weikum 99], the approach is hybrid: partition the data into buckets and model the distribution within each bucket as a linear spline. Unlike it, the number of buckets is not fixed; we use MDL to determine the number of bucket boundaries.

MDL
The best model for encoding data minimizes the sum of the cost of describing the model and the cost of describing the data in terms of the model.

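Model Costs
Data cost: using a mapping $M$ from $[p_l, p_h)$ to $[f_l, f_h)$, the cost of encoding $p_i$ is $C(p_i) = \log|f_i - E(i)|$, where $f_i = M(p_i)$ and $E(i)$ is the expected mapped value of the $i$-th point; the data cost of the bucket is $DC(p_l, p_h) = C(p_l) + C(p_{l+1}) + \cdots + C(p_{h-1})$.
Incremental model cost: a fixed cost for each additional bucket, covering its boundary value and its parameters (slope and scale factor).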
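The data cost can be sketched as below. Two assumptions are made here and are not taken from the slide: $E(i)$ is the linear interpolation between $f_l$ and $f_h$ (where the $i$-th point would land if the mapped data were perfectly uniform), and $\log_2(1 + |f_i - E(i)|)$ replaces $\log(f_i - E(i))$ so the cost stays finite and non-negative when a point sits exactly where expected. The function name `data_cost` is hypothetical.

```python
import math

def data_cost(mapped: list[float]) -> float:
    """DC(p_l, p_h) for one bucket.

    mapped = [f_l, f_{l+1}, ..., f_{h-1}, f_h]: images under M of the bucket's
    points, followed by the image f_h of the upper boundary.
    Assumptions: E(i) interpolates linearly from f_l to f_h, and the cost of a
    point is log2(1 + |f_i - E(i)|) rather than log(f_i - E(i)).
    """
    f_l, f_h = mapped[0], mapped[-1]
    steps = len(mapped) - 1                  # h - l
    cost = 0.0
    for k in range(steps):                   # C(p_l) + ... + C(p_{h-1})
        expected = f_l + k * (f_h - f_l) / steps
        cost += math.log2(1 + abs(mapped[k] - expected))
    return cost

print(data_cost([0, 1, 2, 3, 4]))  # 0.0: every point sits where expected
print(data_cost([0, 3, 4]))        # 1.0: the middle point is off by 1
```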

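Computing Boundaries
Growth phase: for a bucket $[p_l, p_h)$ with $h-l-1$ sorted points $\{p_{l+1}, p_{l+2}, \ldots, p_{h-1}\}$, compute the spline for $[p_l, p_h)$, compute $[f_l, f_h)$ using the spline, and find a further split point $p_s$ whose mapped value $f_s$ has the maximum deviation from its expected value.
Prune phase: with local benefit $LB(p_l, p_h) = DC(p_l, p_h) - DC(p_l, p_s) - DC(p_s, p_h) - IMC$, where $IMC$ is the incremental model cost, and global benefit $GB(p_l, p_h) = LB(p_l, p_h) + GB(p_l, p_s) + GB(p_s, p_h)$, the split at $p_s$ is retained if $GB > 0$.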
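The growth and prune recurrences can be sketched together as one recursive function. This is a simplification, not the authors' procedure: the model inside each bucket is taken to be flat (uniform) rather than a linear spline, so each point is its own mapped value and its expected value interpolates from the bucket's lower to upper boundary; the per-point cost uses the assumed $\log_2(1 + |\cdot|)$ form. The names `dc` and `grow_and_prune` are hypothetical.

```python
import math

def dc(lo: float, hi: float, pts: list[float]) -> float:
    """Data cost of bucket [lo, hi) with sorted interior points pts (flat model)."""
    steps = len(pts) + 1                     # h - l
    return sum(math.log2(1 + abs(p - (lo + k * (hi - lo) / steps)))
               for k, p in enumerate(pts, start=1))

def grow_and_prune(lo, hi, pts, imc):
    """Return (GB, boundaries) for bucket [lo, hi); imc is the incremental model cost."""
    if len(pts) < 2:                         # too few points to split further
        return 0.0, []
    steps = len(pts) + 1
    devs = [abs(p - (lo + k * (hi - lo) / steps)) for k, p in enumerate(pts, start=1)]
    s = max(range(len(pts)), key=devs.__getitem__)   # growth: max deviation
    if devs[s] == 0:
        return 0.0, []
    ps, left, right = pts[s], pts[:s], pts[s + 1:]
    gb_left, b_left = grow_and_prune(lo, ps, left, imc)
    gb_right, b_right = grow_and_prune(ps, hi, right, imc)
    # Prune phase, evaluated bottom-up as the recursion unwinds.
    lb = dc(lo, hi, pts) - dc(lo, ps, left) - dc(ps, hi, right) - imc
    gb = lb + gb_left + gb_right
    if gb > 0:
        return gb, b_left + [ps] + b_right   # split at ps is retained
    return 0.0, []                           # collapse this subtree into one bucket

clustered = [1.0, 1.1, 1.2, 1.3, 1.4, 8.6, 8.7, 8.8, 8.9, 9.0]
print(grow_and_prune(0.0, 10.0, clustered, imc=0.1)[1])     # a non-empty list of split points
print(grow_and_prune(0.0, 10.0, [float(i) for i in range(1, 10)], imc=0.1))  # (0.0, [])
```

Raising `imc` makes every split harder to justify; with a large enough value the whole tree collapses back into a single bucket, which is the MDL trade-off between model cost and data cost.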

Scaling
The number of values in a bucket may be disproportionate to the size of the bucket. [Figure: source and uniform (flattened) spaces with buckets b-1, b, b+1.]

Updates
The scale factor ensures that each distinct plaintext value maps to a distinct ciphertext value. Encrypted values need not be recomputed unless the distribution of plaintext values changes.

Quality of Encryption: Kolmogorov-Smirnov (KS) Statistical Test
Can we disprove, to a certain required level of significance, the null hypothesis that two data sets are drawn from the same distribution function? If not, the ciphertext distribution cannot be distinguished from the specified target distribution.
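A small two-sample KS sketch makes the test concrete. The statistic is the largest gap between the two empirical CDFs, and the rejection threshold uses the standard large-sample critical value $c(\alpha)\sqrt{(n+m)/(nm)}$ with $c(\alpha) = \sqrt{-\ln(\alpha/2)/2}$. The function names and the sample data are hypothetical; in practice a library routine such as SciPy's `scipy.stats.ks_2samp` returns the statistic together with a p-value.

```python
import math
from bisect import bisect_right

def ks_statistic(a, b) -> float:
    """Two-sample KS statistic D = max_x |F_a(x) - F_b(x)| over empirical CDFs."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for x in a + b:                          # the supremum is attained at a sample point
        fa = bisect_right(a, x) / len(a)
        fb = bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d

def same_distribution_rejected(a, b, alpha: float = 0.05) -> bool:
    """Reject H0 ("same distribution") if D exceeds the large-sample critical value."""
    n, m = len(a), len(b)
    c_alpha = math.sqrt(-math.log(alpha / 2) / 2)   # about 1.358 for alpha = 0.05
    return ks_statistic(a, b) > c_alpha * math.sqrt((n + m) / (n * m))

ciphertexts = [float(i) for i in range(100)]
target_sample = [i + 0.5 for i in range(100)]
print(ks_statistic(ciphertexts, target_sample))                # about 0.01
print(same_distribution_rejected(ciphertexts, target_sample))  # False
```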

Duplicates
A large number of duplicates may leak information about the distribution of values. Alternatively, map duplicates to distinct values: if $f = M(p)$ and $f' = M(p+1)$, each occurrence of $p$ is mapped to a value in $[f, f')$. Equality is then expressed as a range, and equi-joins can no longer be expressed. However, many numeric attributes (e.g., salary) may rarely be used in joins.
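The duplicate-spreading idea, sketched under stated assumptions: plaintexts are non-negative integers (so $p + 1$ is the next value), `M` is a hypothetical strictly increasing mapping standing in for the OPES mapping, and each occurrence of $p$ receives an independent random value in $[M(p), M(p+1))$. An equality predicate `col = p` then becomes the range `M(p) <= col < M(p + 1)`.

```python
import random

def M(p: int) -> float:
    # Hypothetical strictly increasing mapping for integers p >= 0,
    # standing in for the OPES mapping.
    return 10.0 * p + 0.5 * p * p

def encrypt_occurrence(p: int, rng: random.Random) -> float:
    """Map one occurrence of p to a random value in [M(p), M(p + 1))."""
    lo, hi = M(p), M(p + 1)
    c = lo + rng.random() * (hi - lo)        # rng.random() is in [0.0, 1.0)
    return c if c < hi else lo               # guard against rounding up to hi

def equality_as_range(p: int) -> tuple[float, float]:
    """Rewrite the predicate col = p as M(p) <= col < M(p + 1)."""
    return M(p), M(p + 1)

rng = random.Random(42)
stored = [encrypt_occurrence(5, rng) for _ in range(3)] + [encrypt_occurrence(6, rng)]
lo, hi = equality_as_range(5)
print(sum(lo <= c < hi for c in stored))     # 3: exactly the occurrences of 5
```

Order is still preserved across different plaintexts because the ranges $[M(p), M(p+1))$ are disjoint and increasing, but two equal plaintexts no longer produce equal ciphertexts, which is why an equi-join on this column stops working.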

Experimental Evaluation
Percentile exposure; updatability; key size; time overhead.

Datasets
Census: PUMS census data from the UCI KDD archive (30,000 records). Gaussian, Zipf, and uniform datasets. Default: source Gaussian, target Zipf.

Percentile Exposure
Average change in percentile (rows: target distribution; columns: source distribution):
Target     Census   Gaussian
Gaussian   37       45
Zipf       7        17
Uniform    38       44

Time to Build the Model

Insertion Overhead

Cost of Additional Insertion

Retrieval Overhead

Retrieval Time

Related Work
Polynomial functions: ignore the distribution of plaintext/ciphertext values.
Database as a service: requires post-processing of query results.
Privacy homomorphisms: comparison operations not investigated.
Keyword searches on encrypted data: designed for keyword retrieval; range queries not supported.
Smartcard-based schemes: infeasible for large ranges.
Order-preserving hashing: protecting the hash values from cryptanalysis is not a concern, nor is deciphering plaintext values from hash values; designed for static collections.

Closing Remarks
Ensuring safety without impeding the flow of information is a hard problem. Current choices are a plaintext database, or an encrypted database with a loss of functionality or performance. Our approach focused on the trade-off between security and efficiency: we developed an algorithm that could easily be integrated with current systems.
Protecting data without impeding the flow of information is an extremely hard problem. Today there is either no encryption or, if you encrypt, your performance goes to hell. This is a first stab at the problem, focusing on OPES to balance the trade-off: increasing security without affecting efficiency. In this first stab, we wanted something that is easily integrated with existing systems. The challenge is to have a complete set of techniques for encrypting a database while still preserving the efficiency of operations.

Backup

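Encode
$$\mathrm{Encode}(p) = z\,(s p^2 + p), \qquad p \in [0, p_h),\ s = \frac{q}{2r},\ z > 0$$
where the distribution has density function $qp + r$, $p$ is the source (target) value, $s$ is the quadratic coefficient, and $z$ is the scale factor. Integrating the density gives $\int_0^p (qt + r)\,dt = \frac{q}{2}p^2 + rp = r\,(s p^2 + p)$, so Encode is the cumulative distribution within the bucket, rescaled by the scale factor.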

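Decode
$$\mathrm{Decode}(f) = \frac{-z + \sqrt{z^2 + 4zsf}}{2zs}, \qquad f \in [0, f_h),\ s = \frac{q}{2r},\ z > 0$$
where $f$ is the flattened value, $s$ is the quadratic coefficient, and $z$ is the scale factor. This is the root of $zsp^2 + zp - f = 0$ that inverts Encode on the bucket.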
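A round-trip check of the bucket-level formulas $\mathrm{Encode}(p) = z(sp^2 + p)$ and $\mathrm{Decode}(f) = (-z + \sqrt{z^2 + 4zsf})/(2zs)$. One design choice differs from the slide's form: multiplying numerator and denominator by $z + \sqrt{z^2 + 4zsf}$ gives the algebraically identical root $2f / (z + \sqrt{z^2 + 4zsf})$, which avoids dividing by $s$ (so $s = 0$ works) and avoids cancellation for small $s$. The sketch assumes $1 + 2sp > 0$ across the bucket, i.e. the mapping is increasing; the parameter values are hypothetical.

```python
import math

def encode(p: float, s: float, z: float) -> float:
    """Encode(p) = z * (s*p^2 + p) within one bucket."""
    return z * (s * p * p + p)

def decode(f: float, s: float, z: float) -> float:
    """Decode(f), written as 2f / (z + sqrt(z^2 + 4zsf)).

    Same root as (-z + sqrt(z^2 + 4zsf)) / (2zs), rearranged so that s = 0
    needs no special case and small s does not lose precision.
    """
    return 2 * f / (z + math.sqrt(z * z + 4 * z * s * f))

for s in (0.05, 0.0, -0.01):
    print(s, [round(decode(encode(p, s, 3.0), s, 3.0), 9) for p in range(5)])
    # each list is [0.0, 1.0, 2.0, 3.0, 4.0]
```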

Order Preserving Encryption
A simple scheme: compute the distinct values of an attribute (e.g., Salary in a table with columns No, Name, Position, Salary, Location, …) in ascending order, and use the index of each value as its ciphertext: ciphertext 1 for plaintext 28000, 2 for 35000, …, Cn for Pn. This effectively hides the distribution of plaintext values, but the key size is proportional to the number of distinct attribute values, and any update requires recomputing the key and the ciphertext values.
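The index-based scheme is short enough to sketch directly; the names `build_key`, `encrypt`, and `decrypt` are hypothetical. The key is the sorted list of distinct values, the ciphertext is a 1-based position in it (matching 1 ↔ 28000, 2 ↔ 35000), and inserting one new salary shifts later positions, which shows why updates force re-encryption.

```python
from bisect import bisect_left

def build_key(values):
    """Key = distinct attribute values in ascending order (grows with #distinct)."""
    return sorted(set(values))

def encrypt(p, key):
    i = bisect_left(key, p)
    if i == len(key) or key[i] != p:
        raise KeyError(f"{p} is not in the key; the key must be rebuilt")
    return i + 1                             # ciphertext is the 1-based index

def decrypt(c, key):
    return key[c - 1]

key = build_key([35000, 28000, 35000, 52000])
print([encrypt(v, key) for v in (28000, 35000, 52000)])  # [1, 2, 3]

new_key = build_key([35000, 28000, 35000, 52000, 30000])
print(encrypt(35000, new_key))  # 3: the old ciphertext 2 is now stale
```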

Target Distribution Requirement
Why isn’t the source-to-uniform transformation sufficient for order-preserving encryption? It is, but the target distribution may cause an adversary to make incorrect assumptions about the source distribution, and the organization of the source distribution cannot be inferred from the target.

Quadratic Coefficient
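[Figure: sorted values $v$ within a bucket, with positions $b_1, b_2$ and two sub-ranges $[i_1, j_1]$ and $[i_2, j_2]$.]
$$q = \frac{\dfrac{j_2 - i_2}{v_{j_2} - v_{i_2}} - \dfrac{j_1 - i_1}{v_{j_1} - v_{i_1}}}{v_{b_1} - v_{b_2}}, \qquad s = \frac{q}{2\,\dfrac{j_1 - i_1}{v_{j_1} - v_{i_1}}}$$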

Scale Factor Constraints
For all $p \in [0, w)$: $M(p+1) - M(p) \ge 2$. This ensures that there is a distinct mapped value for each input value.
$w_f = Kn$: the width of a bucket in the mapped space is a function of the number of elements $n$ in the bucket, where $K$ is the minimum width needed across buckets.

Scale Factor
$$z = \frac{Kn}{s w^2 + w}, \qquad K = \max_{i = 1, \ldots, m} \left[ x_i \,(s_i w_i^2 + w_i) \right], \qquad x = \begin{cases} 2, & s \ge 0 \\ \dfrac{2}{1 + s(2w - 1)}, & s < 0 \end{cases}$$
The scale factor stretches short buckets to the width of the largest bucket, further increasing the dimension of a bucket by a factor of the number of elements in the bucket.
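A numeric check of the scale-factor rules $z = Kn/(sw^2 + w)$, $K = \max_i [x_i(s_i w_i^2 + w_i)]$, with $x = 2$ for $s \ge 0$ and $x = 2/(1 + s(2w - 1))$ for $s < 0$. It verifies the constraint $M(p+1) - M(p) \ge 2$ for integer $p \in [0, w)$ using $M(p) = z(sp^2 + p)$, whose step is $z(1 + s(2p + 1))$, and that each bucket's mapped width is $Kn$. The bucket parameters and function names are hypothetical, and the sketch assumes $1 + s(2w - 1) > 0$ so that each mapping is increasing.

```python
def x_factor(s: float, w: int) -> float:
    """Per-element factor x: 2 if s >= 0, else 2 / (1 + s(2w - 1))."""
    if s >= 0:
        return 2.0  # smallest step is at p = 0: z(1 + s) >= z
    # For s < 0 the smallest step is the last one (p = w - 1): z(1 + s(2w - 1)).
    return 2.0 / (1 + s * (2 * w - 1))

def scale_factors(buckets):
    """buckets: (s, w, n) per bucket. Returns K and the scale factor z of each bucket."""
    K = max(x_factor(s, w) * (s * w * w + w) for s, w, _ in buckets)
    return K, [K * n / (s * w * w + w) for s, w, n in buckets]

buckets = [(-0.01, 10, 3), (0.05, 20, 5), (0.0, 8, 1)]  # hypothetical (s, w, n)
K, zs = scale_factors(buckets)
for (s, w, n), z in zip(buckets, zs):
    min_step = min(z * (1 + s * (2 * p + 1)) for p in range(w))  # M(p+1) - M(p)
    width = z * (s * w * w + w)                                   # M(w)
    print(f"min step {min_step:.2f}; mapped width {width:.1f} vs K*n {K * n:.1f}")
```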

Slope
The values within a single bucket are unevenly distributed within the bucket. [Figure: buckets b-1 and b.]
