Monomi: Practical Analytical Query Processing over Encrypted Data

Presentation transcript:

Monomi: Practical Analytical Query Processing over Encrypted Data. Stephen Tu, M. Frans Kaashoek, Samuel Madden, Nickolai Zeldovich (MIT CSAIL)

Typical deployment. A trusted user sends a query (“Give me the # of views of all adults by country”) to a vulnerable database and gets back a response (US: 1M, Italy: 3K, …). Problem: Want to run queries over data!

Approach 1: Fully Homomorphic Encryption (FHE). Groundbreaking theoretical result [Gentry 09]: run any computation over encrypted data. Prohibitive overheads in practice.

Approach 2: Specialized Schemes. Cryptosystems supporting specific operations: equality (deterministic) [AES]; addition [Paillier 99]; inequality (order preserving) [Boldyreva 09]; keyword search [Song 00]. These operations are common in SQL queries…
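To make the "addition" entry above concrete, here is a minimal sketch of the Paillier scheme's additive property: multiplying two ciphertexts yields a ciphertext of the sum of the plaintexts. The tiny primes, the function names, and the sample view counts are assumptions for illustration only; this is not the talk's implementation and is not secure.

```python
import math
import random

def keygen(p=1009, q=1013):
    """Toy Paillier key generation with tiny primes (insecure, illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse of lam mod n (Python 3.8+)
    return n, (lam, mu)

def encrypt(n, m):
    """Encrypt m (0 <= m < n) using generator g = n + 1 and fresh randomness r."""
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n, priv, c):
    """Recover m from ciphertext c with the private key (lam, mu)."""
    lam, mu = priv
    n2 = n * n
    l_value = (pow(c, lam, n2) - 1) // n
    return (l_value * mu) % n

def add_encrypted(n, c1, c2):
    """Homomorphic addition: the product of ciphertexts encrypts the sum."""
    return (c1 * c2) % (n * n)

# Hypothetical per-row view counts; the party doing the summing never sees them.
n, priv = keygen()
views = [3, 1000, 4500]
ciphertexts = [encrypt(n, v) for v in views]

enc_total = ciphertexts[0]
for c in ciphertexts[1:]:
    enc_total = add_encrypted(n, enc_total, c)

total = decrypt(n, priv, enc_total)
print(total)
```

Only the key holder can call `decrypt`; the summation loop needs just the public modulus `n`, which is why a SUM can be delegated to an untrusted server.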

Practical state of the art: CryptDB. The application sends a plain query to a trusted proxy, which stores the encryption keys. The proxy sends a transformed query to the DB server (under attack), which holds the encrypted DB. The server returns encrypted results, and the proxy returns decrypted results to the application. Deterministic encryption supports equality, the Paillier cryptosystem supports addition, and order-preserving encryption supports inequality.
Original query: SELECT country, SUM(views) FROM users WHERE age > 18 GROUP BY country
Transformed query: SELECT country_DET, PAILLIER_SUM(views_HOM) FROM users_ENCRYPTED WHERE age_OPE > 0xDEADBEEF GROUP BY country_DET
Here 0xDEADBEEF = Encrypt_OPE(18). No client computation: CryptDB requires that all computation in a query be supported by a specialized cryptosystem.

Problem: OLTP ≠ OLAP. CryptDB is designed for OLTP queries; we are interested in OLAP queries, which typically involve more computation. CryptDB can only support 4/22 TPC-H queries.

Problem: OLTP ≠ OLAP. What happens when we run this query with CryptDB?
SELECT category, SUM(cost * quantity) AS value FROM product WHERE made_in = 'United States' GROUP BY category HAVING SUM(cost * quantity) > 1000000 ORDER BY value
There is no efficient additive + multiplicative homomorphic cryptosystem (needed for SUM(cost * quantity)), and no efficient additive + order-preserving homomorphic cryptosystem (needed to compare and sort the sums in HAVING and ORDER BY). Our insight: most of the query can be executed on the server, except for a few parts.

Contributions. Monomi: a new system for practical analytical query processing. Three techniques: (1) split client/server query execution; (2) pre-computation + other runtime optimizations; (3) query planner/designer. Using these three techniques, Monomi can run TPC-H with 1.24x median overhead (vs. plaintext).

Split client/server execution. The goal is to find a query that the untrusted server can run over the specific cryptosystems available, and to run the rest on the trusted client.
Original query: SELECT category, SUM(cost * quantity) AS value FROM product WHERE made_in = 'United States' GROUP BY category HAVING SUM(cost * quantity) > 1000000 ORDER BY value
Untrusted server: SELECT category_DET, cost_DET, quantity_DET FROM product_ENC WHERE made_in_DET = Encrypt_DET('United States')
The server returns a table of ciphertexts from product_ENC with columns category_DET, cost_DET, quantity_DET (sample values: 0xdd032543, 0x34778428, 0xaeb7e344, 0x7658Ae7e, 0xeba13477, …).
Trusted client: decrypts the rows, then computes SELECT category, SUM(cost * quantity) AS value … GROUP BY category HAVING SUM(cost * quantity) > 1000000 ORDER BY value

Pre-computation. The DB server stores the pre-computed expression (cost * quantity) under homomorphic encryption, as an extra column cost_qty_HOM of product_ENC (columns: category_DET, cost_DET, quantity_DET, cost_qty_HOM; sample values: 0xdd032543, 0x34778428, 0xaeb7e344, 0x24bbae88, 0x7658Ae7e, 0xeba13477, 0x8927deaf, …).
Without pre-computation, untrusted server: SELECT category_DET, cost_DET, quantity_DET FROM product_ENC WHERE made_in_DET = Encrypt_DET('United States'); the trusted client does GROUP BY category HAVING SUM(cost * quantity) > 1000000 ORDER BY value.
With pre-computation, untrusted server: SELECT category_DET, PAL_SUM(cost_qty_HOM) FROM product_ENC WHERE made_in_DET = Encrypt_DET('United States') GROUP BY category_DET; the trusted client only does HAVING SUM(cost * quantity) > 1000000 ORDER BY value.

Split execution in action. Each plan is listed from the final operator down to the remote query; execution starts at RemoteSQL on the untrusted server, and all Client operators run on the trusted client.
Split A: ClientDecrypt columns: [0]; ClientSort key: [1]; ClientGroupFilter expr: $1 > 1000000; ClientGroupBy key: [0]; ClientProjection exprs: [$0, $1*$2]; ClientDecrypt columns: [1,2]; RemoteSQL: SELECT category_DET, cost_DET, quantity_DET FROM product_ENC WHERE made_in_DET = 0xDEADBEEF
Split B: ClientDecrypt columns: [0]; ClientSort key: [1]; ClientGroupFilter expr: $1 > 1000000; ClientDecrypt columns: [1]; RemoteSQL: SELECT category_DET, PAL_SUM(cost_qty_HOM) FROM product_ENC WHERE made_in_DET = 0xDEADBEEF GROUP BY category_DET
Split B pushes the projection and the GROUP BY to the server.
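The Split A pipeline can be sketched in a few lines. In this illustrative sketch the XOR-based `det_encrypt` is an insecure stand-in for deterministic AES encryption, and the key, the sample rows, and all function names are hypothetical. It shows the server filtering purely on ciphertext equality while the client runs the decrypt, project, group, filter, and sort operators.

```python
from collections import defaultdict
from itertools import cycle

KEY = b"demo-key"  # hypothetical key; XOR is an insecure stand-in for DET (AES)

def det_encrypt(value):
    """Deterministic toy encryption: equal plaintexts give equal ciphertexts."""
    data = str(value).encode()
    return bytes(b ^ k for b, k in zip(data, cycle(KEY))).hex()

def det_decrypt(token):
    data = bytes.fromhex(token)
    return bytes(b ^ k for b, k in zip(data, cycle(KEY))).decode()

# Hypothetical plaintext rows: (category, cost, quantity, made_in).
PRODUCTS = [
    ("tools", 400, 2000, "United States"),
    ("tools", 300, 1000, "United States"),
    ("toys", 5, 100, "United States"),
    ("toys", 900, 5000, "Italy"),
    ("games", 60, 20000, "United States"),
]

# Encrypted table as it would sit on the untrusted server.
PRODUCT_ENC = [tuple(det_encrypt(v) for v in row) for row in PRODUCTS]

def remote_sql(made_in_token):
    """Server side: filter by comparing ciphertexts; no keys are needed."""
    return [row[:3] for row in PRODUCT_ENC if row[3] == made_in_token]

def client_split_a(rows, threshold=1_000_000):
    # ClientDecrypt columns [1, 2]
    rows = [(cat, int(det_decrypt(cost)), int(det_decrypt(qty)))
            for cat, cost, qty in rows]
    # ClientProjection [$0, $1*$2] and ClientGroupBy key [0];
    # grouping works on the still-encrypted category because DET is deterministic.
    groups = defaultdict(int)
    for cat, cost, qty in rows:
        groups[cat] += cost * qty
    # ClientGroupFilter $1 > threshold
    kept = [(cat, total) for cat, total in groups.items() if total > threshold]
    # ClientSort key [1]
    kept.sort(key=lambda row: row[1])
    # ClientDecrypt columns [0]
    return [(det_decrypt(cat), total) for cat, total in kept]

result = client_split_a(remote_sql(det_encrypt("United States")))
print(result)
```

Decrypting the category column last means only the groups that survive the filter are decrypted, which mirrors the operator order on the slide.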

Challenge: Splitting queries. Strawman: greedy split, i.e. always run computation on the server if possible. Problem: can fail to produce the optimal plan.

Why greedy split can fail. Crypto ops have very different runtimes: Paillier addition: 0.005 ms; deterministic (AES) decrypt: 0.01 ms (2x add); Paillier decrypt: 0.5 ms (100x add, 50x AES decrypt).

Why greedy split can fail. Example: SELECT SUM(salary) FROM employees GROUP BY dept. Two possible plans: (A) the server uses Paillier to SUM for each dept; (B) the server does the GROUP BY and returns deterministic ciphertexts for salaries, and the client decrypts and sums. The optimal plan depends on the data: A is better for large groups, B is better for small groups, because large groups amortize the cost of Paillier decryption.
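A back-of-the-envelope cost model shows where the two plans cross over. This sketch uses only the three per-operation runtimes quoted in the talk and makes simplifying assumptions that are not from the source: it ignores network transfer, disk I/O, and plaintext arithmetic, and it weights server and client time equally. The function names are hypothetical.

```python
# Per-operation runtimes quoted in the talk, in milliseconds.
PAILLIER_ADD_MS = 0.005
AES_DECRYPT_MS = 0.01
PAILLIER_DECRYPT_MS = 0.5

def cost_plan_a(num_rows, num_groups):
    """Plan A: server adds every row with Paillier; client decrypts one sum per group."""
    return num_rows * PAILLIER_ADD_MS + num_groups * PAILLIER_DECRYPT_MS

def cost_plan_b(num_rows, num_groups):
    """Plan B: client AES-decrypts every salary and sums in plaintext."""
    return num_rows * AES_DECRYPT_MS

def best_plan(num_rows, num_groups):
    a = cost_plan_a(num_rows, num_groups)
    b = cost_plan_b(num_rows, num_groups)
    return "A" if a < b else "B"

# Few large groups: Paillier decryption is amortized over many rows.
print(best_plan(1_000_000, 10))
# Many small groups: one expensive Paillier decrypt per two rows.
print(best_plan(1_000_000, 500_000))
```

Under these assumptions the break-even point is about 100 rows per group (0.005r + 0.5g = 0.01r gives r/g = 100), which matches the slide's claim that large groups favor plan A.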

Challenge: Splitting queries. Solution: a cost-based optimizer (planner) for computing the optimal split. Side benefit: can propose what-if scenarios to evaluate the gains from allowing a cryptosystem (performance vs. security trade-off). Example from the slide: the planner compares Split 1 (cost 803.1), Split 2 (cost 400.2), and Split 3 (cost 1791.8).

Challenge: Physical design. Physical design means: which cryptosystems to materialize, and which pre-computed expressions? Strawman: materialize everything. This is space inefficient and hurts performance in row-stores, and there is an infinite number of expressions to pre-compute. Solution: workload trace + cost model + integer linear program (ILP).
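The design problem can be illustrated as a small constrained optimization: choose which encrypted columns to materialize so that the total workload cost is minimal within a space budget. This sketch swaps the ILP for brute-force enumeration, which is feasible only for tiny inputs. The candidate columns, sizes, plan costs, and function names are all hypothetical and not taken from Monomi.

```python
from itertools import combinations

# Hypothetical candidate encrypted columns and their sizes (arbitrary units).
CANDIDATES = {"salary_PAL": 32, "age_OPE": 8, "cost_qty_PAL": 32}

# Hypothetical workload: each query lists alternative plans as
# (columns the plan requires, estimated cost). The empty set is a fallback
# plan that needs no extra column and does more work on the client.
WORKLOAD = {
    "Q1": [(set(), 100.0), ({"salary_PAL"}, 40.0)],
    "Q2": [(set(), 80.0), ({"age_OPE"}, 20.0)],
    "Q3": [(set(), 300.0), ({"cost_qty_PAL"}, 50.0)],
}

def workload_cost(chosen):
    """Sum, over all queries, of the cheapest plan whose columns are available."""
    return sum(
        min(cost for required, cost in plans if required <= chosen)
        for plans in WORKLOAD.values()
    )

def design(budget):
    """Return (cost, columns) for the cheapest design that fits the space budget."""
    best = (workload_cost(frozenset()), frozenset())
    names = sorted(CANDIDATES)
    for k in range(1, len(names) + 1):
        for subset in combinations(names, k):
            if sum(CANDIDATES[name] for name in subset) > budget:
                continue  # exceeds the space budget
            cost = workload_cost(frozenset(subset))
            if cost < best[0]:
                best = (cost, frozenset(subset))
    return best

print(design(40))
```

With a budget of 40 units the two Paillier columns cannot both fit, so the search keeps the one that saves the most and pairs it with the cheap OPE column. An ILP solver expresses the same trade-off without enumerating every subset.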

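Putting it all together. Setup: the Monomi designer takes the query workload (Q1, Q2, Q3), database statistics, and a space budget, and chooses which encryptions (DET, OPE, PAL) to store for each column (name, age, salary). Querying: the Monomi planner and the Monomi runtime execute queries against the encrypted data in the database.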

How well does this work?

Evaluation. Questions: How many TPC-H queries can Monomi run? What is the overhead compared to plaintext? What optimizations matter? Setup: TPC-H scale 10; Postgres 8.4 on Linux 2.6; 8GB RAM, 16 cores, six 7200 RPM HDDs.

Most TPC-H queries supported. Monomi’s approach handles all TPC-H queries. Our prototype handles 19/22 due to missing SQL features (e.g. views). First system we know of that can do this! CryptDB only supports 4/22.

Overhead vs. plaintext. Takeaway: min overhead 1.03x, median overhead 1.24x, max overhead 2.33x.

Many techniques are important (discussed in the talk: greedy split and pre-computation). See the paper for details on other optimizations.

Related work. Trusted hardware (Cipherbase, TrustedDB): requires changing hardware (e.g. FPGAs); different set of assumptions. Untrusted server (CryptDB, [Hacıgümüs et al]): Monomi is the first to show OLAP with low overhead, with a general-purpose query planner + designer.

Summary. Monomi: analytics on encrypted data can be made practical! Techniques: split client/server execution; pre-computation + other optimizations; planner/designer.

Thanks, questions?