Secure and Privacy-Preserving Database Services in the Cloud Divy Agrawal, Amr El Abbadi, Shiyuan Wang University of California, Santa Barbara {agrawal,


Secure and Privacy-Preserving Database Services in the Cloud Divy Agrawal, Amr El Abbadi, Shiyuan Wang University of California, Santa Barbara {agrawal, amr, ICDE2013 Tutorial

Cloud Computing
- Successful paradigm for computing and storage
- Features
  - Pay per use
  - No up-front cost for deployment
  - Scalability
  - Elasticity
- Software as a Service (SaaS)
- Platform as a Service (PaaS)
- Infrastructure as a Service (IaaS)
ICDE 2013 Tutorial, 4/11/2013

Adopting the Cloud
- Early adopters are mainly low-risk apps with less sensitive data
- Examples shown: collaboration, administrative apps, conferencing software, education
- Figure label: Sensitive Data

Cloud: A Tempting Attack Target
- Why the cloud?
  - Ubiquitous access to consolidated data
  - Shared infrastructure, economies of scale
  - A lot of small and medium businesses
- Why attack?
  - Target one service provider, attack multiple companies
  - Financial gain from trading sensitive information

Cloud Provides Novel Attack Opportunities
- Co-residence attack [Ristenpart et al. CCS09]
  - Adversary: non-provider-affiliated malicious parties
  - Map and identify the location of the target VM
  - Place an attacker VM co-resident with the target VM
  - Cross-VM side-channel attacks (due to sharing of physical resources), e.g. estimating the number of visitors to a page, or keystroke attacks for password retrieval
- Signature wrapping attack [Somorovsky et al. CCSW11]
  - Compromise the control interface by capturing a SOAP message
  - Manipulate the SOAP message with arbitrary XML fragments
  - Use an XML signature vulnerability to pass authentication
  - Take control of a victim's account

Amazon's Best Practices for Cloud Security and Privacy
- Concerns
  - Co-residence attacks
  - Side-channel attacks
  - Network-based attacks
  - Unauthorized accesses
  - Insider attacks
  - Privacy violation
  - Future vulnerabilities?
- Defenses [AWS security]
  - Dedicated instances, virtual private cloud, isolated network and traffic
  - Firewall and access control
  - Identity and access management, multi-factor authentication
  - Accesses checked and audited
  - Rely on clients for access control
  - Recommend using data encryption and an encrypted file system
- Best-effort defense is not sufficient

A Barrier to Conquer
- Security and privacy: a barrier to cloud adoption
- Data (sensitive data): a key concern
- We need to solve data security and privacy problems in the cloud

Outline
- Database Security and Privacy: General Practice in the DB Community
- Data Security and Privacy in the Cloud
- Data Confidentiality
- Access Privacy
- Open Research Challenges

Access Control [Bertino et al. TDSC05]
- Problem statement: authorizing data access scopes (relations, attributes, tuples) to users of a DBMS
- Discretionary access control
  - Authorization administration policies, i.e. granting and revoking authorization (centralized, ownership, etc.)
  - Content-based, using views and query rewriting for fine-grained access control
  - Role-based access control: a role is a function with a set of actions, and has users as members
- Mandatory access control
  - Object and subject classification (e.g. top secret, secret, unclassified)

Data Anonymization
- Problem: protecting Personally Identifiable Information (PII) and the associated sensitive attributes

  Quasi-identifier                  | Sensitive
  DOB       Gender   Zipcode        | Disease
  1/21/76   Male     53715          | Heart Disease
  4/13/86   Female   53715          | Hepatitis
  2/28/76   Male     53703          | Bronchitis
  1/21/76   Male     53703          | Broken Arm
  4/13/86   Female   53706          | Flu
  2/28/76   Female   53706          | Hang Nail

- Quasi-identifiers are sets of attributes that can be linked with external data to uniquely identify an individual
- Quasi-identifiers need to be generalized or suppressed

Solution: k-Anonymity [Samarati et al. TR98]
- Quasi-identifiers are indistinguishable among k individuals
- Tuples in an equivalence class share the same quasi-identifier (QI) values
- Implemented by building a generalization hierarchy or by partitioning the multi-dimensional data space
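The k-anonymity condition can be checked mechanically: group the records by their quasi-identifier values and require every group to have at least k members. A minimal sketch, assuming records are Python dicts; the generalized values and the function name `is_k_anonymous` are illustrative and not taken from the tutorial.

```python
from collections import Counter

def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs in at least k rows."""
    groups = Counter(tuple(row[a] for a in quasi_identifiers) for row in rows)
    return all(count >= k for count in groups.values())

# Hypothetical, already generalized records (values invented for illustration).
records = [
    {"dob": "1976", "gender": "Male", "zipcode": "537**", "disease": "Heart Disease"},
    {"dob": "1976", "gender": "Male", "zipcode": "537**", "disease": "Broken Arm"},
    {"dob": "1986", "gender": "Female", "zipcode": "537**", "disease": "Hepatitis"},
    {"dob": "1986", "gender": "Female", "zipcode": "537**", "disease": "Flu"},
]
QI = ["dob", "gender", "zipcode"]

print(is_k_anonymous(records, QI, 2))  # each equivalence class has 2 rows
print(is_k_anonymous(records, QI, 3))  # no class reaches 3 rows
```

Each equivalence class here has exactly two rows, so the table passes for k = 2 and fails for k = 3.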

Enhanced Solution: l-Diversity [Machanavajjhala et al. ICDE06]
- At least l values for the sensitive attribute in each equivalence class

  Zipcode   Age   Salary   Disease
  476**     2*    20K      Gastric Ulcer
  476**     2*    25K      Gastritis
  476**     2*    30K      Stomach Cancer
  4790*     ≥40   50K      Gastritis
  4790*     ≥40   100K     Flu
  4790*     ≥40   70K      Bronchitis
  476**     3*    60K      Bronchitis
  476**     3*    80K      Pneumonia
  476**     3*    90K      Stomach Cancer

  A 3-diverse patient table
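The simplest reading of l-diversity (at least l distinct sensitive values per equivalence class) can be tested on the patient table from the slide. A minimal sketch under that reading; the function name `is_l_diverse` is illustrative, and the stronger entropy-based or recursive variants of l-diversity are not covered.

```python
from collections import defaultdict

def is_l_diverse(rows, quasi_identifiers, sensitive, l):
    """True if every equivalence class holds at least l distinct sensitive values."""
    classes = defaultdict(set)
    for row in rows:
        key = tuple(row[a] for a in quasi_identifiers)
        classes[key].add(row[sensitive])
    return all(len(values) >= l for values in classes.values())

# The patient table from the slide: (zipcode, age, disease); salary is omitted.
table = [
    ("476**", "2*", "Gastric Ulcer"), ("476**", "2*", "Gastritis"),
    ("476**", "2*", "Stomach Cancer"), ("4790*", ">=40", "Gastritis"),
    ("4790*", ">=40", "Flu"), ("4790*", ">=40", "Bronchitis"),
    ("476**", "3*", "Bronchitis"), ("476**", "3*", "Pneumonia"),
    ("476**", "3*", "Stomach Cancer"),
]
patients = [{"zipcode": z, "age": a, "disease": d} for z, a, d in table]

print(is_l_diverse(patients, ["zipcode", "age"], "disease", 3))
print(is_l_diverse(patients, ["zipcode", "age"], "disease", 4))
```

Every class holds three distinct diseases, so the check passes for l = 3 and fails for l = 4.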

Enhanced Solution: t-Closeness [Li et al. ICDE07]
- The distance between the distribution of sensitive attribute values in any equivalence class and their overall distribution is bounded by t

Privacy-Preserving Data Mining
- Problems: hide sensitive rules or private individual data in data mining [Verykios et al. SIGMOD04]
  1. Sanitize sensitive item sets or sensitive rules
  2. Build a data mining model without access to precise data, e.g. privacy-preserving classification, clustering
  3. Private parties compute together on their private inputs, e.g. distributed association rule mining, collaborative filtering
- Solutions
  1. Data perturbation, blocking, rule confusion
  2. Data perturbation, distribution reconstruction [Agrawal et al. SIGMOD00, PODS01]
  3. Secure Multi-party Computation (SMC) [Clifton et al. KDD02]

Differential Privacy for Statistical Data [Dwork ICALP06]

Differential Privacy for Statistical Data [Dwork ICALP06]
- A randomized function $K$ gives $\varepsilon$-differential privacy if and only if, for all datasets $D_1$ and $D_2$ differing on at most one element, and all $S \subseteq \mathrm{Range}(K)$:
  $\Pr[K(D_1) \in S] \le e^{\varepsilon} \cdot \Pr[K(D_2) \in S]$
- Strong privacy guarantees while querying a database
- Figure: query A is answered through a perturbation P(A) on each of the two datasets; the two outputs are indistinguishable
- Thanks to Ben Zhao for this slide
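One common way to obtain such a randomized function for counting queries is the Laplace mechanism, which the slide does not show. The sketch below assumes a counting query with sensitivity 1 and adds Laplace noise of scale 1/ε; the data and the names `laplace_noise` and `noisy_count` are illustrative.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_count(values, predicate, epsilon, rng):
    """Counting query (sensitivity 1) perturbed with Laplace(1/epsilon) noise."""
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon, rng)

rng = random.Random(0)
ages = [23, 35, 41, 52, 29, 64, 47]  # hypothetical data
print(noisy_count(ages, lambda a: a >= 40, epsilon=0.5, rng=rng))
```

A smaller ε means a larger noise scale: stronger privacy, less accurate answers.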

Access Control & Privacy [Chaudhuri et al. CIDR11]
- Hybrid system combining authorization predicates and noisy views

Secure Devices for Privacy [Anciaux et al. SIGMOD07]
- Problem: protecting private data during queries involving both private (hidden) and public (visible) data
- Solution: carry private data in a secure USB key, ensure private data never leaves the USB key, and let only public data flow to the key
- Query optimization for a USB key with small RAM

Outline
- Database Security and Privacy
- Data Security and Privacy in the Cloud
- Data Confidentiality
- Access Privacy
- Open Research Challenges

A LOT OF PROBLEMS NEED TO BE TAKEN CARE OF
SOME PROBLEMS ARE OLD
SOME PROBLEMS ARE AMPLIFIED BY THE CLOUD

Problems Amplified by the Cloud
- Setting: the user sends queries to cloud servers holding the data and receives answers
- Data confidentiality
  - Attacks: unauthorized accesses, side-channel attacks
  - Solutions: encryption and querying encrypted data; trusted computing
- Access privacy
  - Attacks: inferences on access patterns or query results
  - Solutions: private information retrieval; query obfuscation

Data Services in the Cloud
- Service: DB queries, with functionality and performance
- Adversaries: curious but not malicious cloud / insiders; third-party attackers
- Actions: obtain / infer data and queries

Challenges: Conflicting Goals
- Figure: functionality / performance plotted against confidentiality / privacy (each from low to high), with regions labeled Existing Services, Many Crypto Systems/Protocols, and Ideal State

Outline
- Database Security and Privacy
- Data Security and Privacy in the Cloud
- Data Confidentiality
- Access Privacy
- Open Research Challenges

Data Confidentiality
1. Encryption
   - Homomorphic encryption
   - Partition index
   - Order-preserving encryption
   - Encrypted index
2. Leveraging trust
   - Distribution
   - Trusted computing

Database as a Service [Hacigümüs et al. ICDE02]
- Protects stored data from theft, but plaintext data can still be seen on the server
- Write: encrypt before storing
    insert into lineitem (discount) values (encrypt(10,key))
- Read: decrypt before access
    select decrypt(discount,key) from lineitem where custid = 300
- Encryption alternatives
  - Software-level vs. hardware-level (cryptographic coprocessor) encryption
  - Granularity: field, row, page

Keyword Search on Encrypted Texts [Song et al. S&P00]
- Search directly on encrypted data, without decryption on the server side
- Encrypt word by word. For word $W_i$:
  - Block ciphertext $X_i = E_k(W_i)$, word key $k_i = f_k(X_i)$, pseudorandom sequence $T_i = \langle S_i, F_{k_i}(S_i) \rangle$
  - Searchable ciphertext $C_i = X_i \oplus T_i$
- Search for a word $W$:
  - Block ciphertext $X = E_k(W)$, word key $k_i = f_k(X)$
  - Check the ciphertexts one by one to see whether $C_i \oplus X = (X_i \oplus T_i) \oplus X$ is of the form $\langle s, F_{k_i}(s) \rangle$ for some random value $s$
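The search test above can be sketched in a few lines. This is a simplified illustration, not the authors' implementation: HMAC-SHA256 stands in for both the deterministic cipher $E_k$ and the keyed functions $f$ and $F$ (so the sketch supports search but not decryption), and `os.urandom` stands in for the stream-cipher output $S_i$. All names are hypothetical.

```python
import hashlib
import hmac
import os

BLOCK = 16  # bytes per word ciphertext
S_LEN = 8   # bytes of the random part S_i

def prf(key, msg, n):
    """Keyed pseudorandom function, truncated to n bytes."""
    return hmac.new(key, msg, hashlib.sha256).digest()[:n]

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def pre_encrypt(k_enc, word):
    # Stand-in for X = E_k(W). An HMAC is not invertible, which is an
    # assumption of this sketch rather than a property of the scheme.
    return prf(k_enc, word.encode(), BLOCK)

def trapdoor(k_enc, k_word, word):
    """Client: compute (X, k_i) for the word to search."""
    x = pre_encrypt(k_enc, word)
    return x, prf(k_word, x, 32)

def encrypt_word(k_enc, k_word, word):
    x, k_i = trapdoor(k_enc, k_word, word)
    s = os.urandom(S_LEN)                 # S_i
    t = s + prf(k_i, s, BLOCK - S_LEN)    # T_i = <S_i, F_{k_i}(S_i)>
    return xor(x, t)                      # C_i = X_i xor T_i

def server_search(ciphertexts, x, k_i):
    """Server: positions where C_i xor X has the form <s, F_{k_i}(s)>."""
    hits = []
    for pos, c in enumerate(ciphertexts):
        t = xor(c, x)
        s, tag = t[:S_LEN], t[S_LEN:]
        if hmac.compare_digest(tag, prf(k_i, s, BLOCK - S_LEN)):
            hits.append(pos)
    return hits

k_enc, k_word = os.urandom(32), os.urandom(32)
document = "the cloud stores the data".split()
ciphertexts = [encrypt_word(k_enc, k_word, w) for w in document]
print(server_search(ciphertexts, *trapdoor(k_enc, k_word, "the")))
```

The server learns which positions match the queried word but never sees the word itself; a non-matching position passes the check only with negligible probability (about 2^-64 with these sizes).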

Homomorphic Encryption

Homomorphic Encryption

  Operation                        X86-64 Intel Core 2.1 GHz
  SH_Keygen                        250 ms
  SH_Enc                           24 ms
  SH_Add                           1 ms
  SH_Mul                           41 ms
  SH_Dec (2-element ciphertext)    15 ms
  SH_Dec (3-element ciphertext)    26 ms

- From Kristin Lauter's MSR Faculty Summit talk
- million data
  - Aggregation: 16 minutes
  - Range query: 11 hours
- Too expensive to be practical

WE NEED PRACTICAL SOLUTIONS FOR QUERYING ENCRYPTED DATABASES

Partition and Identification Index [Hacigümüs et al. SIGMOD02]
- E(tuple): encrypted tuple, {attribute index}
- Attribute index: ids of the partitions containing the attribute values

Partition and Identification Index
- The client knows a map function: Map(val) = id of the partition containing val
- Two options: random mapping or order-preserving mapping

Mapping Predicate Conditions
- Map(< val): ids of the partitions that could contain values < val
  - E.g. Map(eid < 280) = {2, 7} for random mapping
- Map(> val): ids of the partitions that could contain values > val
- Map(A_i = A_j): pairs of ids of the partitions that could have equal A_i and A_j values
- Decryption and final processing happen on the client
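The predicate mapping and the client-side filtering step can be sketched as follows. The partition boundaries and ids are assumptions, chosen so that Map(eid < 280) reproduces the {2, 7} of the slide's example; plaintext eids stand in for the encrypted tuples, and the function names are illustrative.

```python
# Hypothetical partitions of the integer attribute eid over [0, 1000):
# (low, high, partition id), with ids assigned at random.
PARTITIONS = [(0, 200, 2), (200, 400, 7), (400, 600, 5), (600, 800, 1), (800, 1000, 4)]

def partition_id(val):
    """Map(val): id of the partition containing val."""
    for lo, hi, pid in PARTITIONS:
        if lo <= val < hi:
            return pid
    raise ValueError("value outside the partitioned domain")

def map_lt(val):
    """Map(< val): ids of partitions that could contain values < val."""
    return {pid for lo, hi, pid in PARTITIONS if lo < val}

def map_gt(val):
    """Map(> val): ids of partitions that could contain values > val."""
    return {pid for lo, hi, pid in PARTITIONS if hi - 1 > val}

def server_select(table, ids):
    """Server: return every stored tuple whose partition id is requested."""
    return [etuple for pid, etuple in table if pid in ids]

# The server stores (partition id, encrypted tuple); the plaintext eid
# stands in for the ciphertext here.
eids = [23, 860, 320, 875, 150, 250]
server_table = [(partition_id(e), e) for e in eids]

candidates = server_select(server_table, map_lt(280))  # superset of the answer
answer = [e for e in candidates if e < 280]             # client-side filtering
print(map_lt(280), candidates, answer)
```

The value 320 is a false positive: it shares partition 7 with qualifying values, so the server returns it and the client discards it after decryption.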

Mapping Predicate Conditions
- Example join condition: emp.did = mrg.did

Optimal Partition for Range Queries [Hore et al. VLDB04]
- Optimal for the privacy-performance tradeoff
- Performance: minimize the number of false positives over all range queries in a given query distribution
  - False positives are caused by the server returning a superset of the answers
- Privacy: maximize the variance and entropy of the value distribution in a partition
  - High variance: increases the adversary's error in inferring sensitive attribute values
  - High entropy: reduces the adversary's ability to identify encrypted tuples satisfying a plaintext query

Partition / Bucketization Review
- Pros
  - Efficient computation on the server
- Cons
  - Data update is hard (may need re-distribution)
  - Filtering the superset of answers can be time consuming, depending on the partition sizes
  - Relative changes in partitions during dynamic data updates might reveal the value distribution

Can Ciphertext Be Queried Directly?
- Encryption with special properties that allow predicate evaluation on ciphertexts
- From order-preserving partition mapping to order-preserving encryption

Order Preserving Encryption [Agrawal et al. SIGMOD04]

Achieving Order Preserving Encryption

Order-Preserving Review
- Pros
  - Returns exact answers instead of supersets
  - Can leverage existing DB indexes
- Cons
  - Hard to perform analysis and aggregation
  - Some tuples could be easily identified if the approach is applied to multiple attributes
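The property that makes exact range answers possible can be shown with a toy mapping. This is not the construction of [Agrawal et al. SIGMOD04]; it is a sketch that assumes a small integer domain and uses a secret, randomly chosen, strictly increasing table as the "key". The names are hypothetical.

```python
import bisect
import random

def ope_keygen(secret_seed, domain_size, range_size):
    """Toy key: a random strictly increasing map from [0, domain_size)
    into [0, range_size), stored as a sorted table."""
    rng = random.Random(secret_seed)
    return sorted(rng.sample(range(range_size), domain_size))

def ope_encrypt(table, value):
    return table[value]

def ope_decrypt(table, cipher):
    return bisect.bisect_left(table, cipher)

table = ope_keygen(secret_seed=42, domain_size=100, range_size=10**6)

salaries = [20, 25, 30, 50, 70, 60, 80, 90]
stored = [ope_encrypt(table, s) for s in salaries]  # what the server holds

# Range predicate 30 <= salary <= 70, evaluated on ciphertexts only.
lo_c, hi_c = ope_encrypt(table, 30), ope_encrypt(table, 70)
matches = [c for c in stored if lo_c <= c <= hi_c]
print(sorted(ope_decrypt(table, c) for c in matches))
```

Because ciphertext order equals plaintext order, the server returns exactly the qualifying tuples, but the same property also leaks the ordering of all values.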

CryptDB [Popa et al. SOSP11]
- Supports a wide range of SQL queries over encrypted data
- The server fully evaluates queries on encrypted data, and the client does not perform query processing
- SQL-aware encryption
  - Leverages provable, practical techniques for different SQL operators over encrypted data
- Adjustable query-based encryption
  - Dynamically adjusts the encryption level of data items according to users' queries
- Onion of encryptions
  - From weaker forms of encryption that allow certain computations to stronger forms of encryption that reveal no information

SQL-Aware Onion Encryption
- Equality onion (any value), outermost layer first
  - RND: no functionality
  - DET: equality selection
  - JOIN: equality join
- Order onion (any value), outermost layer first
  - RND: no functionality
  - OPE: comparison
  - OPE-JOIN: inequality join
- Search onion (only for text fields)
  - SEARCH: word selection
- Addition onion (int value)
  - HOM: sum
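The adjustable, query-driven peeling of onion layers can be modeled as simple bookkeeping, with no actual cryptography. The sketch below only tracks which layer of each onion is currently exposed for one column; the layer lists follow the slide, while the class, the operation names, and the `REQUIRED` table are illustrative assumptions rather than CryptDB's code.

```python
# Onion layouts, outermost (strongest) layer first.
ONIONS = {
    "Eq": ["RND", "DET", "JOIN"],
    "Ord": ["RND", "OPE", "OPE-JOIN"],
    "Search": ["SEARCH"],
    "Add": ["HOM"],
}

# Layer each operator class needs (hypothetical operation names).
REQUIRED = {
    "equality": ("Eq", "DET"),
    "equi_join": ("Eq", "JOIN"),
    "comparison": ("Ord", "OPE"),
    "inequality_join": ("Ord", "OPE-JOIN"),
    "word_search": ("Search", "SEARCH"),
    "sum": ("Add", "HOM"),
}

class Column:
    def __init__(self):
        # Index of the currently exposed layer in each onion.
        self.level = {name: 0 for name in ONIONS}

    def exposed(self, onion):
        return ONIONS[onion][self.level[onion]]

    def adjust_for(self, operation):
        """Peel layers until the one needed by `operation` is exposed.
        Returns the layers removed; layers are never put back."""
        onion, layer = REQUIRED[operation]
        target = ONIONS[onion].index(layer)
        peeled = ONIONS[onion][self.level[onion]:target]
        self.level[onion] = max(self.level[onion], target)
        return peeled

col = Column()
print(col.adjust_for("equality"), col.exposed("Eq"))
print(col.adjust_for("equi_join"), col.exposed("Eq"))
print(col.adjust_for("equality"), col.exposed("Eq"))
```

Since peeled layers are not restored in this model, a column stays at the weakest layer any past query required.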

CryptDB System
- Figure annotations: one component performs the cryptographic operations; another sends the key for a certain onion layer

CryptDB Review
- Pros
  - Supports a wide range of SQL queries
- Cons
  - Confidentiality level degrades to the weakest encryption in the long term

WHY CAN WE NOT LEVERAGE WELL-PROVEN ENCRYPTION MECHANISMS AND DB INDEXING TECHNIQUES?

Encrypted Index for Outsourced Data
- Build a normal B+-tree index on key values
- Encrypt the B+-tree nodes
- Store (and disperse) the encrypted index in the cloud [Damiani et al. CCS03, Wang et al. SDM11]
- A query with predicates on keys is processed by locating the desired key values in the encrypted index
- Index traversal relies on the client to retrieve and decrypt index nodes

Figure: data tuples D (t1, t2, ..., tN over attributes A1, A2, ..., Ad) and a B+-tree index I. Index nodes n1, n2, ... (table ID) are encrypted as E(n1), E(n2), ... (table IE); tuple data tc1, tc2, ... (table TD) is encrypted as E(tc1), E(tc2), ... (table TE). The encrypted tables are dispersed across cloud servers S1, ..., Si, ..., Sn using salted IDA.

Practical Secure Query Processing
- Figure: a client proxy holds the root of index I and fetches encrypted index nodes (IE, e.g. E(n1)) and encrypted tuples (TE, e.g. E(tc2)) from cloud servers S1, ..., Si, ..., Sn
- Cache partial index nodes on the client to improve efficiency

Encrypted Index Review
- Pros
  - Can be directly deployed on existing cloud settings
  - Provides stronger confidentiality than partition or order-preserving encryption, without losing query efficiency
- Cons
  - The cloud's computational ability is underutilized
  - Directly supported queries are limited to queries on indexed key attributes

Data Confidentiality
1. Encryption
   - Homomorphic encryption
   - Partition index
   - Order-preserving encryption
   - Encrypted index
2. Leveraging trust

Distribution instead of Encryption
- Relies on the non-communicating servers assumption [Aggarwal et al. CIDR05]
- Data is split across Server 1 and Server 2
  - Sensitive attributes are stored encrypted: E(telephone), E( )
  - Sensitive association (name, salary): store name and salary on different servers, or store name with E(salary)
- A query is decomposed into Q1 and Q2; the answer is Result(Q1) join Result(Q2)

Distribution Review
- Pros
  - Reduces encryption and decryption overhead
- Cons
  - The non-communicating servers assumption is strong*
  - The data distribution policy is usually not up to the client, but decided by cloud providers
- * [Emekci et al. ICDE06, Agrawal et al. SRDS88, Ciriani et al. ESORICS09]

Tamper Resistant Trusted Hardware

Computation Cost Consideration

TrustedDB [Bajaj et al. SIGMOD11]

Trusted Computing Review
- Pros
  - Supports almost all existing DBMS functionalities
- Cons
  - Computing and memory resources are limited
    - Cipherbase [Arasu et al. CIDR13]: better optimization based on trusted hardware
  - Requires secret key handover from the user to the trusted hardware

Outline
- Database Security and Privacy
- Data Security and Privacy in the Cloud
- Data Confidentiality
- Access Privacy
- Open Research Challenges

Access Privacy
1. Private Information Retrieval (PIR)
2. Oblivious RAM
3. Relaxing Privacy

Private Information Retrieval [Chor et al. JACM98]
- Multi-server information-theoretic PIR
  - Implemented based on XOR, polynomial interpolation
  - Achieves 2-server communication complexity $O(n^{1/3})$
  - Tolerates collusion of up to t < k servers
- Single-server PIR
  - Requires only computational indistinguishability
- Figure: the server holds database X = (X_1, X_2, ..., X_n); the client's query q = "give me the i-th record" is sent as encrypted(q); the server returns encrypted-result = f(X, encrypted(q)), from which the client recovers X_i
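The XOR idea behind multi-server PIR is easiest to see in the basic two-server scheme with linear communication, sketched below. It is not the $O(n^{1/3})$ construction mentioned above, it assumes the two servers do not collude, and the function names are illustrative.

```python
import secrets

def make_queries(n, i):
    """Client: two selection vectors that differ only at position i.
    Each vector on its own is uniformly random, so neither server learns i."""
    q1 = [secrets.randbelow(2) for _ in range(n)]
    q2 = list(q1)
    q2[i] ^= 1
    return q1, q2

def answer(database, query):
    """Server: XOR of the database bits selected by the query vector."""
    bit = 0
    for x, selected in zip(database, query):
        if selected:
            bit ^= x
    return bit

def reconstruct(a1, a2):
    """Client: all bits except the i-th cancel out."""
    return a1 ^ a2

database = [1, 0, 1, 1, 0, 0, 1, 0]  # replicated on both servers
q1, q2 = make_queries(len(database), 5)
print(reconstruct(answer(database, q1), answer(database, q2)))
```

Each server answers with a single bit, but the queries are as long as the database; the schemes cited on the slide reduce that communication cost.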

cPIR Theoretical Background
- Quadratic residue (QR): x is a quadratic residue mod N if there exists y such that $y^2 \equiv x \pmod{N}$
  - E.g. N = 35: 11 is a QR ($9^2 \equiv 11 \pmod{35}$)
  - 3 is a QNR (no y exists such that $y^2 \equiv 3 \pmod{35}$)
  - Essential properties: QR × QR = QR, QR × QNR = QNR
- Let $N = p_1 \times p_2$, where $p_1$ and $p_2$ are large primes of m/2 bits
- Quadratic Residuosity Assumption (QRA)
  - Determining whether a number is a QR or a QNR is computationally hard if $p_1$ and $p_2$ are not given
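For the toy modulus N = 35 these sets are small enough to enumerate, which makes the two multiplication properties easy to confirm. A sketch under the assumption that only values coprime to N are considered, and that "QNR" means a non-residue modulo both prime factors (the kind cPIR uses); the function names are illustrative.

```python
from math import gcd

def quadratic_residues(N):
    """All QRs among the values coprime to N, found by brute-force squaring."""
    units = [x for x in range(1, N) if gcd(x, N) == 1]
    return sorted({(y * y) % N for y in units})

def is_qr(x, p1, p2):
    """With the factors known, x is a QR mod p1*p2 iff it is a QR
    modulo both primes (Euler's criterion)."""
    return pow(x, (p1 - 1) // 2, p1) == 1 and pow(x, (p2 - 1) // 2, p2) == 1

def hard_qnrs(p1, p2):
    """QNRs that are non-residues modulo both primes."""
    N = p1 * p2
    return [x for x in range(1, N)
            if gcd(x, N) == 1
            and pow(x, (p1 - 1) // 2, p1) == p1 - 1
            and pow(x, (p2 - 1) // 2, p2) == p2 - 1]

QR = quadratic_residues(35)
QNR = hard_qnrs(5, 7)
print(QR)
print(QNR)
print(all((a * b) % 35 in QR for a in QR for b in QR))    # QR x QR = QR
print(all((a * b) % 35 in QNR for a in QR for b in QNR))  # QR x QNR = QNR
```

Testing membership is easy here only because the factors 5 and 7 are known; the QRA states that without them the two kinds of values are hard to tell apart.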

Single Database cPIR [Kushilevitz et al. FOCS97]
- Adapted from Tan's presentation
- Example: get $M_{2,3}$, with N = 35, QNR = {3, 12, 13, 17, 27, 33}, QR = {1, 4, 9, 11, 16, 29}
  - The client's query contains a QNR in the position of the wanted column
  - The server returns z = (z_1, z_2, z_3, z_4), one value per row
  - z_2 = QNR => X_10 = 1; z_2 = QR => X_10 = 0
- Computation cost: O(n)
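The example can be replayed with the toy parameters from the slide. This is a sketch of the quadratic-residuosity idea only: N = 35 offers no security, indices are 0-based, the bit matrix is invented, and the function names are hypothetical.

```python
import secrets

P1, P2 = 5, 7            # secret factors, known to the client only
N = P1 * P2
QR = [1, 4, 9, 11, 16, 29]
QNR = [3, 12, 13, 17, 27, 33]

def client_query(num_cols, col):
    """One value per column: a QNR at the wanted column, QRs elsewhere."""
    return [secrets.choice(QNR) if j == col else secrets.choice(QR)
            for j in range(num_cols)]

def server_answer(matrix, y):
    """Per row, multiply y_j where the bit is 1 and y_j^2 where it is 0."""
    z = []
    for row in matrix:
        acc = 1
        for bit, y_j in zip(row, y):
            acc = acc * (y_j if bit else y_j * y_j) % N
        z.append(acc)
    return z

def client_decode(z, row):
    """z[row] is a QNR exactly when the wanted bit is 1."""
    z_r = z[row]
    is_qr = pow(z_r, (P1 - 1) // 2, P1) == 1 and pow(z_r, (P2 - 1) // 2, P2) == 1
    return 0 if is_qr else 1

M = [[0, 1, 1, 0],
     [1, 0, 0, 1],
     [0, 0, 1, 1],
     [1, 1, 0, 0]]

y = client_query(4, col=2)
z = server_answer(M, y)
print(client_decode(z, row=1), M[1][2])
```

The server touches every bit of the database for each query, which is where the O(n) computation cost comes from.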

Practicality of PIR [Sion et al. NDSS07, Olumofin et al. FC11]
- cPIR is more than one order of magnitude slower than trivial data transfer
- Multi-server PIR is more practical, but it requires that the servers do not collude

PIR Could Be More Practical [Olumofin et al. FC11]
- Multi-server information-theoretic PIR
- Single-server lattice-based PIR
  - Unlike previous cPIR schemes, which are based on number theory
  - Can achieve one order of magnitude speedup by using a GPU
  - Cons: its security is not as well understood as that of number-theory-based cPIR

Access Privacy
1. Private Information Retrieval (PIR)
2. Oblivious RAM
3. Relaxing Privacy

Oblivious RAM Based PIR [Goldreich & Ostrovsky JACM 96, Williams et al. NDSS08]
- A step towards making PIR practical
- Oblivious RAM: achieves oblivious access in server memory
- Organize data in pyramid-like levels of buckets
- Ensure each access touches a bucket at every level


Oblivious RAM Based PIR
- Computation cost: $O(\log^2 n)$
- Needs some client storage during oblivious re-ordering of encrypted data

Oblivious RAM Review

Access Privacy
1. Private Information Retrieval (PIR)
2. Oblivious RAM
3. Relaxing Privacy

Bounding-Box PIR [Wang et al. DBSEC10]
- Example: get $M_{2,3}$, with N = 35, QNR = {3, 12, 13, 17, 27, 33}, QR = {1, 4, 9, 11, 16, 29}
  - Client and server exchange the query vector y and the answer vector z over a bounding box of the matrix only
  - z_2 = QNR => $M_{2,3}$ = 1

Hybrid Approach with Homomorphic Encryption [Wang et al. DAPD13]
- Messages between client and server
  - 0.1. Bucket summary S
  - 0.2. Public key $K_{pub}$
  - 1. Query vector Q
  - 2. Answer vector V
  - 3. Decrypt V and filter
- Example: $R_{pub}.B$: [0, 100); summary $S_1$ over buckets $BK_1, BK_2, \ldots, BK_7$
  - Range query Q: [45, 65)
  - Query vector Q: (E(0), E(1), E(1), E(1), E(0), E(0), E(0))
  - Answer vector V: ($E(0)^{VBK_1}$, $E(1)^{VBK_2}$, $E(1)^{VBK_3}$, $E(1)^{VBK_4}$, $E(0)^{VBK_5}$, $E(0)^{VBK_6}$, $E(0)^{VBK_7}$)
  - $D(V[2]) = D(E(1)^{VBK_2}) = D(E(1)) \cdot VBK_2 = VBK_2$
  - $D(V[3]) = D(E(1)^{VBK_3}) = D(E(1)) \cdot VBK_3 = VBK_3$
  - $D(V[4]) = D(E(1)^{VBK_4}) = D(E(1)) \cdot VBK_4 = VBK_4$
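The identity D(E(1)^VBK) = D(E(1)) * VBK holds for additively homomorphic schemes such as Paillier. The sketch below assumes Paillier as the homomorphic scheme (the slide does not name one), uses toy primes that offer no security, and represents each bucket value VBK_i by a small hypothetical integer. It needs Python 3.8+ for the modular inverse via `pow`.

```python
import math
import secrets

def keygen(p=293, q=433):
    """Toy Paillier key pair with generator g = n + 1."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)                               # inverse of lam mod n
    return n, (lam, mu)

def encrypt(n, m):
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return pow(1 + n, m, n2) * pow(r, n, n2) % n2

def decrypt(n, private_key, c):
    lam, mu = private_key
    n2 = n * n
    return (pow(c, lam, n2) - 1) // n * mu % n

def server_answer(n, query_vector, bucket_values):
    """V[i] = Q[i] ** VBK_i, which decrypts to (selector bit) * VBK_i."""
    n2 = n * n
    return [pow(q_i, v_i, n2) for q_i, v_i in zip(query_vector, bucket_values)]

n, private_key = keygen()
bucket_values = [11, 22, 33, 44, 55, 66, 77]   # stand-ins for VBK_1..VBK_7
selectors = [0, 1, 1, 1, 0, 0, 0]              # the query vector from the slide
Q = [encrypt(n, b) for b in selectors]
V = server_answer(n, Q, bucket_values)
print([decrypt(n, private_key, v) for v in V])
```

The server sees only ciphertexts of the selector bits, so it cannot tell which buckets the client actually wants; the client keeps the non-zero decryptions and filters them locally.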

Hybrid Approach with Homomorphic Encryption [Wang et al. DAPD13]
- The client selects a subset of buckets for the server to work on
  - Private query buckets
  - Relevant frequently co-accessed sets of buckets of other users
- Reasons for using frequent bucket sets
  - Hide in the crowd
  - Less identifiable
- Frequent bucket sets (FBS) come from private distributed frequent pattern mining [TKDE04] over client query histories
- Example query vector: Q: (0, E(1), E(1), E(1), 0, E(0), E(0))
- Figure labels: BHE, HHE

Access Pattern Privacy on Encrypted Index [Vimercati et al. ICDCS11]
- Does not use any cryptographic protocols
- Cover searches
  - Fake searches
- Cached searches
  - Cache index nodes
- Index shuffling
  - Exchange contents between index nodes
  - Counteract node-data association attacks

Index Shuffling

Relaxing Privacy Review
- Pros
  - More computationally efficient than PIR
- Cons
  - (Incomplete) privacy is tricky to define and quantify

Outline
- Database Security and Privacy
- Data Security and Privacy in the Cloud
- Data Confidentiality
- Access Privacy
- Open Research Challenges

Open Research Problems
- Homomorphic encryption for processing range/join database queries on encrypted data
- Improve the performance of querying encrypted data for use in practical OLTP applications
  - Pre-computation
  - Parallel calculation
- End-to-end security in the cloud
  - Needs information flow control and auditing in addition to cryptography-based or trusted-computing-based approaches

Concluding Remarks
- Cloud security and privacy is not a completely new problem; some issues are amplified by the cloud
- Protecting data confidentiality and access privacy
- Maintaining practical functionality and performance while achieving security and privacy

References
[Bertino et al. TDSC05] E. Bertino et al. Database security - concepts, approaches, and challenges. In IEEE TDSC, 2(1).
[Samarati et al. TR98] P. Samarati et al. Protecting privacy when disclosing information: k-anonymity and its enforcement through generalization and suppression. TR.
[Machanavajjhala et al. ICDE06] A. Machanavajjhala et al. l-diversity: privacy beyond k-anonymity. In ICDE.
[Li et al. ICDE07] N. Li et al. t-closeness: privacy beyond k-anonymity and l-diversity. In ICDE.
[Dwork ICALP06] C. Dwork. Differential privacy. In ICALP(2).
[Verykios et al. SIGMOD04] V. S. Verykios et al. State-of-the-art in privacy preserving data mining. In SIGMOD.
[Agrawal et al. SIGMOD00] R. Agrawal et al. Privacy-preserving data mining. In SIGMOD.
[Clifton et al. KDD02] C. Clifton et al. Tools for privacy preserving distributed data mining. In KDD.
[Anciaux et al. SIGMOD07] N. Anciaux et al. GhostDB: querying visible and hidden data without leaks. In SIGMOD.

References
[Chaudhuri et al. CIDR11] S. Chaudhuri et al. Database access control & privacy: is there a common ground? In CIDR.
[Ristenpart et al. CCS09] T. Ristenpart et al. Hey, you, get off of my cloud: exploring information leakage in third-party compute clouds. In CCS.
[Somorovsky et al. CCSW11] J. Somorovsky et al. All your clouds are belong to us: security analysis of cloud management interfaces. In CCSW.
[Hacigümüs et al. ICDE02] H. Hacigümüs et al. Providing database as a service. In ICDE.
[Song et al. S&P00] D. Song et al. Practical techniques for searches on encrypted data. In S&P.
[Hacigümüs et al. SIGMOD02] H. Hacigümüs et al. Executing SQL over encrypted data in the database-service-provider model. In SIGMOD.
[Hore et al. VLDB04] B. Hore et al. A privacy-preserving index for range queries. In VLDB.
[Agrawal et al. SIGMOD04] R. Agrawal et al. Order preserving encryption for numeric data. In SIGMOD.

References
[Popa et al. SOSP11] R. A. Popa et al. CryptDB: protecting confidentiality with encrypted query processing. In SOSP.
[Damiani et al. CCS03] E. Damiani et al. Balancing confidentiality and efficiency in untrusted relational DBMSs. In CCS.
[Wang et al. SDM11] S. Wang et al. A comprehensive framework for secure query processing on relational data in the cloud. In SDM.
[Aggarwal et al. CIDR05] G. Aggarwal et al. Two can keep a secret: a distributed architecture for secure database services. In CIDR.
[Emekci et al. ICDE06] F. Emekci et al. Privacy preserving query processing using third parties. In ICDE.
[Agrawal et al. SRDS88] D. Agrawal et al. Quorum consensus algorithms for secure and reliable data. In SRDS.
[Bajaj et al. SIGMOD11] S. Bajaj et al. TrustedDB: a trusted hardware based database with privacy and data confidentiality. In SIGMOD.
[Song et al. IEEE12] D. Song et al. Cloud data protection for the masses. In IEEE Computer, 45(1).
[Chor et al. JACM98] B. Chor et al. Private information retrieval. In J. ACM, 45(6).

References
[Kushilevitz et al. FOCS97] E. Kushilevitz et al. Replication is not needed: single database, computationally private information retrieval. In FOCS.
[Sion et al. NDSS07] R. Sion et al. On the computational practicality of private information retrieval. In NDSS.
[Olumofin et al. FC11] F. G. Olumofin et al. Revisiting the computational practicality of private information retrieval. In FC.
[Williams et al. NDSS08] P. Williams et al. Usable private information retrieval. In NDSS.
[Wang et al. DBSEC10] S. Wang et al. Generalizing PIR for practical private retrieval of public data. In DBSec.
[Wang et al. DAPD13] S. Wang et al. Towards practical private processing of database queries over public data. In DAPD.
[Vimercati et al. ICDCS11] S. D. C. Vimercati et al. Efficient and private access to outsourced data. In ICDCS.