Database Access Control & Privacy: Is There A Common Ground? Surajit Chaudhuri, Raghav Kaushik and Ravi Ramamurthy Microsoft Research.

Slides:



Advertisements
Similar presentations
CS 540 Database Management Systems
Advertisements

SplitX: High-Performance Private Analytics Ruichuan Chen (Bell Labs / Alcatel-Lucent) Istemi Ekin Akkus (MPI-SWS) Paul Francis (MPI-SWS)
1 Privacy in Microdata Release Prof. Ravi Sandhu Executive Director and Endowed Chair March 22, © Ravi Sandhu.
2009 Architecture Plan Overview 2009 Architecture Plan Overview.
1 Primitives for Workload Summarization and Implications for SQL Prasanna Ganesan* Stanford University Surajit Chaudhuri Vivek Narasayya Microsoft Research.
Chapter 6: Database Evolution Title: AutoAdmin “What-if” Index Analysis Utility Authors: Surajit Chaudhuri, Vivek Narasayya ACM SIGMOD 1998.
Database management concepts Database Management Systems (DBMS) An example of a database (relational) Database schema (e.g. relational) Data independence.
Fundamentals, Design, and Implementation, 9/e Chapter 11 Managing Databases with SQL Server 2000.
Anatomy: Simple and Effective Privacy Preservation Israel Chernyak DB Seminar (winter 2009)
Privacy-Aware Computing Introduction. Outline  Brief introduction Motivating applications Major research issues  Tentative schedule  Reading assignments.
Differential Privacy (2). Outline  Using differential privacy Database queries Data mining  Non interactive case  New developments.
Databases Chapter 11.
1 Towards an end-to-end architecture for handling sensitive data Hector Garcia-Molina Rajeev Motwani and students.
Transaction Processing Systems, & Management Information Systems.
Protecting data privacy and integrity in clouds By Jyh-haw Yeh Computer Science Boise state University.
R 18 G 65 B 145 R 0 G 201 B 255 R 104 G 113 B 122 R 216 G 217 B 218 R 168 G 187 B 192 Core and background colors: 1© Nokia Solutions and Networks 2014.
IMS 4212: Distributed Databases 1 Dr. Lawrence West, Management Dept., University of Central Florida Distributed Databases Business needs.
Database Systems: Design, Implementation, and Management Eighth Edition Chapter 10 Database Performance Tuning and Query Optimization.
DAY 14: ACCESS CHAPTER 1 Tazin Afrin October 03,
Differential Privacy - Apps Presented By Nikhil M Chandrappa 1.
APPLYING EPSILON-DIFFERENTIAL PRIVATE QUERY LOG RELEASING SCHEME TO DOCUMENT RETRIEVAL Sicong Zhang, Hui Yang, Lisa Singh Georgetown University August.
CS573 Data Privacy and Security Statistical Databases
DBXplorer: A System for Keyword- Based Search over Relational Databases Sanjay Agrawal Surajit Chaudhuri Gautam Das Presented by Bhushan Pachpande.
Automatically Synthesizing SQL Queries from Input-Output Examples Sai Zhang University of Washington Joint work with: Yuyin Sun.
SEC835 Practical aspects of security implementation Part 1.
 Dr. Syed Noman Hasany.  Review of known methodologies  Analysis of software requirements  Real-time software  Software cost, quality, testing and.
Privacy-Aware Personalization for Mobile Advertising
m-Privacy for Collaborative Data Publishing
Differentially Private Data Release for Data Mining Noman Mohammed*, Rui Chen*, Benjamin C. M. Fung*, Philip S. Yu + *Concordia University, Montreal, Canada.
DBXplorer: A System for Keyword- Based Search over Relational Databases Sanjay Agrawal, Surajit Chaudhuri, Gautam Das Cathy Wang
Accuracy-Constrained Privacy-Preserving Access Control Mechanism for Relational Data.
K-Anonymity & Algorithms
© ETH Zürich Eric Lo ETH Zurich a joint work with Carsten Binnig (U of Heidelberg), Donald Kossmann (ETH Zurich), Tamer Ozsu (U of Waterloo) and Peter.
Introduction – Addressing Business Challenges Microsoft® Business Intelligence Solutions.
Database Management System (DBMS) an Introduction DeSiaMore 1.
Implementing Differential Privacy & Side-channel attacks CompSci Instructor: Ashwin Machanavajjhala 1Lecture 14 : Fall 12.
Chapter No 4 Query optimization and Data Integrity & Security.
The Sparse Vector Technique CompSci Instructor: Ashwin Machanavajjhala 1Lecture 12 : Fall 12.
A Robust, Optimization-Based Approach for Approximate Answering of Aggregate Queries Surajit Chaudhuri Gautam Das Vivek Narasayya Presented by Sushanth.
Preservation of Proximity Privacy in Publishing Numerical Sensitive Data J. Li, Y. Tao, and X. Xiao SIGMOD 08 Presented by Hongwei Tian.
INFO1408 Database Design Concepts Week 15: Introduction to Database Management Systems.
Microsoft Access 2013 ®® Tutorial 9 Using Action Queries and Advanced Table Relationships.
1 Chapter 10 Joins and Subqueries. 2 Joins & Subqueries Joins – Methods to combine data from multiple tables – Optimizer information can be limited based.
Chapter 3 Databases and Data Warehouses: Building Business Intelligence Copyright © 2010 by the McGraw-Hill Companies, Inc. All rights reserved. McGraw-Hill/Irwin.
Differential Privacy Some contents are borrowed from Adam Smith’s slides.
Data and Applications Security Developments and Directions Dr. Bhavani Thuraisingham The University of Texas at Dallas Inference Problem - I September.
Information Systems Today: Managing in the Digital World TB3-1 3 Technology Briefing Database Management “Modern organizations are said to be drowning.
Presented By Anirban Maiti Chandrashekar Vijayarenu
m-Privacy for Collaborative Data Publishing
Chapter 5 : Integrity And Security  Domain Constraints  Referential Integrity  Security  Triggers  Authorization  Authorization in SQL  Views 
Differential Privacy Xintao Wu Oct 31, Sanitization approaches Input perturbation –Add noise to data –Generalize data Summary statistics –Means,
Private Release of Graph Statistics using Ladder Functions J.ZHANG, G.CORMODE, M.PROCOPIUC, D.SRIVASTAVA, X.XIAO.
Chapter 9: Web Services and Databases Title: NiagaraCQ: A Scalable Continuous Query System for Internet Databases Authors: Jianjun Chen, David J. DeWitt,
CS 540 Database Management Systems
Secure Data Outsourcing
Yang, et al. Differentially Private Data Publication and Analysis. Tutorial at SIGMOD’12 Part 4: Data Dependent Query Processing Methods Yin “David” Yang.
Lecture 15: Query Optimization. Very Big Picture Usually, there are many possible query execution plans. The optimizer is trying to chose a good one.
Privacy-safe Data Sharing. Why Share Data? Hospitals share data with researchers – Learn about disease causes, promising treatments, correlations between.
Big Data Analytics Are we at risk? Dr. Csilla Farkas Director Center for Information Assurance Engineering (CIAE) Department of Computer Science and Engineering.
Space for things we might want to put at the bottom of each slide. Part 6: Open Problems 1 Marianne Winslett 1,3, Xiaokui Xiao 2, Yin Yang 3, Zhenjie Zhang.
Sergey Yekhanin Institute for Advanced Study Lower Bounds on Noise.
A hospital has a database of patient records, each record containing a binary value indicating whether or not the patient has cancer. -suppose.
Privacy Issues in Graph Data Publishing Summer intern: Qing Zhang (from NC State University) Mentors: Graham Cormode and Divesh Srivastava.
Privacy in Mobile Systems Karthik Dantu and Steve Ko.
Privacy-preserving Release of Statistics: Differential Privacy
Inference and Flow Control
Redundancy and Information Leakage in Fine Grained Access Control
SQL Fundamentals in Three Hours
A Framework for Testing Query Transformation Rules
Presentation transcript:

Database Access Control & Privacy: Is There A Common Ground? Surajit Chaudhuri, Raghav Kaushik and Ravi Ramamurthy Microsoft Research

Data Privacy  Databases Have Sensitive Information  Health care database: Patient PII, Disease information  Sales database: Customer PII  Employee database: Employee level, salary  Data analysis carries the risk of privacy breach [FTDB 2009]  Latanya Sweeney’s identification of the governor of MA from medical records  AOL search logs  Netflix prize dataset  Focus of this paper: What is the implication of data privacy concerns on the DBMS? Do we need any more than access control? 2

Data Publishing 3 NameAgeGenderZipcodeDisease Ann28F13068Heart disease Bob21M13068Flu Carol24F13068Viral disease …………… Patients [FTDB2009] AgeGenderZipcodeDisease [20-29]F1****Heart disease [20-29]M1****Flu [20-29]F1****Viral disease ………… Patients-Anonymized Q1 Qn... K-Anonymity, L-Diversity, T-Closeness

Privacy-Aware Query Answering 4 NameAgeGenderZipcodeDisease Ann28F13068Heart disease Bob21M13068Flu Carol24F13068Viral disease …………… Patients [FTDB2009] AgeGenderZipcodeDisease [20-29]F1****Heart disease [20-29]M1****Flu [20-29]F1****Viral disease ………… Patients-Anonymized Q1 Qn... Differential Privacy, Privacy-Preserving OLAP

Data Publishing Vs Query Answering 5  Jury is still out  Data Publishing  No impact on DBMS  De-identification algorithms over published data are getting increasingly sophisticated  Need to take a hard look at the query answering paradigm  Potential implications for DBMS  “An interactive, query-based approach is generally superior from the privacy perspective to the “release-and-forget” approach” [CACM’10]

Is “Privacy-Aware” = (Fine-Grained) Access Control (FGA)? 6  Every user is allowed to view only subset of data (authorization view)  Subset defined using a predicate  Queries are (logically) rewritten to go against subset Select * From Patients Where Patients.Physician = userID()

Is “Privacy-Aware” = (Fine-Grained) Access Control (FGA)? 7  Every user is allowed to view only subset of data (authorization view)  Subset defined using a predicate  Queries are (logically) rewritten to go against subset Select Drug, count(*) From Patients right outer join Drugs on Drug Where (Select count(*) From Side-Effects Where Drug = Drugs.Drug) > 3 Group by Drug Select Drug, count(*) From Patients right outer join Drugs on Drug Where (Select count(*) From Side-Effects Where Drug = Drugs.Drug and auth(Side-Effects)) > 3 and auth(Patients) and auth(Drugs) Group by Drug

Authorization is “Black and White” 8  Query: Count the number of cancer patients Utility Privacy Grant access to cancer patients (Return accurate count) Deny access to cancer patients

Beyond “Black and White”: Differential Privacy [SIGMOD09] 9 Perturb the output of agg. computation (Requires no change in execution engine) Need to set parameters ε, Budget Count the number of cancer patients Baggage Non-deterministic Per-query privacy parameter Overall privacy budget

Seeking Common Ground  Access Control  Supports full generality of SQL  “Black and White”  Differential Privacy Algorithms  A principled way to go beyond “black and white”  Known mechanisms do not support full generality of SQL  Data analysis involves aggregation but also joins, sub-queries  Can we get the best of both worlds?  Differential Privacy = Computation on unauthorized data  What is the implication on privacy guarantees? 10

What Does “Best of Both Worlds” Look Like?  FGA Policy:  Each physician can see:  Records of their patients  Analyst can see:  Drug records manufactured by their employer  No patient records NameDiseaseDrugPhysician AnnHeart disease LipitorGrey ………… DrugCompany LipitorPfizer …… Patients DrugSide- Effect LipitorMuscle LipitorLiver …… DrugsSide-Effects NameEmployer JoeAnalystPfizer JaneAnalystMerck …… Analysts

FGA 12 NameDiseaseDrugPhysician ………Grey ……… ………Stevens ……… ………Yang Select * From Patients Select * From Patients Where Physician = userID() Grey

Differential Privacy 13 NameDiseaseDrugPhysician …Heart Disease …… …Flu…… …Cancer…… … …… …AIDS…… Select count(*) From Patients Where Disease = ‘Cancer’ Select count(*) + Noise From Patients Where Disease = ‘Cancer’ User = JaneAnalyst

Mix And Match: FGA + Differential Privacy 14  Find for each drug with more than 3 side- effects, count the number of patients who have been prescribed Select Drug, count(*) From Patients right outer join Drugs on Drug Where (Select count(*) From Side-Effects Where Drug = Drugs.Drug) > 3 Group by Drug NameDiseaseDrugPhysician ………… ………… DrugCompany LipitorPfizer …… Patients DrugSide- Effect LipitorMuscle LipitorLiver …… DrugsSide-Effects NameEmployer JoeAnalystPfizer JaneAnalystMerck …… Analysts

Architecture That Will Fail To Mix And Match 15 Execution Engine Authorization Subsystem Q Policy Result(AggQ) Results Differential Privacy API AggQ Result(AggQ) + Noise DBMS

16 Execution Engine Authorization Subsystem Q Policy Result(AggQ) Results Differential Privacy API AggQ Result(AggQ) + Noise DBMS Wrapper Architecture That Will Fail To Mix And Match

Authorization-Aware Data Privacy 17 Execution Engine Authorization Aware Privacy Subsystem Q Policy Results DBMS

Query Rewriting 18 Select Drug, count(*) From Patients right outer join Drugs on Drug Where (Select count(*) From Side-Effects Where Drug = Drugs.Drug) > 3 Group by Drug NameDiseaseDrugPhysician ………… ………… DrugCompany LipitorPfizer …… Patients DrugSide- Effect LipitorMuscle LipitorLiver …… DrugsSide-Effects NameEmployer JoeAnalystPfizer JaneAnalystMerck …… Analysts Non-aggregation: Authorization What about aggregation?

Query Rewriting 19 Select Drug, count(*) From Patients right outer join Drugs on Drug Where (Select count(*) From Side-Effects Where Drug = Drugs.Drug) > 3 Group by Drug NameDiseaseDrugPhysician ………… ………… DrugCompany LipitorPfizer …… Patients DrugSide- Effect LipitorMuscle LipitorLiver …… DrugsSide-Effects NameEmployer JoeAnalystPfizer JaneAnalystMerck …… Analysts

Query Rewriting 20 Select Drug, count(*) From Patients right outer join Drugs on Drug Where (Select count(*) From Side-Effects Where Drug = Drugs.Drug and auth(Side-Effects)) > 3 and auth(Patients) and auth(Drugs) Group by Drug NameDiseaseDrugPhysician ………… ………… DrugCompany LipitorPfizer …… Patients DrugSide- Effect LipitorMuscle LipitorLiver …… DrugsSide-Effects NameEmployer JoeAnalystPfizer JaneAnalystMerck …… Analysts Authorized Groups For each authorized group, find noisy count

Query Rewriting 21 Select Drug, count(*) From Patients right outer join Drugs on Drug Where (Select count(*) From Side-Effects Where Drug = Drugs.Drug and auth(Side-Effects)) > 3 and auth(Patients) and auth(Drugs) Group by Drug NameDiseaseDrugPhysician ………… ………… DrugCompany LipitorPfizer …… Patients DrugSide- Effect LipitorMuscle LipitorLiver …… DrugsSide-Effects NameEmployer JoeAnalystPfizer JaneAnalystMerck …… Analysts Authorized Groups For each authorized group, find: (1)Noisy count on unauthorized subset (2)Accurate count on authorized subset

Class of Queries 22 Select Drug, count(*) From Patients right outer join Drugs on Drug Where (Select count(*) From Side-Effects Where Drug = Drugs.Drug) > 3 Group by Drug Foreign key join Predicate Grouping Aggregation  Rewriting: Go to unauthorized data for final aggregation  Principled rewriting for arbitrary SQL: open problem

Our Privacy Guarantee: Relative Differential Privacy 23  Differential Privacy Intuition:  A computation is differentially private if its behavior is similar for any two databases D1and D2 that differ in a single record  Relative Differential Privacy Intuition:  A computation is differentially private relative to an authorization policy if its behavior is similar for any two databases D1and D2 that differ in a single record and both result in the same authorization views

Noisy View 24 Create noisy view DrugCounts(Drug, PatientCnt) as (Select Drug, count(*) From Patients right outer join Drugs on Drug Where (Select count(*) From Side-Effects Where Drug = Drugs.Drug) > 3 Group by Drug)  Named  Non-deterministic  Rewriting is authorization aware  Can be part of grant-revoke statements just like regular views

Noisy View Examples 25 Select count(*) From Patients Where Disease = ‘Cancer’ Select Disease, count(*) From Patients Group by Disease Select Category, count(*) From Patients join DiseaseCategory on Disease Group by Category

Noisy View Architecture 26 Execution Engine Authorization Aware Privacy Subsystem Q Policy Results Tables Noisy Views Views Enforce authorization Rewrite as we saw before Select Drug, Side-Effect, Cnt From DrugCounts, Side-Effects Where DrugCounts.Drug = Side-Effects.Drug DBMS

Differential Privacy Parameters [SIGMOD09] 27 Need to set parameters ε, Budget

Noisy View Architecture: Differential Privacy Parameters 28 Execution Engine Authorization Aware Privacy Subsystem (Q, ε) Auth. Policy, Privacy Budget Results Tables Noisy Views Views Fall back to access control after budget exhausted DBMS

Conclusions and Future Work 29  Noisy view based architecture to incorporate privacy- preserving query answering with access control in a DBMS  Based on differential privacy  Needs minimal changes to engine  Guarantee: Differential privacy relative to authorizations  Baggage of differential privacy  Non-deterministic  Per-query privacy parameter  Overall privacy budget  Open Issues  Larger class of noisy views (can we support arbitrary SQL?)  Benchmark the privacy-utility tradeoff for complex data analysis, e.g. TPC-H, TPC-DS.  Query Optimization  Integrating Access Control with other privacy models