Online Auditing Kobbi Nissim Microsoft Based on a position paper with Nina Mishra.

Presentation transcript:

Online Auditing Kobbi Nissim Microsoft Based on a position paper with Nina Mishra

2 The Setting
Dataset: {d_1,…,d_n}
– Entries d_i: real, integer, or Boolean
Query: q = (f, i_1,…,i_k)
– f: Min, Max, Median, Sum, Average, Count, …
The statistical database answers q = (f, i_1,…,i_k) with f(d_{i_1},…,d_{i_k})
Some users are bad…
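
To make the setting concrete, here is a minimal Python sketch of how a statistical database might answer a query q = (f, i_1,…,i_k). The names `AGGREGATES` and `answer` are illustrative assumptions and do not come from the talk; indices are 1-based to match the slide notation.

```python
from statistics import mean, median

# Aggregate functions named on the slide, keyed by a hypothetical string label.
AGGREGATES = {
    "min": min,
    "max": max,
    "median": median,
    "sum": sum,
    "average": mean,
    "count": len,
}

def answer(dataset, f, indices):
    """Answer q = (f, i_1, ..., i_k) by applying f to the selected entries."""
    selected = [dataset[i - 1] for i in indices]  # 1-based, as in d_1..d_n
    return AGGREGATES[f](selected)
```

For example, `answer([3, 8, 4], "sum", (1, 3))` applies Sum to d_1 and d_3.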

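3 Auditing
[Diagram: a user submits a new query q_{i+1} to an auditor that sits in front of the statistical database and keeps the query log q_1,…,q_i.]
User: Here's a new query: q_{i+1}
Auditor: Here's the answer OR Query denied (as the answer would cause privacy loss)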

4 Auditing
[Adam, Wortmann 89] classify auditing as a query restriction method
– Such methods limit the queries users may pose, usually imposing some structure (e.g. combinatorial, algebraic)
– “Auditing of an SDB involves keeping up-to-date logs of all queries made by each user (not the data involved) and constantly checking for possible compromise whenever a new query is issued”
Partial motivation: may allow more queries to be posed if no privacy threat occurs
Early work: Hofmann 1977; Schlorer 1976; Chin, Ozsoyoglu 1981, 1986
Recent interest: Kleinberg, Papadimitriou, Raghavan 2000; Li, Wang, Wang, Jajodia 2002; Jonsson, Krokhin 2003

5 Design Choices in Prior Work
Out of scope for this talk (but important):
– Very weak privacy guarantee: privacy is breached (only) when a database entry may be uniquely deduced
– Exact answers are given
Important for this talk:
– Data is taken into account in the decision procedure: the answers to q_1,…,q_i and to q_{i+1} are taken into account
– Denials are ignored

6 Some Prior Work on Auditors
Auditor | Data | Queries | Breach | Complexity
Sum/Max [Chin] | real | sum/max | d_i learned | NP-hard
Boolean [KPR00] | 0/1 | sum | d_i learned | NP-hard
Max [KPR00] | real | max | d_i learned | PTIME
Interval based [LWWJ02] | d_i ∈ [a,b] | sum | d_i within a given accuracy | PTIME
Generalized results [JK03] | | | | NP-hard / PTIME

7 Example 1: Sum/Max auditing
d_i real, sum/max queries
User: q1 = sum(d1,d2,d3)
Auditor: sum(d1,d2,d3) = 15
User: q2 = max(d1,d2,d3)
Auditor: Denied (the answer would cause privacy loss)
User: Oh well…
User: q2 is denied iff d1 = d2 = d3 = 5. I win!
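
The reasoning behind "I win!" can be sketched in Python. This is a simplified model, not the auditor from the cited work: it assumes the only logged answer is the sum over the same entries, in which case a max answer pins down an entry exactly when every entry must equal the max, i.e. when sum = n · max. The function names are hypothetical.

```python
def audit_max_after_sum(data):
    """Answer sum(data), then answer or deny max(data) using the true max."""
    total = sum(data)
    true_max = max(data)
    # Knowing sum and max, an entry is uniquely determined only if all
    # entries are forced to equal the max, i.e. total == n * max.
    if total == len(data) * true_max:
        return total, None  # None stands for "denied"
    return total, true_max

def attacker_view(total, max_reply, n):
    """What the user deduces from the replies alone."""
    if max_reply is None:
        # The denial itself reveals that every entry equals total / n.
        return [total / n] * n
    return None  # no entry is uniquely determined in this model
```

With data (5, 5, 5) the max query is denied and the attacker reconstructs all three entries from the denial; with data (7, 4, 4) the max is answered and nothing is pinned down.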

8 Example 2: Interval Based Auditing
d_i ∈ [0,100], sum queries, accuracy parameter = 1 (PTIME)
User: q1 = sum(d1,d2)
Auditor: Sorry, denied
User: q2 = sum(d2,d3)
Auditor: sum(d2,d3) = 50
User's conclusion: d1,d2 ∈ [0,1], d3 ∈ [49,50]
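
The user's conclusion can be reproduced with a few lines of interval arithmetic. The sketch below rests on an assumed reading of the denial rule: a sum query is denied when its answer would confine some entry to an interval of width at most 1. The constants and function names are illustrative, not taken from [LWWJ02].

```python
LO, HI = 0.0, 100.0  # public range of every entry, from the slide
WIDTH = 1.0          # assumed reading of the accuracy parameter

def entry_range_given_pair_sum(s):
    """Range of each of two entries in [LO, HI] when their sum is s."""
    return max(LO, s - HI), min(HI, s)

def pair_sum_denied(s):
    """Assumed rule: deny if the answer confines an entry to width <= WIDTH."""
    lo, hi = entry_range_given_pair_sum(s)
    return hi - lo <= WIDTH

def infer_after_denial(q2_answer):
    """q1 = sum(d1, d2) was denied; q2 = sum(d2, d3) was answered."""
    conclusions = []
    # A denial means sum(d1, d2) <= 1 or >= 199, so d1 and d2 both lie
    # in the low band [0, 1] or both lie in the high band [99, 100].
    for band_lo, band_hi in [(LO, LO + WIDTH), (HI - WIDTH, HI)]:
        d3_lo = max(LO, q2_answer - band_hi)
        d3_hi = min(HI, q2_answer - band_lo)
        if d3_lo <= d3_hi:  # keep only bands compatible with q2's answer
            conclusions.append({
                "d1": (band_lo, band_hi),
                "d2": (band_lo, band_hi),
                "d3": (d3_lo, d3_hi),
            })
    return conclusions
```

Calling `infer_after_denial(50.0)` leaves only the low band, because d2 in [99, 100] would force d3 to be negative.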

9 Sounds Familiar?
Colonel Oliver North, on the Iran-Contra Arms Deal: “On the advice of my counsel I respectfully and regretfully decline to answer the question based on my constitutional rights.”
David Duncan, former auditor for Enron and partner in Andersen: “Mr. Chairman, I would like to answer the committee's questions, but on the advice of my counsel I respectfully decline to answer the question based on the protection afforded me under the Constitution of the United States.”

10 What about Max Auditing?
d_i real; database d_1, d_2, …, d_{n-1}, d_n
User: q1 = max(d1,d2,d3,d4)
Auditor: M_1234
User: q2 = max(d1,d2,d3)
Auditor: M_123 / denied
If denied: d4 = M_1234
User: q3 = max(d1,d2)
Auditor: M_12 / denied
If denied: d3 = M_123
Recover 1/8 of the database!
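
The attack can be simulated in Python. The auditor below is a simplified stand-in, not the algorithm of [KPR00]: it assumes that, with only max queries answered, an entry is pinned down exactly when it is the sole entry of some answered query whose upper bound equals that query's answer. It decides using the true answer and ignores denials, as in the prior work described earlier. All names are hypothetical, indices are 0-based, and a fresh log is used per block of four because the blocks are disjoint.

```python
import random

def pins_an_entry(answered):
    """True if the answered max queries determine some entry exactly."""
    upper = {}  # tightest known upper bound per index
    for idx, m in answered:
        for j in idx:
            upper[j] = min(upper.get(j, m), m)
    # A query with a single entry still able to reach its max pins that entry.
    return any(sum(upper[j] == m for j in idx) == 1 for idx, m in answered)

def audit_max(data, answered, idx):
    """Answer max over idx, or return None (deny) if the answer would pin an entry."""
    m = max(data[j] for j in idx)
    if pins_an_entry(answered + [(tuple(idx), m)]):
        return None  # denied; the denial is not logged
    answered.append((tuple(idx), m))
    return m

def attack_block(data, a, b, c, d):
    """Run q1, q2, q3 from the slide on one block; return entries learned."""
    answered = []
    m4 = audit_max(data, answered, (a, b, c, d))
    m3 = audit_max(data, answered, (a, b, c))
    if m3 is None:
        return {d: m4}  # q2 denied, so the fourth entry is the block maximum
    if audit_max(data, answered, (a, b)) is None:
        return {c: m3}  # q3 denied, so the third entry is the maximum
    return {}

def attack(data):
    """Apply the three queries to consecutive blocks of four entries."""
    recovered = {}
    for s in range(0, len(data) - 3, 4):
        recovered.update(attack_block(data, s, s + 1, s + 2, s + 3))
    return recovered
```

For distinct random values, the maximum of a block sits in the third or fourth position half of the time, so one entry in eight is recovered on average, matching the slide's 1/8.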

11 What about Boolean Auditing?
d_i Boolean; database d_1, d_2, …, d_{n-1}, d_n
User: q1 = sum(d1,d2)
Auditor: 1 / denied
User: q2 = sum(d2,d3)
Auditor: 1 / denied
…
q_i is denied iff d_i = d_{i+1} ⇒ learn the database up to complement
Let d_i, d_j, d_k be not all equal, where q_{i-1}, q_i, q_{j-1}, q_j, q_{k-1}, q_k were all denied
User: final query = sum(d_i,d_j,d_k)
Auditor: 1 / 2
Recover the entire database!
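
A short Python sketch of the attacker's side of this slide follows. It models only the pair-sum step, where an answer of 0 or 2 would reveal both entries and is therefore denied, and then uses one three-entry sum to choose between the reconstructed database and its complement. It does not model how the auditor decides on the final query (the slide's condition on denied neighbouring queries). Function names are hypothetical and indices are 0-based.

```python
def audit_pair_sum(bits, i):
    """Answer sum(d_i, d_{i+1}); deny (None) when it would reveal both bits."""
    s = bits[i] + bits[i + 1]
    return s if s == 1 else None

def reconstruct_up_to_complement(bits):
    """Use the deny/answer pattern alone to rebuild the bits, guessing d_0 = 0."""
    guess = [0]
    for i in range(len(bits) - 1):
        denied = audit_pair_sum(bits, i) is None
        # Denied means the neighbours are equal; answered (sum 1) means they differ.
        guess.append(guess[-1] if denied else 1 - guess[-1])
    return guess

def resolve(guess, triple_sum, i, j, k):
    """Pick the guess or its complement using sum(d_i, d_j, d_k), bits not all equal."""
    if guess[i] + guess[j] + guess[k] == triple_sum:
        return guess
    return [1 - b for b in guess]
```

Because the three chosen bits are not all equal, their sum is 1 under one candidate and 2 under the other, so a single answer settles which candidate is the real database.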

12 What are the Problems?
Obvious problem: denied queries are ignored
– Algorithmic problem: it is not clear how to incorporate denials into the decision
Subtle problem:
– Query denials leak (potentially sensitive) information
– Users cannot decide denials by themselves
[Diagram: nested regions showing the possible assignments to {d_1,…,d_n}, the assignments consistent with (q_1,…,q_i), and the region where q_{i+1} is denied]

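13 A Spectrum of Auditors
[Diagram relating the data an auditor's decision depends on to example auditors]
Decision data: q_1,…,q_i, q_{i+1}. Examples: size overlap restriction, algebraic structure
Decision data: q_1,…,q_i, q_{i+1} and a_1,…,a_i
Decision data: q_1,…,q_i, q_{i+1} and a_1,…,a_i, a_{i+1}. Examples: Sum/Max, Interval based, Boolean, Max
Further examples shown on the diagram: cell suppression, k-anonymity
Regions marked on the diagram: “safe”, “unsafe”
*Note: can work in the “unsafe” region, but need to prove denials do not leak crucial information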

14 Simulatable Auditing*
An auditor is simulatable if a simulator exists such that:
– Auditor: has the statistical database and q_1,…,q_i; on q_{i+1} it outputs deny/answer
– Simulator: has only q_1,…,q_i and a_1,…,a_i; on q_{i+1} it reaches the same deny/answer decision
Simulation ⇒ denials do not leak information
* ‘self auditors’ in [DN03]
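
As a rough illustration of the definition, the Python sketch below revisits the sum-then-max scenario of Example 1 and contrasts an auditor that looks at the true answer with one that decides from the log alone. The log-only rule used here (deny if some dataset consistent with the logged sum would make the max answer compromising) is one possible way to obtain a simulatable decision and is an assumption of this sketch. The two hypothetical datasets it checks are illustrative stand-ins for "all consistent datasets"; the function names are hypothetical.

```python
from fractions import Fraction

def naive_decision(data):
    """Decision that depends on the true answer: deny iff sum and max pin every entry."""
    return "deny" if sum(data) == len(data) * max(data) else "answer"

def simulatable_decision(total, n):
    """Decision computed from the log only (the answered sum over n entries)."""
    # Datasets the user could write down without seeing the data;
    # each has the logged sum.
    hypothetical = [
        [Fraction(total, n)] * n,                      # all entries equal
        [Fraction(total)] + [Fraction(0)] * (n - 1),   # one entry carries the sum
    ]
    # Deny if the max answer would be compromising for any of them.
    if any(naive_decision(d) == "deny" for d in hypothetical):
        return "deny"
    return "answer"
```

The naive decision differs between (5, 5, 5) and (7, 4, 4), so it leaks which case holds. The log-only decision is the same for every dataset with sum 15, and the user could have computed it without the auditor's help.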

15 Summary
Subtleties in the current definition of auditors allow for information leakage and, potentially, privacy breaches
– Denials are not taken into account
– The auditor uses information not available to the user
Simulatable auditors provably do not leak information in their decisions
– A new starting point for research on auditors

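16 A Spectrum of Auditors
[Diagram relating the data an auditor's decision depends on to example auditors]
Decision data: q_1,…,q_i, q_{i+1}. Examples: size overlap restriction, algebraic structure
Decision data: q_1,…,q_i, q_{i+1} and a_1,…,a_i, a_{i+1}. Examples: Sum/Max, Interval based, Boolean, Max
Further examples shown on the diagram: cell suppression, k-anonymity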

17 Sounds Familiar?
Colonel Oliver North, on the Iran-Contra Arms Deal: “On the advice of my counsel I respectfully and regretfully decline to answer the question based on my constitutional rights.”
Andrew Fastow, CFO, Enron Corporation: “I would like to answer the committee's questions, but on the advice of my counsel, I respectfully decline to answer the questions based on the protection afforded me under the Constitution of the United States.”