Towards Robustness in Query Auditing Shubha U. Nabar Stanford University VLDB 2006 Joint Work With B. Marthi, K. Kenthapadi, N. Mishra, R. Motwani.


1 Towards Robustness in Query Auditing Shubha U. Nabar Stanford University VLDB 2006 Joint Work With B. Marthi, K. Kenthapadi, N. Mishra, R. Motwani

2 Data Mining vs Privacy Large amount of data available in digital form Statisticians query data to mine useful trends Potential for privacy breaches

3 Online Query Auditing
Given a stream of queries over a DB containing private information, when should queries be denied to protect privacy?
Our focus:
- Statistical DBs: census, hospital, employee
- Only one private attribute, e.g., salary, disease
- Statistical queries over the private attribute: sum, max, mean
- Stream of queries of a single type from a single user

4 Online Query Auditing
Company Database:
Name   Age  Sex  Salary
Alice  23   F    42K
Bob    25   M    50K
Carl   30   M    80K
Dave   21   M    35K
Query: sum of salaries of female employees. Answer: 42,000.
Adversary: Alice's salary = $42,000!

5 Online Query Auditing In general, more complex queries can be posed and answers put together to deduce information Task of auditor: deny query when answer to current and past queries can be “stitched together” to leak information.
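To make "stitching together" concrete, here is a minimal sketch using the toy salary table from the previous slide. The `sum_query` helper is a hypothetical stand-in for a statistical database interface, not something defined in the talk. Each answer looks harmless alone, but their difference isolates one salary.

```python
# Toy data from the slide; sum_query is a hypothetical helper for illustration.
SALARY = {"Alice": 42_000, "Bob": 50_000, "Carl": 80_000, "Dave": 35_000}

def sum_query(names):
    """Answer a sum query over the private salary attribute."""
    return sum(SALARY[name] for name in names)

# Two queries that each cover more than one person.
a1 = sum_query(["Alice", "Bob", "Carl"])
a2 = sum_query(["Bob", "Carl"])

# Subtracting the answers isolates a single private value.
leaked_alice = a1 - a2
```

An auditor in the sense of this talk would have to deny the second query, because together with the first it determines Alice's salary.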

6 Our Contributions Auditor for max queries Auditor for combinations of max and min queries A first analysis of the utility of an auditing scheme

7 Related Work Perturbing data itself [W ‘65, AS ‘00, EGS ’03, CDMSW ‘05] Perturbing results supplied to user [DN ‘03, DMNS ‘06] Statisticians unhappy with addition of noise Auditors provide exact answers if at all

8 Previous Work
- Restricting Size and Overlap of Queries [Dobkin, Jones, Lipton ‘79]
- Offline Auditing [Chin ‘86]
- Auditing for Boolean Attributes [Kleinberg, Papadimitriou, Raghavan ‘03]
- Auditing Compliance with a Hippocratic Database [Agrawal, Bayardo, Faloutsos, Kiernan, Rantzau, Srikant ’04]
- Simulatable Auditing [Kenthapadi, Mishra, Nissim ‘05]

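9 Naïve Auditor
If the answer to the current query causes an element to be determined, deny.
Company Database:
Name   Age  Sex  Salary
Alice  23   F    42K
Bob    25   M    50K
Carl   30   M    80K
Dave   21   M    35K
Query 1: max salary{Alice, Bob, Carl}. Answer: 80,000.
Query 2: max salary{Alice, Bob}. Answer: denied.
Adversary: Carl's salary = $80,000! (The denial itself reveals that max{Alice, Bob} is below 80,000; had it equaled 80,000, no element would be determined and the query would have been answered.)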

10 Simulatability
Denials based on the answer to the current query may cause a privacy breach.
Solution: if the attacker can simulate and predict the decision to deny ⇒ denials do not leak information.
Auditor: if there is any dataset consistent with past answers in which the current query causes a breach, deny.
- Attacker can check the condition himself
- Denials do not leak information

11 Goal Find online, efficient, simulatable, high-utility auditors for various classes of queries

12 Definition of Privacy Breach
Full Disclosure: some private data point can be uniquely determined
- e.g. max{x_a, x_b, x_c} = 10 and max{x_a, x_b} = 8 ⇒ x_c = 10
Partial Disclosure (probabilistic compromise): significant change in attacker's confidence about some private data value
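The full-disclosure example can be checked mechanically. The sketch below is an illustration under stated assumptions, not the auditor from the paper: it assumes the answers are consistent with some dataset, and the function name `determined_by_max` is hypothetical. Each element is bounded above by the smallest answer among the queries containing it. If exactly one element of a query set can still attain that query's answer, its value is pinned down.

```python
def determined_by_max(queries):
    """Return {element: value} for elements pinned down by answered max queries.

    queries: list of (set_of_elements, answer) pairs, assumed mutually consistent.
    """
    # Upper bound per element: the smallest answer of any query containing it.
    upper = {}
    for members, answer in queries:
        for name in members:
            upper[name] = min(upper.get(name, float("inf")), answer)

    disclosed = {}
    for members, answer in queries:
        # Elements whose upper bound still allows them to be this query's max.
        candidates = [name for name in members if upper[name] == answer]
        if len(candidates) == 1:
            disclosed[candidates[0]] = answer
    return disclosed

# The example from the slide: max{x_a, x_b, x_c} = 10, max{x_a, x_b} = 8.
result = determined_by_max([({"a", "b", "c"}, 10), ({"a", "b"}, 8)])
```

Here `result` contains only `c`, matching the slide: x_a and x_b are both capped at 8, so x_c must be the element equal to 10.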

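13 Probabilistic Compromise
Private data known to be drawn according to distribution D
Range of each data point divided into intervals
[Figure: the SDB receives query q_t and returns answer a_t; prior and posterior distributions of a data point over the range [0, 1]]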

14 Outline Problem Statement Previous Work Auditing Max Queries Auditing Max and Min Queries Utility Future Work See paper for auditing against full disclosure

15 Skeleton of Probabilistic Auditor
1. Attacker poses query q_t
2. Attacker has posterior distribution over answer to q_t, given previous answers
3. Auditor repeatedly:
   a. Samples possible answer from this distribution
   b. Checks if sampled answer will change attacker's belief about some data point
4. If q_t is “unsafe” in a significant fraction of samples, deny
Need to estimate posterior distributions in 2. and 3b.
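The skeleton above might be sketched as follows. This is only a structural outline under assumptions: `sample_answer` and `is_unsafe` are hypothetical callables standing in for the posterior sampler (step 3a) and the belief-change check (step 3b), and `num_samples` and `deny_threshold` are illustrative parameters that the slides do not specify.

```python
import random

def probabilistic_audit(sample_answer, is_unsafe,
                        num_samples=1000, deny_threshold=0.1, seed=0):
    """Decide "deny" or "answer" for the current query.

    sample_answer(rng): draws a possible answer from the attacker's posterior
                        given past answers only (hypothetical callable).
    is_unsafe(answer):  True if that answer would shift the attacker's belief
                        about some data point too much (hypothetical callable).
    """
    rng = random.Random(seed)
    unsafe = 0
    for _ in range(num_samples):
        candidate = sample_answer(rng)   # step 3a
        if is_unsafe(candidate):         # step 3b
            unsafe += 1
    # Step 4: deny if the query is unsafe in a significant fraction of samples.
    return "deny" if unsafe / num_samples > deny_threshold else "answer"

# Toy usage: answers uniform in [0, 1); answers above 0.5 count as unsafe.
decision = probabilistic_audit(lambda rng: rng.random(), lambda a: a > 0.5)
```

Note that the decision never looks at the true answer to the current query, only at answers sampled from what the attacker already knows, which is what keeps the denial simulatable.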

16 Probabilistic Max Auditor
Assumption: dataset drawn uniformly at random from the set of duplicate-free points in [α, β]^n
- For each x_i and any interval in [α, β], the prior probability is uniform
Given answers to a set of queries, what are the posterior probabilities?

17 Probabilistic Max Auditor
Given queries q_1 … q_t and answers a_1 … a_t, create synopsis B_max
- B_max contains predicates [max(S_1) = a_1], [max(S_2) < a_2], …
- The S_i are disjoint
- B_max enables succinct representation of the audit trail
- B_max enables computation of posterior probabilities

18 Determining Posterior Probabilities
Example: max{x_a, x_b, x_c} = 0.75
[Figure: axes x_a, x_b, x_c with points (0.75, 0, 0), (0, 0.75, 0), (0, 0, 0.75); posterior probabilities Pr{x_a ∈ [0, 0.25]}, Pr{x_a ∈ [0.25, 0.5]}, Pr{x_a ∈ [0.5, 0.75)}, Pr{x_a = 0.75}]
Pr{x_a = 0.75} = 1/3, since any one of x_a, x_b or x_c is equally likely to be the max
With the remaining 2/3 probability, x_a is uniformly distributed in [0, 0.75)
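The interval probabilities in this example follow directly from the two facts on the slide. The sketch below is illustrative only: the helper name `posterior_interval_prob` is hypothetical, and it assumes the lower end of the range is 0 and the interval lies inside [0, m). It computes Pr{x_a ∈ [lo, hi)} exactly, given that the max of k duplicate-free points is m.

```python
from fractions import Fraction

def posterior_interval_prob(k, m, lo, hi):
    """Pr{x_a in [lo, hi)} given max of k points equals m, with 0 <= lo <= hi <= m.

    With probability 1/k the point itself is the max; otherwise it is
    uniformly distributed in [0, m).
    """
    not_max = 1 - Fraction(1, k)
    return not_max * Fraction(hi - lo) / Fraction(m)

m = Fraction(3, 4)
quarter = Fraction(1, 4)
p_low = posterior_interval_prob(3, m, 0, quarter)            # Pr{x_a in [0, 0.25)}
p_mid = posterior_interval_prob(3, m, quarter, 2 * quarter)  # Pr{x_a in [0.25, 0.5)}
p_high = posterior_interval_prob(3, m, 2 * quarter, m)       # Pr{x_a in [0.5, 0.75)}
p_point = Fraction(1, 3)                                     # Pr{x_a = 0.75}
```

Each of the three intervals gets probability 2/9 and the point mass gets 1/3, which sum to 1. If the prior range were [0, 1] (an assumption here), the prior for a length-0.25 interval would be 1/4, so the answer shifts the attacker's belief only slightly on these intervals.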

19 Probabilistic Max Auditor
1. Attacker poses query q_t
2. Attacker has posterior distribution over answer to q_t, given previous answers
3. Auditor repeatedly:
   a. Samples possible answer from this distribution
   b. Checks if sampled answer will change attacker's belief about some data point
4. If q_t is “unsafe” in a significant fraction of samples, deny
Can give guarantees on the probability that the adversary learns new information

20 Outline Problem Statement Previous Work Auditing Max Queries Auditing Max and Min Queries Utility Future Work

21 Probabilistic Max-and-Min Auditor Computing posterior probabilities becomes harder Given queries, create synopsis so that a data point occurs in at most one max and one min predicate

22 Equivalent Graph Coloring Problem
Predicates:
- max{x_a, x_b, x_c} = 1
- min{x_a, x_b} = 0.2
- max{x_d, x_e} = 2
- min{x_c, x_d, x_e} = 0.5
[Figure: graph with one node per predicate, labeled {a, b, c}, {a, b}, {d, e}, {c, d, e}]
Every valid coloring corresponds to a set of consistent datasets

23 Probabilistic Max-and-Min Auditor
We show:
- Can sample a consistent dataset according to the posterior distribution by sampling a valid coloring according to a distribution P
- Can sample a valid coloring according to P using a Markov chain over colorings
- Can use sampled colorings to answer questions about the posterior distribution of data points up to arbitrary precision
See paper for details

24 Outline Problem Statement Previous Work Auditing Max Queries Auditing Max and Min Queries Utility Future Work

25 Utility
Several dimensions of utility:
- How many queries are answered?
- What kinds of queries are answered?
- What can be computed?
- “Price of simulatability”
Expected time to first denial

26 Utility of Sum Auditor Consider full disclosure No prior knowledge – data points come from unbounded range Queries chosen uniformly at random

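27 Sum Auditor
Answered sum queries form a linear system:
\[
\begin{pmatrix}
1 & 0 & 1 & 0 & 1 \\
1 & 1 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 1 \\
1 & 1 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix} x_a \\ x_b \\ x_c \\ x_d \\ x_e \end{pmatrix}
=
\begin{pmatrix} a_1 \\ a_2 \\ a_3 \\ a_4 \end{pmatrix}
\]
Row reduction gives:
\[
\begin{pmatrix}
1 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix} x_a \\ x_b \\ x_c \\ x_d \\ x_e \end{pmatrix}
=
\begin{pmatrix} a_2 - a_4 + a_3 \\ a_4 - a_3 \\ a_1 - a_3 \\ a_4 - a_2 \end{pmatrix}
\]
so x_a, x_b, x_c and x_e are fully determined; x_d appears in no query.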
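The linear-algebra view of sum auditing can be sketched in code. The following is a minimal illustration, not the authors' implementation: it row-reduces the 0/1 query matrix with exact fractions and reports every index i for which the reduced matrix contains the unit row e_i, meaning x_i is uniquely determined. The function names `rref` and `disclosed_indices` are hypothetical.

```python
from fractions import Fraction

def rref(rows):
    """Reduced row echelon form of a matrix, using exact rational arithmetic."""
    m = [[Fraction(v) for v in row] for row in rows]
    ncols = len(m[0]) if m else 0
    pivot_row = 0
    for col in range(ncols):
        if pivot_row == len(m):
            break
        # Find a row at or below pivot_row with a nonzero entry in this column.
        src = next((r for r in range(pivot_row, len(m)) if m[r][col] != 0), None)
        if src is None:
            continue
        m[pivot_row], m[src] = m[src], m[pivot_row]
        pivot = m[pivot_row][col]
        m[pivot_row] = [v / pivot for v in m[pivot_row]]
        # Clear this column in every other row.
        for r in range(len(m)):
            if r != pivot_row and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[pivot_row])]
        pivot_row += 1
    return m

def disclosed_indices(query_rows):
    """Indices i such that x_i is uniquely determined by the sum queries."""
    found = []
    for row in rref(query_rows):
        nonzero = [i for i, v in enumerate(row) if v != 0]
        if len(nonzero) == 1:      # the row is a unit vector e_i
            found.append(nonzero[0])
    return sorted(found)

# The four queries from the slide, over (x_a, x_b, x_c, x_d, x_e).
QUERIES = [[1, 0, 1, 0, 1], [1, 1, 0, 0, 0], [1, 0, 0, 0, 1], [1, 1, 0, 0, 1]]
leaked = disclosed_indices(QUERIES)
```

For these queries `leaked` is `[0, 1, 2, 4]`, that is x_a, x_b, x_c and x_e. The check uses only the query sets and never the answers, so an attacker could run the same test, which fits the simulatability requirement described earlier.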

28 Utility of Sum Auditor
We show that the expected time to first denial is:
- ≥ n/4
- ≤ n + lg n
Good news for large databases: answers are not riddled with denials
Can't do much better
Once n-1 independent queries are answered, at least half the queries will be denied on average

29 Utility of Sum Auditor
Reality:
- Users do not choose queries uniformly at random
- Users cannot query arbitrary subsets of the data
- Database is frequently updated, so old information becomes irrelevant
  e.g. q_1 = x_a + x_b + x_c; x_a is modified; then q_2 = x_a + x_b will no longer be denied
Denials may not be so frequent in reality

30 Utility: Experiments Plot 1: Sum queries chosen uniformly at random Plot 2: Sum queries with updates Plot 3: 1 dimensional range sum queries

31 Future Work
Ways to proactively enhance utility
- Deny innocuous queries in the present in the hope that more can be answered in the future
Ward off denial of service attacks
Devise auditors, study utility for more complex queries
Remove assumptions about prior knowledge
Solution to collusion

