
Tuning Privacy-Utility Tradeoffs in Statistical Databases using Policies Ashwin Machanavajjhala cs.duke.edu Collaborators: Daniel Kifer (PSU),




1 Tuning Privacy-Utility Tradeoffs in Statistical Databases using Policies. Ashwin Machanavajjhala, ashwin@cs.duke.edu. Collaborators: Daniel Kifer (PSU), Bolin Ding (MSR), Xi He (Duke). Summer @ Census, 8/15/2013

2 Overview of the talk There is an inherent trade-off between the privacy (confidentiality) of individuals and the utility of statistical analyses over data collected from individuals. Differential privacy has revolutionized how we reason about privacy – a nice tuning knob ε for trading off privacy and utility.

3 Overview of the talk However, differential privacy only captures a small part of the privacy-utility trade-off space – No Free Lunch Theorem – Differentially private mechanisms may not ensure sufficient utility – Differentially private mechanisms may not ensure sufficient privacy

4 Overview of the talk I will present a new privacy framework that allows data publishers to more effectively trade off privacy for utility – Better control over what to keep secret and who the adversaries are – Can ensure more utility than differential privacy in many cases – Can ensure privacy where differential privacy fails

5 Outline Background – Differential privacy. No Free Lunch [Kifer-M SIGMOD '11] – No `one privacy notion to rule them all'. Pufferfish Privacy Framework [Kifer-M PODS '12] – Navigating the space of privacy definitions. Blowfish: Practical privacy using policies [ongoing work]

6 Data Privacy Problem Individuals 1 through N each contribute a record (r1, …, rN) to a server that stores them in a database D. Utility: support statistical analyses over D. Privacy: no breach about any individual.

7 Data Privacy in the real world
– Application: Medical. Data collector: Hospital. Third party (adversary): Epidemiologist. Private information: Disease. Function (utility): Correlation between disease and geography.
– Application: Genome analysis. Data collector: Hospital. Third party: Statistician/Researcher. Private information: Genome. Function: Correlation between genome and disease.
– Application: Advertising. Data collector: Google/FB/Y!. Third party: Advertiser. Private information: Clicks/Browsing. Function: Number of clicks on an ad by age/region/gender.
– Application: Social recommendations. Data collector: Facebook. Third party: Another user. Private information: Friend links/profile. Function: Recommend other users or ads to users based on the social network.

8 Many definitions & several attacks. Definitions: K-Anonymity [Sweeney et al., IJUFKS '02], L-diversity [Machanavajjhala et al., TKDD '07], T-closeness [Li et al., ICDE '07], E-Privacy [Machanavajjhala et al., VLDB '09], Differential Privacy [Dwork et al., ICALP '06]. Attacks: linkage attack, background knowledge attack, minimality/reconstruction attack, de Finetti attack, composition attack.

9 Differential Privacy For every pair of inputs D1 and D2 that differ in one value, and for every output O, the adversary should not be able to distinguish between D1 and D2 based on O: | log ( Pr[A(D1) = O] / Pr[A(D2) = O] ) | ≤ ε (ε > 0).

10 Algorithms No deterministic algorithm guarantees differential privacy. Random sampling does not guarantee differential privacy. Randomized response satisfies differential privacy.
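Randomized response, the classic mechanism mentioned above, can be sketched in a few lines. This is a minimal illustration (not code from the talk): each respondent reports a sensitive bit truthfully with probability e^ε / (1 + e^ε) and flips it otherwise; the helper `debiased_estimate` is a hypothetical name for the standard way an analyst recovers an unbiased population fraction from the noisy reports.

```python
import math
import random

def randomized_response(true_bit: int, epsilon: float) -> int:
    """Report the true bit with probability e^eps / (1 + e^eps),
    otherwise report its flip. Satisfies eps-differential privacy."""
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return true_bit if random.random() < p_truth else 1 - true_bit

def debiased_estimate(responses, epsilon):
    """Unbiased estimate of the true fraction of 1s from noisy responses."""
    p = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    observed = sum(responses) / len(responses)
    return (observed - (1 - p)) / (2 * p - 1)
```

With ε = 1 each bit is reported truthfully with probability roughly 0.73, yet the aggregate fraction can still be estimated accurately from a large sample.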

11 Laplace Mechanism The researcher issues query q; instead of the true answer q(D), the database returns q(D) + η, where the noise η is drawn from the Laplace distribution h(η) ∝ exp(−|η| / λ). Privacy depends on the parameter λ. Mean: 0, Variance: 2λ².

12 Laplace Mechanism Thm [Dwork et al., TCC 2006]: If the sensitivity of the query is S, then the Laplace mechanism with λ = S/ε guarantees ε-differential privacy. Sensitivity: the smallest S such that for any D, D' differing in one entry, ||q(D) − q(D')||₁ ≤ S(q).
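The theorem above can be turned into a short sketch. This is an illustrative implementation, not code from the talk; it samples Laplace noise as the difference of two exponentials, since Python's standard library has no Laplace sampler.

```python
import random

def laplace_noise(scale: float) -> float:
    """Sample from Laplace(0, scale): the difference of two independent
    exponentials with mean `scale` is Laplace-distributed."""
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

def laplace_mechanism(true_answer: float, sensitivity: float, epsilon: float) -> float:
    """Release true_answer + Lap(S/eps); by the theorem above this
    satisfies eps-differential privacy for a query with sensitivity S."""
    return true_answer + laplace_noise(sensitivity / epsilon)
```

For a count with true value 8, sensitivity 2, and ε = 1, the released value has mean 8 and variance 2(2/1)² = 8, matching the contingency-table example that follows.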

13 Contingency tables Each tuple in D takes one of k = 4 different values; the release is the count Count(·,·) for each cell of the 2×2 contingency table.

14 Laplace Mechanism for Contingency Tables Each cell count is released with Laplace noise, e.g. 2 + Lap(2/ε) and 8 + Lap(2/ε). Sensitivity = 2, since changing one tuple changes two cell counts by 1 each. A noisy count with true value 8 has mean 8 and variance 8/ε².

15 Composition Property If algorithms A1, A2, …, Ak use independent randomness and each Ai satisfies εi-differential privacy, then outputting all the answers together satisfies differential privacy with ε = ε1 + ε2 + … + εk (the privacy budget).

16 Differential Privacy A privacy definition that is independent of the attacker's prior knowledge. Tolerates many attacks that other definitions are susceptible to – avoids composition attacks – claimed to be tolerant against adversaries with arbitrary background knowledge. Allows simple, efficient and useful privacy mechanisms – used in LEHD's OnTheMap [M et al., ICDE '08].

17 Outline Background – Differential privacy. No Free Lunch [Kifer-M SIGMOD '11] – No `one privacy notion to rule them all'. Pufferfish Privacy Framework [Kifer-M PODS '12] – Navigating the space of privacy definitions. Blowfish: Practical privacy using policies [ongoing work]

18 Differential Privacy & Utility Differentially private mechanisms may not ensure sufficient utility for many applications. Sparse data: the integrated mean squared error of the Laplace mechanism can be worse than returning a random contingency table for typical values of ε (around 1). Social networks [M et al., PVLDB 2011].

19 Differential Privacy & Privacy Differentially private algorithms may not limit the ability of an adversary to learn sensitive information about individuals when records in the data are correlated. Correlations across individuals occur in many ways – social networks – data with pre-released constraints – functional dependencies.

20 Laplace Mechanism and Correlations The noisy cell counts (e.g. 2 + Lap(2/ε) and 8 + Lap(2/ε)) are released alongside exact auxiliary marginals (e.g. 4, 10, 4). Marginals are published for the following reasons: 1. Legal: 2002 Supreme Court case Utah v. Evans. 2. Contractual: advertisers must know exact demographics at coarse granularities. Does the Laplace mechanism still guarantee privacy?

21 Laplace Mechanism and Correlations Combining each noisy count with the exact marginals, the adversary can derive several independent estimates of the same cell, e.g. Count(·,·) = 8 + Lap(2/ε) directly, and Count(·,·) = 8 − Lap(2/ε) by subtracting the noisy count 2 + Lap(2/ε) from a known marginal.

22 Laplace Mechanism and Correlations Averaging k such independent estimates gives an estimator with mean 8 and variance 8/(kε²); the adversary can reconstruct the table with high precision for large k.
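The averaging attack on the slide above can be simulated. This sketch assumes, as the slides argue, that the public marginals let the adversary derive k independent noisy copies of the same cell count, each distributed as true_count + Lap(2/ε):

```python
import random

def laplace(scale):
    # Laplace(0, scale) as a difference of two exponentials.
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

def averaged_estimate(true_count, k, epsilon):
    """Adversary's estimator: average k independent noisy copies of the
    same cell count. The variance shrinks from 8/eps^2 (one copy) to
    8/(k * eps^2), so the cell is reconstructed precisely for large k."""
    copies = [true_count + laplace(2.0 / epsilon) for _ in range(k)]
    return sum(copies) / k
```

With ε = 1 and k = 100, the standard deviation of the estimate drops from about 2.8 to about 0.28, enough to pin down an integer count.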

23 No Free Lunch Theorem [Kifer-M SIGMOD '11; Dwork-Naor JPC '10] It is not possible to guarantee any utility in addition to privacy without making assumptions about – the data generating distribution – the background knowledge available to an adversary.

24 To sum up … Differential privacy only captures a small part of the privacy-utility trade-off space – No Free Lunch Theorem – Differentially private mechanisms may not ensure sufficient privacy – Differentially private mechanisms may not ensure sufficient utility

25 Outline Background – Differential privacy. No Free Lunch [Kifer-M SIGMOD '11] – No `one privacy notion to rule them all'. Pufferfish Privacy Framework [Kifer-M PODS '12] – Navigating the space of privacy definitions. Blowfish: Practical privacy using policies [ongoing work]

26 Pufferfish Framework

27 Pufferfish Semantics What is being kept secret? Who are the adversaries? How is information disclosure bounded? – (similar to epsilon in differential privacy)

28 Sensitive Information Secrets: S is a set of potentially sensitive statements – "individual j's record is in the data, and j has cancer" – "individual j's record is not in the data". Discriminative pairs: mutually exclusive pairs of secrets – ("Bob is in the table", "Bob is not in the table") – ("Bob has cancer", "Bob has diabetes").

29 Adversaries We assume a Bayesian adversary who can be completely characterized by his/her prior information about the data – we do not assume computational limits. Data evolution scenarios: the set of all probability distributions that could have generated the data (… think adversary's prior). – No assumptions: all probability distributions over data instances are possible. – I.I.D.: the set of all f such that P(data = {r1, r2, …, rk}) = f(r1) × f(r2) × … × f(rk).

30 Information Disclosure Mechanism M satisfies ε-Pufferfish(S, Spairs, D) if, for every distribution θ ∈ D, every pair (s, s') ∈ Spairs with nonzero probability under θ, and every output O: e^(−ε) ≤ Pr[M(Data) = O | s, θ] / Pr[M(Data) = O | s', θ] ≤ e^(ε).

31 Pufferfish Semantic Guarantee For every discriminative pair (s, s') and every prior θ, the posterior odds of s vs s' after observing the output are within a factor of e^ε of the prior odds of s vs s'.

32 Applying Pufferfish to Differential Privacy Spairs: – "record j is in the table" vs "record j is not in the table" – "record j is in the table with value x" vs "record j is not in the table". Data evolution: – probability record j is in the table: πj – probability distribution over values of record j: fj – for all θ = [f1, f2, f3, …, fk, π1, π2, …, πk].

33 Applying Pufferfish to Differential Privacy Spairs: – "record j is in the table" vs "record j is not in the table" – "record j is in the table with value x" vs "record j is not in the table". Data evolution: – for all θ = [f1, f2, f3, …, fk, π1, π2, …, πk]. A mechanism M satisfies differential privacy if and only if it satisfies Pufferfish instantiated using Spairs and {θ} (as defined above).

34 Pufferfish & Differential Privacy Spairs: – s_i^x: record i takes the value x – attackers should not be able to significantly distinguish between any two values from the domain for any individual record.

35 Pufferfish & Differential Privacy Data evolution: – for all θ = [f1, f2, f3, …, fk]. The adversary's prior may be any distribution that makes records independent.

36 Pufferfish & Differential Privacy Spairs: – s_i^x: record i takes the value x. Data evolution: – for all θ = [f1, f2, f3, …, fk]. A mechanism M satisfies differential privacy if and only if it satisfies Pufferfish instantiated using Spairs and {θ}.

37 Summary of Pufferfish A semantic approach to defining privacy – enumerates the information that is secret and the set of adversaries – bounds the odds ratio of pairs of mutually exclusive secrets. Helps understand the assumptions under which privacy is guaranteed – differential privacy is one specific choice of secret pairs and adversaries. How should a data publisher use this framework? Algorithms?

38 Outline Background – Differential privacy. No Free Lunch [Kifer-M SIGMOD '11] – No `one privacy notion to rule them all'. Pufferfish Privacy Framework [Kifer-M PODS '12] – Navigating the space of privacy definitions. Blowfish: Practical privacy using policies [ongoing work]

39 Blowfish Privacy A special class of Pufferfish instantiations. Both pufferfish and blowfish are marine fish of the Tetraodontidae family.

40 Blowfish Privacy A special class of Pufferfish instantiations. Extends differential privacy using policies: – specification of sensitive information (allows more utility) – specification of publicly known constraints in the data (ensures privacy in correlated data). Satisfies the composition property.


42 Sensitive Information (recap) Secrets: S is a set of potentially sensitive statements – "individual j's record is in the data, and j has cancer" – "individual j's record is not in the data". Discriminative pairs: mutually exclusive pairs of secrets – ("Bob is in the table", "Bob is not in the table") – ("Bob has cancer", "Bob has diabetes").

43 Sensitive information in Differential Privacy Spairs: – s_i^x: record i takes the value x – attackers should not be able to significantly distinguish between any two values from the domain for any individual record.

44 Other notions of Sensitive Information Medical data – it may be OK to infer whether an individual is healthy or not – e.g., ("Bob is healthy", "Bob has diabetes") is not a discriminative pair of secrets for any individual. Partitioned sensitive information.

45 Other notions of Sensitive Information Geospatial data – do not want the attacker to distinguish between "close-by" points in the space – may distinguish between "far-away" points. Distance-based sensitive information.

46 Other notions of Sensitive Information Social networks – the domain of an individual's record is the power set of V (nodes). Edge privacy. Node privacy.

47 Generalization as a graph Consider a graph G = (V, E), where V is the set of values that an individual's record can take, and E encodes the set of discriminative pairs – the same for all records.

48 Blowfish Privacy + "Policy of Secrets" A mechanism M satisfies blowfish privacy wrt policy G if – for every set of outputs S of the mechanism – for every pair of datasets D1, D2 that differ in one record, with values x and y such that (x, y) ∈ E: Pr[M(D1) ∈ S] ≤ e^ε Pr[M(D2) ∈ S].

49 Blowfish Privacy + "Policy of Secrets" A mechanism M satisfies blowfish privacy wrt policy G if – for every set of outputs S of the mechanism – for every pair of datasets that differ in one record, with values x and y such that (x, y) ∈ E, the output distributions are within a factor e^ε. For any x and y in the domain, the guarantee degrades to e^(ε · d(x, y)), where d(x, y) is the shortest distance between x and y in G.

50 Blowfish Privacy + "Policy of Secrets" A mechanism M satisfies blowfish privacy wrt policy G if – for every set of outputs S of the mechanism – for every pair of datasets that differ in one record, with values x and y such that (x, y) ∈ E, the output distributions are close. The adversary is allowed to distinguish between x and y that appear in different disconnected components of G.
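The role of the policy graph in the last three slides can be made concrete with a shortest-path computation. This is an illustrative sketch (the edge-list representation is an assumption, not from the talk): the adversary's power to distinguish values x and y is bounded by exp(ε · d(x, y)), and disconnected values get no protection.

```python
from collections import deque

def policy_distance(edges, x, y):
    """Shortest-path distance between values x and y in the policy graph
    G = (V, E). Blowfish bounds the adversary's distinguishing power
    between x and y by exp(eps * distance); disconnected values
    (distance = infinity) may be fully distinguished."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    if x == y:
        return 0
    seen, frontier, dist = {x}, deque([x]), {x: 0}
    while frontier:                     # breadth-first search from x
        u = frontier.popleft()
        for w in adj.get(u, ()):
            if w not in seen:
                seen.add(w)
                dist[w] = dist[u] + 1
                if w == y:
                    return dist[w]
                frontier.append(w)
    return float("inf")                 # y unreachable: no protection required
```

For the line-graph policy on an ordered domain (x1 – x2 – … – xd), adjacent values get the full e^ε guarantee while values j steps apart get e^(jε).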

51 Algorithm 1: Randomized Response Perturb each record in the table independently. A non-interactive mechanism.

52 Algorithms for Blowfish Consider an ordered 1-D attribute with domain Dom = {x1, x2, x3, …, xd} – e.g., ranges of age, salary, etc. Suppose our policy is the line graph x1 – x2 – … – xd: the adversary should not distinguish whether an individual's value is xj or xj+1.

53 Algorithms for Blowfish Suppose we want to release the histogram privately – the number of individuals in each range, C(x1), …, C(xd). Any differentially private algorithm also satisfies blowfish – we can use the Laplace mechanism (with sensitivity 2).

54 Ordered Mechanism We can answer a different set of queries – the prefix sums S1, S2, …, Sd over the ordered domain – to get a different private estimator for the histogram C(x1), …, C(xd).

55 Ordered Mechanism We can answer each Si using the Laplace mechanism … but the sensitivity of all the queries together is only 1: changing one tuple from x2 to x3 (C(x2) − 1, C(x3) + 1) only changes S2.

56 Ordered Mechanism We can answer each Si using the Laplace mechanism … but the sensitivity of all the queries together is only 1 – a factor of 2 improvement over the Laplace mechanism on the raw histogram.

57 Ordered Mechanism In addition, we have the constraint S1 ≤ S2 ≤ … ≤ Sd (prefix sums of non-negative counts are non-decreasing). However, the noisy counts may not satisfy this constraint; we can post-process the noisy prefix sums to enforce it.

58 Ordered Mechanism Post-processing the noisy counts to enforce the ordering constraint gives an order-of-magnitude improvement for large d.
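The ordered mechanism of the last few slides can be sketched end to end. This is an illustrative implementation under stated assumptions: the queries are prefix sums (consistent with "changing one tuple from x2 to x3 only changes S2"), and the monotonicity post-processing uses pool-adjacent-violators isotonic regression, which the talk does not specify.

```python
import random

def laplace(scale):
    # Laplace(0, scale) as a difference of two exponentials.
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

def ordered_mechanism(counts, epsilon):
    """Noise each prefix sum S_i with Lap(1/eps): moving one tuple between
    adjacent values changes a single S_i, so joint sensitivity is 1.
    Then enforce S_1 <= ... <= S_d by pool-adjacent-violators and
    difference back to per-bucket counts."""
    prefix, total = [], 0
    for c in counts:
        total += c
        prefix.append(total + laplace(1.0 / epsilon))
    # Isotonic regression (PAV): L2 projection onto non-decreasing sequences.
    merged = []                          # stack of [block mean, block size]
    for s in prefix:
        merged.append([s, 1])
        while len(merged) > 1 and merged[-2][0] > merged[-1][0]:
            m2, m1 = merged.pop(), merged.pop()
            mean = (m1[0] * m1[1] + m2[0] * m2[1]) / (m1[1] + m2[1])
            merged.append([mean, m1[1] + m2[1]])
    iso = []
    for mean, size in merged:
        iso.extend([mean] * size)
    # Difference the monotone prefix sums back into histogram counts.
    return [iso[0]] + [iso[i] - iso[i - 1] for i in range(1, len(iso))]
```

Because the projection guarantees a non-decreasing prefix-sum estimate, the recovered per-bucket counts are non-negative, unlike raw per-bucket Laplace noise.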

59 Ordered Mechanism By leveraging the weaker sensitive information in the policy, we can provide significantly better utility. Extends to more general policy specifications. Ordered mechanisms and other blowfish algorithms are being tested on the synthetic data generator for the LODES data product.

60 Blowfish Privacy & Correlations Differentially private mechanisms may not ensure privacy when correlations exist in the data. Blowfish can handle correlations in the form of publicly known constraints – known marginal counts in the data – other dependencies. The privacy definition is similar to differential privacy with a modified notion of neighboring tables.

61 Other instantiations of Pufferfish All blowfish instantiations are extensions of differential privacy using – weaker notions of sensitive information – knowledge of constraints about the data – and all blowfish mechanisms satisfy the composition property. We can instantiate Pufferfish with other "realistic" adversary notions – only prior distributions that are similar to the expected data distribution – open question: which definitions satisfy the composition property?

62 Summary Differential privacy (and the tuning knob epsilon) is insufficient for trading off privacy for utility in many applications – sparse data, social networks, … The Pufferfish framework allows more expressive privacy definitions – can vary sensitive information, adversary priors, and epsilon. Blowfish shows one way to create more expressive definitions – can provide useful composable mechanisms. There is an opportunity to correctly tune privacy by using the above expressive privacy frameworks.

63 Thank you [M et al PVLDB'11] A. Machanavajjhala, A. Korolova, A. Das Sarma, "Personalized Social Recommendations – Accurate or Private?", PVLDB 4(7), 2011. [Kifer-M SIGMOD'11] D. Kifer, A. Machanavajjhala, "No Free Lunch in Data Privacy", SIGMOD 2011. [Kifer-M PODS'12] D. Kifer, A. Machanavajjhala, "A Rigorous and Customizable Framework for Privacy", PODS 2012. [ongoing work] A. Machanavajjhala, B. Ding, X. He, "Blowfish Privacy: Tuning Privacy-Utility Trade-offs using Policies", in preparation.

