Presentation on theme: "Privacy in Database Publishing" — Presentation transcript:

1 Privacy in Database Publishing
A presentation by Avinash Vyas (Indian PhD student) and Yannis Katsis (Greek PhD student). Database, 02/24/2006

2 Outline
• Defining Privacy
• Optimization Problem
  - First-Cut Solution (k-anonymity)
  - Second-Cut Solution (l-diversity)
• Decision Problem
  - First-Cut (Query-View Security)
  - Second-Cut (View Safety)

3 Defining Privacy in DB Publishing
Privacy in this talk IS NOT the traditional security of data, e.g. hacking, access control, theft of disks, etc. NO FOUL PLAY involved.

4 Defining Privacy in DB Publishing
Privacy in this talk IS the logical security of data. If the attacker uses only legitimate methods:
- can she infer the data I want to keep private? (the Decision Problem)
- how can I keep some data private while publishing useful info? (the Optimization Problem)
[Diagram: Alice publishes views V1, V2 of her secret data; the attacker combines them with external knowledge.]

5 Outline
• Defining Privacy
• Optimization Problem
  - First-Cut Solution (k-anonymity)
  - Second-Cut Solution (l-diversity)
• Decision Problem
  - First-Cut (Query-View Security)
  - Second-Cut (View Safety)

6 Need for Privacy in DB Publishing
Alice is the owner of person-specific data (e.g. a public health agency, telecom provider, or financial organization). The person-specific data contains:
• attribute values which can uniquely identify an individual, e.g. {zip-code, gender, date-of-birth} and/or {name} and/or {SSN}
• sensitive information corresponding to individuals, e.g. medical condition, salary, location
There is great demand for sharing person-specific data (medical research, new telecom applications). Alice wants to publish this person-specific data such that:
• the information remains practically useful
• the identity of the individuals cannot be determined

7 The Optimization Problem > Motivating Example
Secret: Alice wants to publish hospital data, while the correspondence between name & disease stays private.
#   Zip     Age   Nationality   Name      Condition
1   13053   28    Brazilian     Ronaldo   Heart Disease
2   13067   29    US            Bob
3           37    Indian        Kumar     Cancer
4           36    Japanese      Umeko

8 The Optimization Problem > Motivating Example (continued)
Published Data: Alice publishes the data without the Name column.
#   Zip     Age   Nationality   Condition
1   13053   28    Brazilian     Heart Disease
2   13067   29    US
3           37    Indian        Cancer
4           36    Japanese
Attacker's Knowledge: a voter registration list.
#   Name    Zip     Age   Nationality
1   John    13067   45    US
2   Paul            22
3   Bob             29
4   Chris           23
In general the attacker can use "external knowledge", which could be prior knowledge rather than another data source.

9 The Optimization Problem > Motivating Example (continued)
Published Data: Alice publishes the data without the Name column.
#   Zip     Age   Nationality   Condition
1   13053   28    Brazilian     Heart Disease
2   13067   29    US
3           37    Indian        Cancer
4           36    Japanese
Attacker's Knowledge: a voter registration list.
#   Name    Zip     Age   Nationality
1   John    13067   45    US
2   Paul            22
3   Bob             29
4   Chris           23
We assume that the attacker knows that Bob's medical data are in the hospital database. Joining the two tables on {Zip, Age, Nationality} links Bob to row 2 of the published table — Data Leak!
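A minimal sketch of this linking attack in Python (rows and field names are illustrative; row 2's condition is left blank, as in the slide):

```python
# Join the published table with the voter list on the quasi-identifier.
published = [
    {"zip": "13067", "age": 29, "nationality": "US", "condition": "..."},
    {"zip": "13053", "age": 28, "nationality": "Brazilian", "condition": "Heart Disease"},
]
voter_list = [
    {"name": "John", "zip": "13067", "age": 45, "nationality": "US"},
    {"name": "Bob",  "zip": "13067", "age": 29, "nationality": "US"},
]

quasi_identifier = ("zip", "age", "nationality")

# Any voter whose quasi-identifier matches exactly one published row
# learns that row's sensitive value.
for voter in voter_list:
    matches = [row for row in published
               if all(voter[a] == row[a] for a in quasi_identifier)]
    if len(matches) == 1:
        print(f"{voter['name']} -> {matches[0]['condition']}")   # Bob -> ...
```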

10 The Optimization Problem > Source of the Problem
Even if we do not publish the identifying attributes (e.g. Name):
• Some of the published fields — here {Zip, Age, Nationality}, the quasi-identifier — may together uniquely identify an individual. The quasi-identifier need not be a primary key; it is simply a set of attributes that can be used for joining.
• The attacker can join on the quasi-identifier with other sources and re-identify the individuals.

11 Outline
• Defining Privacy
• Optimization Problem
  - First-Cut Solution (k-anonymity)
  - Second-Cut Solution (l-diversity)
• Decision Problem
  - First-Cut (Query-View Security)
  - Second-Cut (View Safety)

12 The Optimization Problem > First-Cut Solution: k-Anonymity
L. Sweeney: Achieving k-Anonymity Privacy Protection Using Generalization and Suppression
Instead of returning the original data:
• Change the data such that for every tuple in the result there are at least k-1 other tuples with the same values for the quasi-identifier.
Original Table
#   Zip     Age   Nationality   Condition
1   13053   28    Brazilian     Heart Disease
2   13067   29    US
3           37    Indian        Cancer
4           36    Japanese
2-anonymous Table
#   Zip     Age    Nationality   Condition
1   13053   < 40   *             Heart Disease
2   13067
3                                Cancer
4
4-anonymous Table
#   Zip     Age    Nationality   Condition
1   130**   < 40   *             Heart Disease
2
3                                Cancer
4
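A small Python sketch of the k-anonymity condition on a generalized table; the sample rows are an illustrative completion of the (partially merged) slide table:

```python
from collections import Counter

def is_k_anonymous(rows, quasi_identifier, k):
    """True if every combination of quasi-identifier values appears
    in at least k rows of the (generalized) table."""
    groups = Counter(tuple(row[a] for a in quasi_identifier) for row in rows)
    return all(count >= k for count in groups.values())

table = [
    {"zip": "13053", "age": "< 40", "nationality": "*", "condition": "Heart Disease"},
    {"zip": "13053", "age": "< 40", "nationality": "*", "condition": "Heart Disease"},
    {"zip": "13067", "age": "< 40", "nationality": "*", "condition": "Cancer"},
    {"zip": "13067", "age": "< 40", "nationality": "*", "condition": "Cancer"},
]
print(is_k_anonymous(table, ("zip", "age", "nationality"), k=2))  # True
```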

13 The Optimization Problem > k-Anonymity > Generalization & Suppression
Different ways of modifying data:
• Randomization
• Data swapping
• Generalization: replace a value with a less specific but semantically consistent value
• Suppression: do not release a value at all
Unlike randomization and swapping, generalization and suppression keep the released values truthful (just less precise).
#   Zip     Age    Nationality   Condition
1   13053   < 40   *             Heart Disease
2   13067
3                                Cancer
4

14 The Optimization Problem > k-Anonymity > Generalization Hierarchies
• Generalization Hierarchies: the data owner defines how values can be generalized, e.g.:
  Zip:          13053, 13058, 13063, 13067  ->  1305*, 1306*  ->  130**  ->  *
  Age:          28, 29, 36, 37  ->  < 30, 3*  ->  < 40
  Nationality:  Brazilian, US, Indian, Japanese  ->  American, Asian  ->  *
• Table Generalization: a table generalization is created by generalizing all values in a column to a specific level of the hierarchy, e.g. the 2-anonymizations (13053/13067, < 40, *) and (130**, < 30 / 3*, American/Asian) shown on the neighboring slides.
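A sketch of column-level generalization driven by such hierarchies; the hierarchy functions and level encodings below are illustrative assumptions, not taken from the paper:

```python
def generalize_zip(z, level):
    # level 0: 13053, 1: 1305*, 2: 130**, 3 or more: *
    return "*" if level >= 3 else z[: 5 - level] + "*" * level

def generalize_age(a, level):
    # level 0: exact age, 1: decade bucket (e.g. "3*"), 2 or more: "< 40"
    if level == 0:
        return str(a)
    if level == 1:
        return f"{a // 10}*"
    return "< 40"

def generalize_table(rows, levels):
    """Generalize every value of each quasi-identifier column to the given level."""
    fns = {"zip": generalize_zip, "age": generalize_age}
    return [
        {**row, **{col: fns[col](row[col], lvl) for col, lvl in levels.items()}}
        for row in rows
    ]

rows = [{"zip": "13053", "age": 28}, {"zip": "13067", "age": 37}]
print(generalize_table(rows, {"zip": 2, "age": 2}))
# [{'zip': '130**', 'age': '< 40'}, {'zip': '130**', 'age': '< 40'}]
```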

15 The Optimization Problem > k-Anonymity > k-minimal Generalizations
• There are many k-anonymizations. Which one to pick? The ones that do not generalize the data more than needed.
A table T' is a generalization of a table T if every value in T' is equal to, or a generalization of, the corresponding value in T.
k-minimal Generalization: a k-anonymization that is not a generalization of another k-anonymization.
2-minimal Generalization
#   Zip     Age    Nationality   Condition
1   13053   < 40   *             Heart Disease
2   13067
3                                Cancer
4
2-minimal Generalization
#   Zip     Age    Nationality   Condition
1   130**   < 30   American      Heart Disease
2
3           3*     Asian         Cancer
4
Non-minimal 2-anonymization (a further generalization of the first one)
#   Zip     Age    Nationality   Condition
1   130**   < 40   *             Heart Disease
2
3                                Cancer
4

16 The Optimization Problem > k-Anonymity > k-minimal Distortions
• There are many k-minimal generalizations. Which one to pick? The one that introduces the minimum distortion to the data.
Distortion: D = (1 / #attributes) * sum over quasi-identifier attributes i of (current generalization level of attribute i / max generalization level of attribute i)
k-minimal Distortion: a k-minimal generalization that has the least distortion.
For the two 2-minimal generalizations of the previous slide (hierarchy depths: Zip 3, Age 3, Nationality 2):
• (13053/13067, < 40, *):               D = (0/3 + 2/3 + 2/2) / 3 ≈ 0.56
• (130**, < 30 / 3*, American/Asian):   D = (2/3 + 1/3 + 1/2) / 3 = 0.5
If there are many k-minimal distortions, we can pick one according to the user preferences.
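A small sketch computing this distortion metric for the two generalizations above, assuming the hierarchy depths used on the slide (Zip 3 levels, Age 3, Nationality 2):

```python
def distortion(current_levels, max_levels):
    """Average, over quasi-identifier attributes, of
    (current generalization level) / (maximum generalization level)."""
    ratios = [cur / mx for cur, mx in zip(current_levels, max_levels)]
    return sum(ratios) / len(ratios)

max_levels = [3, 3, 2]   # Zip, Age, Nationality hierarchy depths

# (13053 / 13067, "< 40", "*"): Zip untouched, Age and Nationality fully generalized.
print(round(distortion([0, 2, 2], max_levels), 2))   # 0.56

# ("130**", "< 30" / "3*", American / Asian): Zip two levels up, Age and Nationality one.
print(round(distortion([2, 1, 1], max_levels), 2))   # 0.5
```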

17 The Optimization Problem > k-Anonymity > Complexity & Algorithms
Search Space:
• Number of generalizations = product over attributes i of (max generalization level of attribute i + 1)
• If we allow a different generalization level for each value of an attribute: number of generalizations = product over attributes i of (max generalization level of attribute i + 1) ^ #tuples
The problem is NP-hard! See the paper for:
• a naïve brute-force algorithm
• heuristics: Datafly, µ-Argus
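Written out as formulas (one reading of the slide's search-space counts, with maxlevel_i the depth of attribute i's hierarchy):

```latex
% one generalization level per column:
\#\text{generalizations} \;=\; \prod_{i \,\in\, \text{attributes}} \bigl(\text{maxlevel}_i + 1\bigr)
% one generalization level per cell (per value of each attribute):
\#\text{generalizations} \;=\; \prod_{i \,\in\, \text{attributes}} \bigl(\text{maxlevel}_i + 1\bigr)^{\#\text{tuples}}
```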

18 The Optimization Problem > k-Anonymity > Drawbacks
k-Anonymity alone does not provide privacy if:
• the sensitive attributes lack diversity
• the attacker has background knowledge

19 The Optimization Problem > k-Anonymity > Attack Example
Original Data (Quasi-Identifier: ZIP, Age, Nationality; Sensitive: Condition)
#    ZIP     Age   Nationality   Condition
1    13053   28    Russian       Heart Disease
2    13068   29    American
3            21    Japanese      Viral Infection
4            23
5    14853   50    Indian        Cancer
6            55
7    14850   47
8            49
9            31
10           37
11           36
12           35
The attacker knows:
• Quasi-identifier values: Umeko (Zip 13068, Age 21, Japanese) and Bob (Zip 13053, Age 31, American).
• Other background knowledge: Japanese have a low incidence of heart disease.

20 The Optimization Problem > k-Anonymity > Attack Example
4-anonymization
#    ZIP     Age     Nationality   Condition
1    130**   < 30    *             Heart Disease
2
3                                  Viral Infection
4
5    1485*   >= 40                 Cancer
6
7
8
9            3*
10
11
12
Umeko (Zip 13068, Age 21, Japanese) falls in the first group; since Japanese rarely have heart disease, Umeko has a Viral Infection — Data Leak!
Bob (Zip 13053, Age 31, American) falls in the last group, where everyone has Cancer — Bob has Cancer!

21 Outline
• Defining Privacy
• Optimization Problem
  - First-Cut Solution (k-anonymity)
  - Second-Cut Solution (l-diversity)
• Decision Problem
  - First-Cut (Query-View Security)
  - Second-Cut (View Safety)

22 The Optimization Problem > Second-Cut Solution: l-Diversity
A. Machanavajjhala et al.: l-Diversity: Privacy Beyond k-Anonymity
Return a k-anonymization with the additional property that:
• for each distinct value of the quasi-identifier there exist at least l different values of the sensitive attribute.
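A minimal sketch of an l-diversity check over quasi-identifier groups (the sample rows are illustrative):

```python
from collections import defaultdict

def is_l_diverse(rows, quasi_identifier, sensitive, l):
    """True if every quasi-identifier group contains at least
    l distinct values of the sensitive attribute."""
    groups = defaultdict(set)
    for row in rows:
        groups[tuple(row[a] for a in quasi_identifier)].add(row[sensitive])
    return all(len(values) >= l for values in groups.values())

rows = [
    {"zip": "1305*", "age": "<= 40", "condition": "Heart Disease"},
    {"zip": "1305*", "age": "<= 40", "condition": "Viral Infection"},
    {"zip": "1305*", "age": "<= 40", "condition": "Cancer"},
]
print(is_l_diverse(rows, ("zip", "age"), "condition", l=3))  # True
```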

23 The Optimization Problem > l-Diversity > Example
3-diversified table
#    ZIP     Age      Nationality   Condition
1    1305*   <= 40    *             Heart Disease
2    1306*
3                                   Viral Infection
4
5    1485*   >= 40                  Cancer
6
7
8
9
10
11
12
The attack no longer works:
• Umeko (Zip 13068, Age 21, Japanese): Viral Infection or Cancer.
• Bob (Zip 13053, Age 31, American): Viral Infection, Cancer, or Heart Disease.

24 Outline
• Defining Privacy
• Optimization Problem
  - First-Cut Solution (k-anonymity)
  - Second-Cut Solution (l-diversity)
• Decision Problem
  - First-Cut (Query-View Security)
  - Second-Cut (View Safety)

25 Moving from practice to theory…
The Decision Problem > Moving from practice to theory…
Gerome Miklau, Dan Suciu: A Formal Analysis of Information Disclosure in Data Exchange
k-anonymity & l-diversity make it harder for the attacker to figure out private associations…
… but they still give away some knowledge, and they do not give any guarantees on the amount of data being disclosed.
Alice wants to publish some views of her data and wants to know:
• Do her views disclose some sensitive data?
• If she adds a new view, will there be an additional data disclosure?

26 The Decision Problem > Motivating Example
Secret: Alice wants to keep the correlation between Name & Condition secret: S = (name, condition).
Hospital data:
#   Zip     Name      Condition
1   13053   Ronaldo   Heart Disease
2   13067   Bob
3           Kumar     Viral Infection
4           Umeko     Cancer
Published Views: Alice publishes V1 = (zip, name) and V2 = (zip, condition):
V1:  Zip     Name          V2:  Zip     Condition
     13053   Ronaldo            13053   Heart Disease
     13067   Bob                13067   Viral Infection
             Kumar                      Cancer
             Umeko

27 The Decision Problem > Motivating Example — Attacker's Knowledge
Before seeing the views (assuming the attacker knows the domain): Ronaldo could be associated with any condition — Heart Disease, Viral Infection, or Cancer.
After seeing the views: V1 links Ronaldo to zip 13053, and in V2 zip 13053 is associated only with Heart Disease, so the attacker infers that Ronaldo has Heart Disease — Data Leak!
(Note that the published views are under set semantics, not bag semantics.)

28 The Decision Problem > The Problem
Alice has a database D which conforms to a schema. She publishes a set of views V (each defined by a query) over D. She wants to protect some sensitive data, identified by a secret query q over D.
The attacker wants to guess Alice's secret:
• she can ask queries against the published views V
• she cannot ask the secret query q directly
• she knows the schema, the view definitions V, and the secret query q
Do the views reveal any information to the attacker about the secret?
(The queries considered here are conjunctive queries.)

29 The Decision Problem > Model for the attacker's knowledge > Probability of possible tuples
Domain of possible values for all attributes: D = {Bob, Mary}
Set of possible tuples of binary relation R (e.g. cooksFor): (Bob, Bob), (Bob, Mary), (Mary, Bob), (Mary, Mary)
The attacker assigns a probability to each possible tuple, e.g. x1 = x2 = x3 = x4 = 1/2.

30 The Decision Problem > Model for the attacker's knowledge > Probability of possible databases
This implies a probability for each possible database instance (tuples are assumed uncorrelated): an instance's probability is the product of xi for each possible tuple it contains and (1 − xj) for each possible tuple it omits. With four possible tuples there are 16 possible instances; with all xi = 1/2, each instance has probability 1/2 · 1/2 · 1/2 · 1/2 = 1/16.

31 The Decision Problem > Model for the attacker's knowledge > Probability of possible secrets
This in turn implies a probability for each possible secret value: the probability that the secret S(y) :- R(x,y) equals s = {(Bob)} is the sum of the probabilities of the instances on which the query returns exactly this result:
P[S(I) = s] = 3/16
(the three instances {(Bob,Bob)}, {(Mary,Bob)}, and {(Bob,Bob),(Mary,Bob)}). Similarly for the probability that a view V equals v: P[V(I) = v].

32 The Decision Problem > Model for the attacker's knowledge > Prior & Posterior Probability
Prior Probability: probability before seeing the view instance, e.g. for the secret S(y) :- R(x,y):
P[S(I) = {(Bob)}] = 3/16
Posterior Probability: probability after seeing the view instance. If V(I) = {(Mary)} for the view V(x) :- R(x,y):
P[S(I) = {(Bob)} | V(I) = {(Mary)}] = P[S(I) = {(Bob)} AND V(I) = {(Mary)}] / P[V(I) = {(Mary)}] = (1/16) / (3/16) = 1/3
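A short Python sketch reproducing these numbers by brute-force enumeration of the 16 possible instances (queries follow the slides; the code itself is illustrative):

```python
from itertools import chain, combinations

domain = ["Bob", "Mary"]
possible_tuples = [(x, y) for x in domain for y in domain]   # 4 possible tuples

# All 16 possible instances (subsets of the possible tuples); with every tuple
# kept independently with probability 1/2, each instance has probability 1/16.
instances = list(chain.from_iterable(
    combinations(possible_tuples, r) for r in range(len(possible_tuples) + 1)))
prob = 1 / len(instances)

def S(instance):                      # secret  S(y) :- R(x, y)
    return frozenset(y for _, y in instance)

def V(instance):                      # view    V(x) :- R(x, y)
    return frozenset(x for x, _ in instance)

secret, view = frozenset({"Bob"}), frozenset({"Mary"})

prior  = sum(prob for I in instances if S(I) == secret)
joint  = sum(prob for I in instances if S(I) == secret and V(I) == view)
p_view = sum(prob for I in instances if V(I) == view)

print(prior)            # 0.1875   (= 3/16)
print(joint / p_view)   # 0.333…   (= 1/3)  ->  prior != posterior, so S is not secure
```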

33 The Decision Problem > Query-View Security
A query S is secure w.r.t. a set of views V if for any possible answer s to S and for any possible answer v to V:
P[S(I) = s] = P[S(I) = s | V(I) = v]
(prior probability = posterior probability)
Intuitively, if some possible answer to S becomes more or less probable after publishing the views V, then S is not secure w.r.t. V.

34 The Decision Problem > From Probabilities to Logic
A possible tuple t is a critical tuple if for some possible instance I:
Q[I] ≠ Q[I − {t}]
(the query result differs in the presence and in the absence of t). Intuitively, critical tuples are those of interest to the query.
A query S is secure w.r.t. a set of views V iff:
crit(S) ∩ crit(V) = ∅
The probability distribution does not affect the security of a query.
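A brute-force sketch of the critical-tuple test over the small Bob/Mary domain, using the queries from the secure-query example two slides below (helper names are illustrative):

```python
from itertools import chain, combinations

domain = ["Bob", "Mary"]
possible_tuples = [(x, y) for x in domain for y in domain]

def instances():
    """All 16 possible instances over the Bob/Mary domain, as sets of tuples."""
    return chain.from_iterable(
        (set(c) for c in combinations(possible_tuples, r))
        for r in range(len(possible_tuples) + 1))

def crit(query):
    """Brute-force critical tuples: t is critical for Q if removing t
    from some possible instance changes the query answer."""
    return {t for t in possible_tuples
            if any(t in I and query(I) != query(I - {t}) for I in instances())}

# Example 2 from the slides:  S(x) :- R(x, 'Mary')   and   V(x) :- R(x, 'Bob')
S = lambda I: {x for x, y in I if y == "Mary"}
V = lambda I: {x for x, y in I if y == "Bob"}

print(sorted(crit(S)))              # [('Bob', 'Mary'), ('Mary', 'Mary')]
print(sorted(crit(V)))              # [('Bob', 'Bob'), ('Mary', 'Bob')]
print(crit(S) & crit(V) == set())   # True -> crit(S) ∩ crit(V) = ∅, S secure w.r.t. V
```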

35 The Decision Problem > Example of a Non-Secure Query
Previous example revisited:  Secret S(y) :- R(x,y),  View V(x) :- R(x,y)
crit(S) = { (Bob,Bob), (Bob,Mary), (Mary,Bob), (Mary,Mary) }   (e.g. S[{(Mary,Mary)}] ≠ S[{}])
crit(V) = { (Bob,Bob), (Bob,Mary), (Mary,Bob), (Mary,Mary) }
crit(S) ∩ crit(V) ≠ ∅, so S is not secure w.r.t. V.

36 The Decision Problem > Example of a Secure Query
Example 2:  Secret S(x) :- R(x,'Mary'),  View V(x) :- R(x,'Bob')
crit(S) = { (Bob,Mary), (Mary,Mary) }
crit(V) = { (Bob,Bob), (Mary,Bob) }
crit(S) ∩ crit(V) = ∅, so S is secure w.r.t. V.

37 The Decision Problem > Example of a Secure Query (probabilistic view)
Example 2 revisited using the probabilistic definition of security:
Secret S(x) :- R(x,'Mary'),  View V(x) :- R(x,'Bob')
P[S(I) = {(Mary)}] = 4/16 = 1/4 = P[S(I) = {(Mary)} | V(I) = {(Bob)}]
The prior equals the posterior, so S is secure w.r.t. V.

38 The Decision Problem > Properties of Query-View Security
• Symmetry: if S is secure w.r.t. V, then V is secure w.r.t. S.
• No obscurity: the view definitions, the secret query and the schema are not concealed.
• Instance Independence: if S is secure w.r.t. V, it remains secure even if the underlying database changes.
• Probability Distribution Independence: holds if S and V are monotone queries.
• Domain Independence: it suffices to check a single domain D0 of bounded size (|D0| <= n(n+1)); if S is secure w.r.t. V for D0, it is secure w.r.t. V for all domains D.
• Complexity of deciding query-view security: Π2^p-complete.

39 The Decision Problem > Prior Knowledge
The attacker may have prior knowledge other than the domain D and the probability distribution P, e.g. a key or foreign key constraint. It is represented as a Boolean query K over the instance. Query-view security then becomes:
P[S(I) = s | K(I)] = P[S(I) = s | V(I) = v ∧ K(I)]

40 The Decision Problem > Measuring Disclosure
Query-view security is a very strong requirement: it rules out most views used in practice as insecure, while applications are often willing to tolerate small disclosures.
Disclosure examples:
• Positive disclosure: "Bob" has "Cancer"
• Negative disclosure: "Umeko" does not have "Heart Disease"
Measure of positive disclosure:
Leak(S,V) = sup over s, v of ( P[s ∈ S(I) | v ∈ V(I)] − P[s ∈ S(I)] ) / P[s ∈ S(I)]
The disclosure is minute if Leak(S,V) << 1.

41 Query-View Security Drawbacks
The Decision Problem > Query-View Security Drawbacks
• Tuples are modeled as mutually independent; this does not hold in the presence of constraints (e.g. foreign key constraints).
• Prior/external knowledge is modeled as a single Boolean predicate, which does not suffice in general.
• Restricting to conjunctive queries only is limiting.
• Guarantees are instance-independent: there may not be a privacy breach given the current instance.

42 Outline
• Defining Privacy
• Optimization Problem
  - First-Cut Solution (k-anonymity)
  - Second-Cut Solution (l-diversity)
• Decision Problem
  - First-Cut (Query-View Security)
  - Second-Cut (View Safety)

43 The Decision Problem > A More General Setting
Alin Deutsch, Yannis Papakonstantinou: Privacy in Database Publishing
• Alice has a database D which conforms to a schema S and satisfies a set of integrity constraints.
• V is a set of views already published over D.
• The attacker's belief is modeled as a probability distribution; views and queries are defined using unions of conjunctive queries (UCQ).
• Alice wants to publish an additional view N. Does N provide any new information to the attacker about the answer to the secret query Q?

44 The Decision Problem > Motivating Example (w/o Constraints)
Secret: Alice wants to hide the reviewers of paper P1:  S(r) :- RP(r, 'P1')
Base data:
RP (Reviewer, Paper):      R1 P1,  R2 P2,  R3 P3,  R4 P4
RC (Reviewer, Committee):  R1 C1,  R2 C1,  R3 C2,  R4 C3
CP (Committee, Paper):     C1 P1,  C1 P2,  C2 P3,  C3 P4
Published Views:                            New Additional Views:
V1(r) :- RC(r, c)  =  {R1, R2, R3, R4}      N1(r, c) :- RC(r, c)  =  RC above
V2(c) :- RC(r, c)  =  {C1, C2, C3}          N2(c, p) :- CP(c, p)  =  CP above
Without constraints, the new views reveal nothing about the secret.

45 The Decision Problem > Motivating Example (with Constraint 1)
Constraint 1: papers assigned to a committee can only be reviewed by committee members:
∀r ∀p: RP(r, p) → ∃c: RC(r, c) ∧ CP(c, p)
Published views V1, V2 and new views N1 = RC, N2 = CP as before. With the new views and constraint 1, a reviewer of P1 must belong to a committee handling P1, i.e. to C1 = {R1, R2}.
Possible secrets with the new views: {R1}, {R2}, {R1, R2}
Data disclosure depends on the constraints.

46 The Decision Problem > Motivating Example (with Constraint 2)
Constraint 1 as before, plus Constraint 2: each paper has exactly 2 reviewers.
Now the only possible secret with the new views is {R1, R2} — full disclosure.
Data disclosure depends on the constraints.

47 The Decision Problem > Motivating Example (different instance)
Same views and Constraint 1, but a different instance: there is a single committee C0 containing all reviewers R1–R4, and all papers P1–P4 are assigned to C0.
The new views reveal nothing about the secret, since any subset of the reviewers in V1 may review paper 'P1'.
Data disclosure also depends on the instance.

48 The Decision Problem > Probabilities Revisited: Plausible Secrets
In order to allow correlation of tuples, the attacker assigns probabilities directly to the plausible secrets (the outcomes for query S that are possible given the published views).
E.g. in the previous example with constraint 1 and secret S(r) :- RP(r, 'P1'):
Published views: V1 = {R1, R2, R3, R4}, V2 = {C1, C2, C3}
Plausible secrets: any subset of V1, e.g. with
P[{R1}] = 3/8,  P[{R2}] = 1/8,  P[{R3}] = 2/8,  P[{R1, R2}] = 2/8,  all other subsets 0.

49 The Decision Problem > Possible Worlds
This induces a probability distribution on the set of possible worlds — the instances that satisfy the constraints and the published views. E.g. for the plausible secret {(R1)} (probability 3/8) there are several possible worlds PG1, PG2, …: complete instances of RC, CP and RP that agree with V1 and V2 and in which R1 is the only reviewer of P1.

50 The Decision Problem > Probability Distribution on Possible Worlds
The induced probability distribution can be:
• General: the sum of the probabilities of the possible worlds for any secret value s equals the probability of S = s (e.g. the worlds PG1, PG2, … for S = {(R1)} sum to 3/8).

51 The Decision Problem > Probability Distribution on Possible Worlds
• Equiprobable: each of the possible worlds for a secret value s is equally probable, i.e. has probability P[S = s] divided by the number of possible worlds for s.

52 The Decision Problem > A priori & a posteriori belief
A priori belief: the belief of the attacker in S = s before seeing the new views:
PG(S = s | V = v) = (sum of probabilities of the possible worlds for S = s) / (sum of probabilities of all possible worlds)
A posteriori belief: the belief of the attacker in S = s after seeing the new views:
PG(S = s | V = v ∧ N = n) = the same ratio, but computed over the possible worlds that are also consistent with N = n.
Notice that the set of possible worlds typically shrinks after publishing the new views.
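One way to write these beliefs symbolically (a reconstruction using the slide's PG notation; W ranges over possible worlds, i.e. instances satisfying the constraints and the views):

```latex
P_G(S = s \mid V = v) \;=\;
  \frac{\sum_{W:\, V(W)=v,\ S(W)=s} P_G(W)}{\sum_{W:\, V(W)=v} P_G(W)}
\qquad
P_G(S = s \mid V = v \wedge N = n) \;=\;
  \frac{\sum_{W:\, V(W)=v,\ N(W)=n,\ S(W)=s} P_G(W)}{\sum_{W:\, V(W)=v,\ N(W)=n} P_G(W)}
```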

53 The Decision Problem > Privacy Guarantees
A set of new view instances n is safe w.r.t. a query S and an initial set of view instances v if for every plausible secret s:
P[S = s | V = v] = P[S = s | V = v ∧ N = n]
(a priori probability = a posteriori probability)
We can also obtain database-instance-independent guarantees by quantifying the guarantee over all instances of the proprietary database.

54 The Decision Problem > Example of View Safety
Paper example revisited (published views V1, V2; new views N1 = RC, N2 = CP; constraint 1):
Plausible secrets before N (any subset of V1):  P[{R1}] = 3/8, P[{R2}] = 1/8, P[{R3}] = 2/8, P[{R1,R2}] = 2/8
Plausible secrets after N (only subsets of C1 = {R1, R2}):  {R1}, {R2}, {R1, R2} with renormalized probabilities P1', P2', P4'
P[S = {(R3)} | V = v] = 2/8, but P[S = {(R3)} | V = v ∧ N = n] = 0, so the new views are not safe.
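A small Python sketch of this check for the paper-review example, assuming constraint 1 and the illustrative prior from slide 48 (names and numbers follow the slides; the conditioning step is a simplified model):

```python
# Attacker's prior over plausible secrets for S(r) :- RP(r, 'P1'), given V1, V2.
prior = {
    frozenset({"R1"}): 3 / 8,
    frozenset({"R2"}): 1 / 8,
    frozenset({"R3"}): 2 / 8,
    frozenset({"R1", "R2"}): 2 / 8,
}

# New views N1 = RC and N2 = CP.
RC = {("R1", "C1"), ("R2", "C1"), ("R3", "C2"), ("R4", "C3")}
CP = {("C1", "P1"), ("C1", "P2"), ("C2", "P3"), ("C3", "P4")}

# Constraint 1: a reviewer of P1 must sit on a committee to which P1 is assigned.
committees_of_p1 = {c for c, p in CP if p == "P1"}
allowed_reviewers = {r for r, c in RC if c in committees_of_p1}

# Condition on the new views: keep only secrets consistent with the constraint.
posterior = {s: p for s, p in prior.items() if s <= allowed_reviewers}
total = sum(posterior.values())
posterior = {s: p / total for s, p in posterior.items()}

s = frozenset({"R3"})
print(prior[s], posterior.get(s, 0.0))   # 0.25 0.0  ->  belief changed, N is not safe
```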

55 The Decision Problem > View Safety for General Distributions
For general induced distributions:
P[S = s | V = v] = P[S = s | V = v ∧ N = n]   (a priori = a posteriori)
iff the set of possible worlds before seeing N equals the set of possible worlds after seeing N.
The possible worlds are infinite in number — how to compute this? Equivalently:
iff the set of templates of possible worlds before seeing N equals the set of templates of possible worlds after seeing N.

56 The Decision Problem > Templates
Templates are a finite summarization of a set of possible worlds. E.g. for schema R(A, B, C), view V(A, C) :- R(A, B, C), and view extent {(a1, c1), (a2, c2)}, two templates are:
R = { (a1, x1, c1), (a2, x2, c2) }   (the unknown B-values x1, x2 may differ)
R = { (a1, x3, c1), (a2, x3, c2) }   (the same unknown B-value x3 in both rows)
Each template stands for all possible worlds obtained by instantiating its variables with domain values.

57 The Decision Problem > View Safety for Equiprobable Distributions
For equiprobable distributions, the set of possible worlds may change while the probabilities stay the same. E.g.:
• Before the new views: 200 possible worlds in total, 100 of them for S = s1 and 100 for S = s2.
• After the new views: 100 possible worlds in total, 50 of them for S = s1 and 50 for S = s2.
Since every possible world discarded from S = s1 had the same probability (and similarly for S = s2), what counts is the ratio (# possible worlds for S = s1) / (# possible worlds for S = s2), which stayed the same.

58 The Decision Problem > View Safety for Equiprobable Distributions
For equiprobable distributions:
P[S = s | V = v] = P[S = s | V = v ∧ N = n]   (a priori = a posteriori)
iff the set of plausible secrets before seeing N equals the set of plausible secrets after seeing N,
AND for all plausible secrets s1, s2 the ratio (# possible worlds for S = s1) / (# possible worlds for S = s2) is the same before and after seeing N.
The number of possible worlds is infinite — again computed via templates.

59 Summary
• Two models for information disclosure: modify the data before publishing (k-anonymity, l-diversity) or analyze what published views disclose (probabilistic model).
• k-Anonymity: a tension between usability and anonymity, resolved by suppression and (minimal-distortion) generalization.
• Probabilistic model (query-view security): very strong guarantees, independent of the probability distribution and of the instance, reduced to a logical statement over critical tuples.
• View safety: handles constraints and previously published views, computed via database templates.

60 Thanks

