ACHIEVING k-ANONYMITY PRIVACY PROTECTION USING GENERALIZATION AND SUPPRESSION International Journal on Uncertainty, Fuzziness and Knowledge-based Systems,


1 ACHIEVING k-ANONYMITY PRIVACY PROTECTION USING GENERALIZATION AND SUPPRESSION
International Journal on Uncertainty, Fuzziness and Knowledge-based Systems, 2002 Cited: 445 Liyan Zhang for CS 295

2 Outline
  Need for Privacy
  K-anonymity privacy protection
  Methods for K-anonymity privacy protection
    Generalization including suppression
    Minimal generalization of a table
    Minimal distortion of a table
    Algorithm for finding a minimal generalization with minimal distortion
  Real-world results
    Datafly System
    u-Argus System

3 Need for Privacy
Suppose that a medical institution, public health agency, or financial organization wants to publish person-specific records.
They want to publish such that:
  the information remains practically useful
  the identity of an individual cannot be determined
The risk: an adversary might infer the secret/sensitive data from the published database.

4 Need for Privacy
The data contains:
  attribute values which can uniquely identify an individual: { zip-code, nationality, age } and/or { name } and/or { SSN }
  sensitive information corresponding to individuals: { medical condition, salary, location }

  Non-sensitive data (Condition is the sensitive attribute):
  #  Zip    Age  Nationality  Name    Condition
  1  13053  28   Indian       Kumar   Heart Disease
  2  13067  29   American     Bob
  3         35   Canadian     Ivan    Viral Infection
  4         36   Japanese     Umeko   Cancer

5 Need for Privacy
Published data (names removed; Condition is the sensitive attribute):
  #  Zip    Age  Nationality  Condition
  1  13053  28   Indian       Heart Disease
  2  13067  29   American
  3         35   Canadian     Viral Infection
  4         36   Japanese     Cancer

Voter list:
  #  Name   Zip    Age  Nationality
  1  John   13053  28   American
  2  Bob    13067  29
  3  Chris         23

Data leak! Joining the published data with the voter list on { Zip, Age } re-identifies individuals.

6 Outline
  Need for Privacy
  K-anonymity privacy protection
  Methods for K-anonymity privacy protection
    Generalization including suppression
    Minimal generalization of a table
    Minimal distortion of a table
    Algorithm for finding a minimal generalization with minimal distortion
  Real-world results
    Datafly System
    u-Argus System

7 K-anonymity privacy protection
Even if we remove the direct, uniquely identifying attributes:
  some remaining fields may still uniquely identify an individual!
  an attacker can join them with other sources and identify individuals
In the example table, the non-sensitive attributes { Zip, Age, Nationality } form such a set of quasi-identifiers.

8 K-anonymity privacy protection
Attributes in the private information that could be used for linking with external information are termed the quasi-identifier.
  Explicit identifiers such as name, address, and phone number are removed before release.
  The quasi-identifier consists of attributes that in combination can uniquely identify individuals, such as birth date, ZIP, and gender.

9 K-anonymity privacy protection
Our goal:
  protect people's privacy when releasing person-specific information
  limit the ability to use the quasi-identifier to link to other external information
k-anonymous table:
  change the data so that, for each tuple in the resulting table, there are at least (k-1) other tuples with the same values for the quasi-identifier
  equivalently: if a table is k-anonymous, each sequence of values of the quasi-identifier appears at least k times
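The definition above can be checked mechanically. A minimal sketch in Python; the table, column names, and the helper name `is_k_anonymous` are illustrative, not taken from the slides:

```python
from collections import Counter

def is_k_anonymous(table, quasi_identifier, k):
    """True if every combination of quasi-identifier values
    occurs at least k times in the table."""
    counts = Counter(tuple(row[a] for a in quasi_identifier) for row in table)
    return all(c >= k for c in counts.values())

# Rows loosely modeled on the generalized example table in these slides.
rows = [
    {"zip": "130**", "age": "<30", "nat": "American"},
    {"zip": "130**", "age": "<30", "nat": "American"},
    {"zip": "130**", "age": "3*",  "nat": "Asian"},
    {"zip": "130**", "age": "3*",  "nat": "Asian"},
]
print(is_k_anonymous(rows, ["zip", "age", "nat"], 2))  # True: each QI combination appears twice
```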

10 Outline
  Need for Privacy
  K-anonymity privacy protection
  Methods for K-anonymity privacy protection
    Generalization including suppression
    Minimal generalization of a table
    Minimal distortion of a table
    Algorithm for finding a minimal generalization with minimal distortion
  Real-world results
    Datafly System
    u-Argus System

11 Methods for K-anonymity privacy protection -- Generalization including suppression
Generalization: replace the original value with a semantically consistent but less specific value.
Suppression: data not released at all; can be cell-level or (more commonly) tuple-level.

  #  Zip    Age   Nationality  Condition
  1  130**  < 40  *            Heart Disease
  2  130**  < 40  *
  3  130**  < 40  *            Viral Infection
  4  130**  < 40  *            Cancer

(Zip, Age, and Nationality are generalized; the blank Condition in row 2 is a cell-level suppression.)

12 Use Generalization Hierarchies to create a table generalization
Generalization hierarchies: the data owner defines how values can be generalized, e.g.
  ZIP:          13053, 13058 -> 1305* ;  13063, 13067 -> 1306* ;  1305*, 1306* -> 130**
  Age:          28, 29 -> < 30 ;  35, 36 -> 3* ;  < 30, 3* -> < 40 ;  < 40 -> *
  Nationality:  US, Canadian -> American ;  Indian, Japanese -> Asian ;  American, Asian -> *
Table generalization: a table generalization is created by generalizing all values in a column to a specific level of its hierarchy.
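One simple way to encode such a hierarchy is a child-to-parent map. This is a sketch, assuming the starred ZIP forms used in the deck's example tables; the map and function names are illustrative:

```python
# Child -> parent map for the ZIP hierarchy (assumed encoding).
ZIP_PARENT = {"13053": "1305*", "13058": "1305*",
              "13063": "1306*", "13067": "1306*",
              "1305*": "130**", "1306*": "130**",
              "130**": "*****"}  # full suppression at the root

def generalize(value, parent_map, levels):
    """Climb `levels` steps up the hierarchy."""
    for _ in range(levels):
        value = parent_map[value]
    return value

print(generalize("13053", ZIP_PARENT, 2))  # 130**
```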

13 Methods for K-anonymity privacy protection -- K-minimal generalizations of a table
There are many k-anonymizations -- which one to pick?
Intuition: the one that does not generalize the data more than needed (over-generalization decreases the utility of the published dataset!)
K-minimal generalization: a k-anonymized table that is not a generalization of another k-anonymized table.
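The "is a generalization of" relation in this definition can be tested directly against the hierarchies. A sketch under the assumption that both tables list the same tuples in the same order; all names here are illustrative:

```python
def ancestors(value, parent_map):
    """The value itself plus all its ancestors in the hierarchy."""
    result = {value}
    while value in parent_map:
        value = parent_map[value]
        result.add(value)
    return result

def is_generalization_of(table_b, table_a, qi, hierarchies):
    """True if every QI cell of table_b equals the corresponding
    cell of table_a or one of that cell's ancestors."""
    return all(
        rb[a] in ancestors(ra[a], hierarchies[a])
        for ra, rb in zip(table_a, table_b)
        for a in qi
    )

hier = {"zip": {"13053": "130**", "13067": "130**"}}
original = [{"zip": "13053"}, {"zip": "13067"}]
generalized = [{"zip": "130**"}, {"zip": "130**"}]
print(is_generalization_of(generalized, original, ["zip"], hier))  # True
```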

14 2-minimal Generalizations
Two 2-minimal generalizations:
  #  Zip    Age   Nationality  Condition
  1  13053  < 40  *            Heart Disease
  2  13053  < 40  *            Viral Infection
  3  13067  < 40  *
  4  13067  < 40  *            Cancer

  #  Zip    Age   Nationality  Condition
  1  130**  < 30  American     Heart Disease
  2  130**  < 30  American     Viral Infection
  3  130**  3*    Asian
  4  130**  3*    Asian        Cancer

NOT a 2-minimal generalization (it is a generalization of both tables above):
  #  Zip    Age   Nationality  Condition
  1  130**  < 40  *            Heart Disease
  2  130**  < 40  *            Viral Infection
  3  130**  < 40  *
  4  130**  < 40  *            Cancer

15 Methods for K-anonymity privacy protection -- k-minimal distortion of a table
Now there are many k-minimal generalizations! Which one is preferred?
One criterion: the one that creates minimal distortion to the data.
Distortion is measured per attribute as the ratio of the attribute's current level of generalization to the height of the attribute's hierarchy, averaged over all attributes:

  D = (1/N) * sum over attributes i of (current generalization level of attribute i) / (max generalization level of attribute i)

where N is the number of attributes.
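The distortion measure above is a one-liner. A sketch with hypothetical levels and hierarchy heights (the specific numbers are illustrative assumptions, not from the slides):

```python
def distortion(levels, heights):
    """D = average over attributes of
    (current generalization level / height of the attribute's hierarchy)."""
    return sum(l / h for l, h in zip(levels, heights)) / len(levels)

# Hypothetical example: ZIP generalized 2 of 5 levels,
# Age 1 of 2 levels, Nationality 1 of 2 levels.
print(round(distortion([2, 1, 1], [5, 2, 2]), 3))  # 0.467
```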

16 2-min

17 Algorithm for finding a minimal generalization with minimal distortion

18

19 Outline
  Need for Privacy
  K-anonymity privacy protection
  Methods for K-anonymity privacy protection
    Generalization including suppression
    Minimal generalization of a table
    Minimal distortion of a table
    Algorithm for finding a minimal generalization with minimal distortion
  Real-world results
    Datafly System
    u-Argus System

20 Real-world results -- Datafly System
In the Datafly system, the data holder:
  declares specific attributes and tuples in the original private table (PT) as being eligible for release
  groups a subset of the attributes of PT into one or more quasi-identifiers (QIi)
  assigns a weight from 0 to 1 to each attribute specifying the likelihood the attribute will be used for linking (0 = not likely, 1 = highly probable)
  specifies a minimum anonymity level, which computes to a value for k
The data recipient's preferences are captured the same way: a weight from 0 to 1 per attribute states which attributes may be distorted (0 = the recipient would prefer the values unchanged, 1 = maximum distortion could be tolerated).

21 For convenience, we consider a single quasi-identifier in which all attributes have equal preference and an equal likelihood of linking; the weights can then be considered as not being present.

22

23 Datafly -- the core Datafly algorithm
  its solutions always satisfy k-anonymity
  it does not necessarily provide k-minimal generalizations or k-minimal distortions
The problem is that Datafly makes crude decisions: it generalizes all values associated with an attribute and suppresses all values within a tuple.
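The crude per-attribute strategy described above can be sketched as follows. This is an illustrative reconstruction, not Sweeney's exact pseudocode: it assumes a single quasi-identifier, child-to-parent hierarchy maps, and hierarchies tall enough for the loop to terminate:

```python
from collections import Counter

# Child -> parent map (assumed encoding; values follow the deck's ZIP hierarchy).
ZIP_PARENT = {"13053": "1305*", "13058": "1305*",
              "13063": "1306*", "13067": "1306*",
              "1305*": "130**", "1306*": "130**"}

def datafly(table, qi, hierarchies, k):
    """Greedy sketch: while more than k tuples violate k-anonymity,
    generalize the whole attribute with the most distinct values;
    finally suppress tuples whose QI combination still occurs < k times."""
    table = [dict(row) for row in table]
    while True:
        counts = Counter(tuple(r[a] for a in qi) for r in table)
        if sum(c for c in counts.values() if c < k) <= k:
            break  # few enough outliers: stop generalizing, suppress them
        # generalize the attribute with the most distinct values
        attr = max(qi, key=lambda a: len({r[a] for r in table}))
        parent = hierarchies[attr]
        for r in table:
            r[attr] = parent.get(r[attr], r[attr])
    counts = Counter(tuple(r[a] for a in qi) for r in table)
    return [r for r in table if counts[tuple(r[a] for a in qi)] >= k]

rows = [{"zip": z} for z in ["13053", "13058", "13063", "13067"]]
out = datafly(rows, ["zip"], {"zip": ZIP_PARENT}, k=2)
print({r["zip"] for r in out})  # {'1305*', '1306*'}
```

Note how the sketch exhibits both crude decisions the slide names: generalization always applies to every value of an attribute, and suppression always drops entire tuples.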

24 Real-world results -- u-Argus System
The data holder specifies how identifying each attribute is by assigning it a value between 0 and 3:
  0 "not identifying", 1 "identifying", 2 "more identifying", 3 "most identifying"
The system then tests 2- and 3-combinations of attributes and eliminates unsafe combinations by generalizing attributes and by cell suppression.

25

26

27 Shortcomings of u-Argus
  its generalizations may not always satisfy k-anonymity
  it does not examine all combinations of the attributes in the quasi-identifier: only 2- and 3-combinations are examined, and there may exist 4-combinations or larger that are unique
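The 2-/3-combination test can be sketched as follows (illustrative names and data; raising `max_size` to 4 or more would catch the larger unique combinations this slide warns about):

```python
from itertools import combinations
from collections import Counter

def unsafe_combinations(table, qi, k, max_size=3):
    """Attribute combinations of size <= max_size whose value
    combinations occur fewer than k times in the table."""
    flagged = []
    for size in range(1, max_size + 1):
        for attrs in combinations(qi, size):
            counts = Counter(tuple(r[a] for a in attrs) for r in table)
            if any(c < k for c in counts.values()):
                flagged.append(attrs)
    return flagged

# Illustrative rows: the value b=1 occurs only once,
# so ("b",) and ("a", "b") are unsafe for k=2.
rows = [{"a": 1, "b": 1}, {"a": 1, "b": 2}, {"a": 1, "b": 2}]
bad = unsafe_combinations(rows, ["a", "b"], k=2)
print(bad)  # [('b',), ('a', 'b')]
```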

28 Conclusions
Definitions:
  quasi-identifier
  k-anonymous table
  generalization and suppression
  k-minimal generalization of a table
  k-minimal distortion of a table
Real-world results:
  Datafly System
  u-Argus System

29 The End Thanks

