ACHIEVING k-ANONYMITY PRIVACY PROTECTION USING GENERALIZATION AND SUPPRESSION International Journal on Uncertainty, Fuzziness and Knowledge-based Systems,


1 ACHIEVING k-ANONYMITY PRIVACY PROTECTION USING GENERALIZATION AND SUPPRESSION
International Journal on Uncertainty, Fuzziness and Knowledge-based Systems, 2002 Cited: 445 Liyan Zhang for CS 295

2 Outline
  Need for Privacy
  K-anonymity privacy protection
  Methods for K-anonymity privacy protection
    Generalization including suppression
    Minimal generalization of a table
    Minimal distortion of a table
    Algorithm for finding a minimal generalization with minimal distortion
  Real-world results
    Datafly System
    u-Argus System

3 Need for Privacy
Suppose that a medical institution, public health agency, or financial organization wants to publish person-specific records.
They want to publish such that:
  the information remains practically useful
  the identity of an individual cannot be determined
The risk: an adversary might infer the secret/sensitive data from the published database.

4 Need for Privacy
The data contains:
  attribute values which can uniquely identify an individual: { zip-code, nationality, age } and/or { name } and/or { SSN }
  sensitive information corresponding to individuals: { medical condition, salary, location }

  Non-sensitive data (Condition is the sensitive attribute):
  #  Zip    Age  Nationality  Name    Condition
  1  13053  28   Indian       Kumar   Heart Disease
  2  13067  29   American     Bob
  3         35   Canadian     Ivan    Viral Infection
  4         36   Japanese     Umeko   Cancer

5 Need for Privacy
Published data (names removed; Condition is the sensitive attribute):
  #  Zip    Age  Nationality  Condition
  1  13053  28   Indian       Heart Disease
  2  13067  29   American
  3         35   Canadian     Viral Infection
  4         36   Japanese     Cancer

Voter list:
  #  Name   Zip    Age  Nationality
  1  John   13053  28   American
  2  Bob    13067  29
  3  Chris         23

Data leak! Joining the published data with the voter list on { Zip, Age } re-identifies individuals.

6 Outline
  Need for Privacy
  K-anonymity privacy protection
  Methods for K-anonymity privacy protection
    Generalization including suppression
    Minimal generalization of a table
    Minimal distortion of a table
    Algorithm for finding a minimal generalization with minimal distortion
  Real-world results
    Datafly System
    u-Argus System

7 K-anonymity privacy protection
Even if we remove the direct, uniquely identifying attributes:
  some remaining fields may still uniquely identify an individual!
  an attacker can join them with other sources and identify individuals
In the example table, the non-sensitive attributes { Zip, Age, Nationality } form such a set of quasi-identifiers.

8 K-anonymity privacy protection
Attributes in the private information that could be used for linking with external information are termed the quasi-identifier.
  Explicit identifiers such as name, address, and phone number are removed before release.
  The quasi-identifier consists of attributes that in combination can uniquely identify individuals, such as birth date, ZIP, and gender.

9 K-anonymity privacy protection
Our goal:
  protect people's privacy when releasing person-specific information
  limit the ability to use the quasi-identifier to link to other external information
k-anonymous table:
  change the data so that, for each tuple in the resulting table, there are at least (k-1) other tuples with the same values for the quasi-identifier
  equivalently: if a table is k-anonymous, each sequence of values of the quasi-identifier appears at least k times
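The definition above can be checked mechanically. A minimal sketch in Python; the table, column names, and the helper name `is_k_anonymous` are illustrative, not taken from the slides:

```python
from collections import Counter

def is_k_anonymous(table, quasi_identifier, k):
    """True if every combination of quasi-identifier values
    occurs at least k times in the table."""
    counts = Counter(tuple(row[a] for a in quasi_identifier) for row in table)
    return all(c >= k for c in counts.values())

# Rows loosely modeled on the generalized example table in these slides.
rows = [
    {"zip": "130**", "age": "<30", "nat": "American"},
    {"zip": "130**", "age": "<30", "nat": "American"},
    {"zip": "130**", "age": "3*",  "nat": "Asian"},
    {"zip": "130**", "age": "3*",  "nat": "Asian"},
]
print(is_k_anonymous(rows, ["zip", "age", "nat"], 2))  # True: each QI combination appears twice
```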

10 Outline
  Need for Privacy
  K-anonymity privacy protection
  Methods for K-anonymity privacy protection
    Generalization including suppression
    Minimal generalization of a table
    Minimal distortion of a table
    Algorithm for finding a minimal generalization with minimal distortion
  Real-world results
    Datafly System
    u-Argus System

11 Methods for K-anonymity privacy protection -- Generalization including suppression
Generalization: replace the original value with a semantically consistent but less specific value.
Suppression: data not released at all; can be cell-level or (more commonly) tuple-level.

  #  Zip    Age   Nationality  Condition
  1  130**  < 40  *            Heart Disease
  2  130**  < 40  *
  3  130**  < 40  *            Viral Infection
  4  130**  < 40  *            Cancer

(Zip, Age, and Nationality are generalized; the blank Condition in row 2 is a cell-level suppression.)

12 Use Generalization Hierarchies to create a table generalization
Generalization hierarchies: the data owner defines how values can be generalized, e.g.
  ZIP:          13053, 13058 -> 1305* ;  13063, 13067 -> 1306* ;  1305*, 1306* -> 130**
  Age:          28, 29 -> < 30 ;  35, 36 -> 3* ;  < 30, 3* -> < 40 ;  < 40 -> *
  Nationality:  US, Canadian -> American ;  Indian, Japanese -> Asian ;  American, Asian -> *
Table generalization: a table generalization is created by generalizing all values in a column to a specific level of its hierarchy.
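One simple way to encode such a hierarchy is a child-to-parent map. This is a sketch, assuming the starred ZIP forms used in the deck's example tables; the map and function names are illustrative:

```python
# Child -> parent map for the ZIP hierarchy (assumed encoding).
ZIP_PARENT = {"13053": "1305*", "13058": "1305*",
              "13063": "1306*", "13067": "1306*",
              "1305*": "130**", "1306*": "130**",
              "130**": "*****"}  # full suppression at the root

def generalize(value, parent_map, levels):
    """Climb `levels` steps up the hierarchy."""
    for _ in range(levels):
        value = parent_map[value]
    return value

print(generalize("13053", ZIP_PARENT, 2))  # 130**
```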

13 Methods for K-anonymity privacy protection -- K-minimal generalizations of a table
There are many k-anonymizations -- which one to pick?
Intuition: the one that does not generalize the data more than needed (over-generalization decreases the utility of the published dataset!)
K-minimal generalization: a k-anonymized table that is not a generalization of another k-anonymized table.
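The "is a generalization of" relation in this definition can be tested directly against the hierarchies. A sketch under the assumption that both tables list the same tuples in the same order; all names here are illustrative:

```python
def ancestors(value, parent_map):
    """The value itself plus all its ancestors in the hierarchy."""
    result = {value}
    while value in parent_map:
        value = parent_map[value]
        result.add(value)
    return result

def is_generalization_of(table_b, table_a, qi, hierarchies):
    """True if every QI cell of table_b equals the corresponding
    cell of table_a or one of that cell's ancestors."""
    return all(
        rb[a] in ancestors(ra[a], hierarchies[a])
        for ra, rb in zip(table_a, table_b)
        for a in qi
    )

hier = {"zip": {"13053": "130**", "13067": "130**"}}
original = [{"zip": "13053"}, {"zip": "13067"}]
generalized = [{"zip": "130**"}, {"zip": "130**"}]
print(is_generalization_of(generalized, original, ["zip"], hier))  # True
```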

14 2-minimal Generalizations
Two 2-minimal generalizations:
  #  Zip    Age   Nationality  Condition
  1  13053  < 40  *            Heart Disease
  2  13053  < 40  *            Viral Infection
  3  13067  < 40  *
  4  13067  < 40  *            Cancer

  #  Zip    Age   Nationality  Condition
  1  130**  < 30  American     Heart Disease
  2  130**  < 30  American     Viral Infection
  3  130**  3*    Asian
  4  130**  3*    Asian        Cancer

NOT a 2-minimal generalization (it is a generalization of both tables above):
  #  Zip    Age   Nationality  Condition
  1  130**  < 40  *            Heart Disease
  2  130**  < 40  *            Viral Infection
  3  130**  < 40  *
  4  130**  < 40  *            Cancer

15 Methods for K-anonymity privacy protection -- k-minimal distortion of a table
Now there are many k-minimal generalizations! Which one is preferred?
One criterion: the one that creates minimal distortion to the data.
Distortion is measured per attribute as the ratio of the attribute's current level of generalization to the height of the attribute's hierarchy, averaged over all attributes:

  D = (1/N) * sum over attributes i of (current generalization level of attribute i) / (max generalization level of attribute i)

where N is the number of attributes.
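The distortion measure above is a one-liner. A sketch with hypothetical levels and hierarchy heights (the specific numbers are illustrative assumptions, not from the slides):

```python
def distortion(levels, heights):
    """D = average over attributes of
    (current generalization level / height of the attribute's hierarchy)."""
    return sum(l / h for l, h in zip(levels, heights)) / len(levels)

# Hypothetical example: ZIP generalized 2 of 5 levels,
# Age 1 of 2 levels, Nationality 1 of 2 levels.
print(round(distortion([2, 1, 1], [5, 2, 2]), 3))  # 0.467
```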

16 2-min

17 Algorithm for finding a minimal generalization with minimal distortion

18

19 Outline
  Need for Privacy
  K-anonymity privacy protection
  Methods for K-anonymity privacy protection
    Generalization including suppression
    Minimal generalization of a table
    Minimal distortion of a table
    Algorithm for finding a minimal generalization with minimal distortion
  Real-world results
    Datafly System
    u-Argus System

20 Real-world results -- Datafly System
In the Datafly system, the data holder:
  declares specific attributes and tuples in the original private table (PT) as being eligible for release
  groups a subset of the attributes of PT into one or more quasi-identifiers (QIi)
  assigns a weight from 0 to 1 to each attribute specifying the likelihood the attribute will be used for linking (0 = not likely, 1 = highly probable)
  specifies a minimum anonymity level, which computes to a value for k
The data recipient's preferences are captured the same way: a weight from 0 to 1 per attribute states which attributes may be distorted (0 = the recipient would prefer the values unchanged, 1 = maximum distortion could be tolerated).

21 For convenience, we consider a single quasi-identifier in which all attributes have equal preference and an equal likelihood of linking; the weights can then be considered as not being present.

22

23 Datafly -- the core Datafly algorithm
  its solutions always satisfy k-anonymity
  it does not necessarily provide k-minimal generalizations or k-minimal distortions
The problem is that Datafly makes crude decisions: it generalizes all values associated with an attribute and suppresses all values within a tuple.
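The crude per-attribute strategy described above can be sketched as follows. This is an illustrative reconstruction, not Sweeney's exact pseudocode: it assumes a single quasi-identifier, child-to-parent hierarchy maps, and hierarchies tall enough for the loop to terminate:

```python
from collections import Counter

# Child -> parent map (assumed encoding; values follow the deck's ZIP hierarchy).
ZIP_PARENT = {"13053": "1305*", "13058": "1305*",
              "13063": "1306*", "13067": "1306*",
              "1305*": "130**", "1306*": "130**"}

def datafly(table, qi, hierarchies, k):
    """Greedy sketch: while more than k tuples violate k-anonymity,
    generalize the whole attribute with the most distinct values;
    finally suppress tuples whose QI combination still occurs < k times."""
    table = [dict(row) for row in table]
    while True:
        counts = Counter(tuple(r[a] for a in qi) for r in table)
        if sum(c for c in counts.values() if c < k) <= k:
            break  # few enough outliers: stop generalizing, suppress them
        # generalize the attribute with the most distinct values
        attr = max(qi, key=lambda a: len({r[a] for r in table}))
        parent = hierarchies[attr]
        for r in table:
            r[attr] = parent.get(r[attr], r[attr])
    counts = Counter(tuple(r[a] for a in qi) for r in table)
    return [r for r in table if counts[tuple(r[a] for a in qi)] >= k]

rows = [{"zip": z} for z in ["13053", "13058", "13063", "13067"]]
out = datafly(rows, ["zip"], {"zip": ZIP_PARENT}, k=2)
print({r["zip"] for r in out})  # {'1305*', '1306*'}
```

Note how the sketch exhibits both crude decisions the slide names: generalization always applies to every value of an attribute, and suppression always drops entire tuples.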

24 Real-world results -- u-Argus System
The data holder specifies how identifying each attribute is by assigning it a value between 0 and 3:
  0 "not identifying", 1 "identifying", 2 "more identifying", 3 "most identifying"
The system then tests 2- and 3-combinations of attributes and eliminates unsafe combinations by generalizing attributes and by cell suppression.

25

26

27 Shortcomings of u-Argus
  its generalizations may not always satisfy k-anonymity
  it does not examine all combinations of the attributes in the quasi-identifier: only 2- and 3-combinations are examined, and there may exist 4-combinations or larger that are unique
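The 2-/3-combination test can be sketched as follows (illustrative names and data; raising `max_size` to 4 or more would catch the larger unique combinations this slide warns about):

```python
from itertools import combinations
from collections import Counter

def unsafe_combinations(table, qi, k, max_size=3):
    """Attribute combinations of size <= max_size whose value
    combinations occur fewer than k times in the table."""
    flagged = []
    for size in range(1, max_size + 1):
        for attrs in combinations(qi, size):
            counts = Counter(tuple(r[a] for a in attrs) for r in table)
            if any(c < k for c in counts.values()):
                flagged.append(attrs)
    return flagged

# Illustrative rows: the value b=1 occurs only once,
# so ("b",) and ("a", "b") are unsafe for k=2.
rows = [{"a": 1, "b": 1}, {"a": 1, "b": 2}, {"a": 1, "b": 2}]
bad = unsafe_combinations(rows, ["a", "b"], k=2)
print(bad)  # [('b',), ('a', 'b')]
```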

28 Conclusions
Definitions:
  quasi-identifier
  k-anonymous table
  generalization and suppression
  k-minimal generalization of a table
  k-minimal distortion of a table
Real-world results:
  Datafly System
  u-Argus System

29 The End Thanks

