ACHIEVING k-ANONYMITY PRIVACY PROTECTION USING GENERALIZATION AND SUPPRESSION
International Journal on Uncertainty, Fuzziness and Knowledge-based Systems, 2002 (cited: 445)
Presented by Liyan Zhang for CS 295

Outline
- Need for privacy
- k-anonymity privacy protection
- Methods for k-anonymity privacy protection:
  - generalization and suppression
  - minimal generalization of a table
  - minimal distortion of a table
  - algorithm for finding a minimal generalization with minimal distortion
- Real-world results:
  - the Datafly system
  - the u-Argus system

Need for Privacy
Suppose that a medical institution, public health agency, or financial organization wants to publish person-specific records. It wants to publish such that:
- the information remains practically useful;
- the identity of an individual cannot be determined.
Otherwise, an adversary might infer secret or sensitive data from the published database.

Need for Privacy
The data contains:
- attribute values which can uniquely identify an individual: { zip-code, nationality, age } and/or { name } and/or { SSN };
- sensitive information corresponding to individuals: { medical condition, salary, location }.

Example (non-sensitive data: Zip, Age, Nationality, Name; sensitive data: Condition):
#  Zip    Age  Nationality  Name   Condition
1  13053  28   Indian       Kumar  Heart Disease
2  13067  29   American     Bob    ...
3  ...    35   Canadian     Ivan   Viral Infection
4  ...    36   Japanese     Umeko  Cancer

Need for Privacy
Published data (non-sensitive: Zip, Age, Nationality; sensitive: Condition):
#  Zip    Age  Nationality  Condition
1  13053  28   Indian       Heart Disease
2  13067  29   American     ...
3  ...    35   Canadian     Viral Infection
4  ...    36   Japanese     Cancer

Voter list:
#  Name   Zip    Age  Nationality
1  John   13053  28   American
2  Bob    13067  29   ...
3  Chris  ...    23   ...

Joining the published data with the voter list on { Zip, Age } re-identifies individuals: a data leak!

K-anonymity Privacy Protection
Even if we remove the directly and uniquely identifying attributes, some fields may still uniquely identify an individual. An attacker can join them with other sources and re-identify individuals. In the example table, the remaining non-sensitive attributes { Zip, Age, Nationality } form the quasi-identifier.

K-anonymity Privacy Protection
Attributes in the private data that could be used for linking with external information are termed the quasi-identifier. A quasi-identifier includes:
- explicit identifiers such as name, address, and phone number;
- attributes that in combination can uniquely identify individuals, such as birth date, ZIP, and gender.

K-anonymity Privacy Protection
Our goal: protect people's privacy when releasing person-specific information by limiting the ability to use the quasi-identifier to link to external information.
k-anonymous table: change the data in such a way that for each tuple in the resulting table there are at least (k-1) other tuples with the same values for the quasi-identifier. Equivalently, if a table is k-anonymous, then each sequence of values of the quasi-identifier appears at least k times. A small checker is sketched below.
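As a minimal sketch (Python; the attribute names and generalized values are taken from the running example, not from the paper), the definition can be checked directly by counting quasi-identifier value combinations:

```python
from collections import Counter

def is_k_anonymous(table, quasi_identifier, k):
    """Return True iff every combination of quasi-identifier values
    occurs at least k times in the table."""
    counts = Counter(tuple(row[a] for a in quasi_identifier) for row in table)
    return all(c >= k for c in counts.values())

# Hypothetical generalized release of the running example:
release = [
    {"Zip": "130**", "Age": "<30", "Nationality": "American"},
    {"Zip": "130**", "Age": "<30", "Nationality": "American"},
    {"Zip": "130**", "Age": "3*",  "Nationality": "Asian"},
    {"Zip": "130**", "Age": "3*",  "Nationality": "Asian"},
]
print(is_k_anonymous(release, ["Zip", "Age", "Nationality"], k=2))  # True
```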

Methods for k-anonymity Privacy Protection: Generalization and Suppression
- Generalization: replace the original value with a semantically consistent but less specific value.
- Suppression: do not release the data at all; suppression can be cell-level or (more commonly) tuple-level.

Example (Zip and Age are generalized; Nationality is suppressed at the cell level):
#  Zip    Age  Nationality  Condition
1  130**  <40  *            Heart Disease
2  130**  <40  *            ...
3  130**  <40  *            Viral Infection
4  130**  <40  *            Cancer

Use Generalization Hierarchies to Create a Table Generalization
- Generalization hierarchies: the data owner defines how values can be generalized.
- Table generalization: a table generalization is created by generalizing all values in a column to a specific level of generalization.

Example hierarchies (leaf values generalize step by step up to *):
ZIP:         13053, 13058 -> 1305;  13063, 13067 -> 1306;  1305, 1306 -> 130;  130 -> *
Age:         28, 29 -> <30;  35, 36 -> 3*;  <30, 3* -> <40;  <40 -> *
Nationality: US, Canadian -> American;  Indian, Japanese -> Asian;  American, Asian -> *
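As a sketch of how such hierarchies can be represented and applied (Python; the list-of-dicts encoding is an assumption for illustration, not the paper's notation):

```python
# A hierarchy is a list of dicts: hierarchy[0] maps leaf values one level
# up, hierarchy[1] maps those values one level further up, and so on.
ZIP_HIERARCHY = [
    {"13053": "1305", "13058": "1305", "13063": "1306", "13067": "1306"},
    {"1305": "130", "1306": "130"},
    {"130": "*"},
]

def generalize_value(value, hierarchy, level):
    """Generalize a leaf value up `level` steps of its hierarchy."""
    for step in range(level):
        value = hierarchy[step][value]
    return value

def generalize_column(table, attr, hierarchy, level):
    """Table generalization of one column: every leaf value of `attr`
    is replaced by its ancestor at the given level."""
    return [{**row, attr: generalize_value(row[attr], hierarchy, level)}
            for row in table]

print(generalize_value("13067", ZIP_HIERARCHY, 2))  # '130'
```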

Methods for k-anonymity Privacy Protection: k-Minimal Generalizations of a Table
There are many k-anonymizations; which one should we pick? Intuition: the one that does not generalize the data more than needed, since generalization decreases the utility of the published dataset. k-minimal generalization: a k-anonymized table that is not a generalization of another k-anonymized table. (A one-step test for minimality is sketched below.)
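In the whole-column generalization model used in these slides, minimality can be tested directly: a k-anonymous vector of per-column levels is k-minimal iff lowering any single attribute's level breaks k-anonymity (since k-anonymity is monotone under generalization, this one-step check suffices). A sketch reusing the helpers above:

```python
def is_k_minimal(levels, table, qi, hierarchies, k):
    """levels: dict attribute -> generalization level for that column."""
    def anonymized(lv):
        t = table
        for a in qi:
            t = generalize_column(t, a, hierarchies[a], lv[a])
        return t

    if not is_k_anonymous(anonymized(levels), qi, k):
        return False
    # If any one-step-lower table is still k-anonymous, we over-generalized.
    for a in qi:
        if levels[a] > 0:
            if is_k_anonymous(anonymized({**levels, a: levels[a] - 1}), qi, k):
                return False
    return True
```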

Example: 2-Minimal Generalizations

NOT a 2-minimal generalization (Zip is not generalized, so some quasi-identifier combination appears fewer than 2 times):
#  Zip    Age  Nationality  Condition
1  13053  <40  *            Heart Disease
2  ...    <40  *            ...
3  13067  <40  *            Viral Infection
4  ...    <40  *            Cancer

A 2-minimal generalization:
#  Zip    Age  Nationality  Condition
1  130**  <30  American     Heart Disease
2  130**  <30  American     ...
3  130**  3*   Asian        Viral Infection
4  130**  3*   Asian        Cancer

NOT a 2-minimal generalization (it is 2-anonymous, but it is a generalization of the table above, i.e. more general than needed):
#  Zip    Age  Nationality  Condition
1  130**  <40  *            Heart Disease
2  130**  <40  *            ...
3  130**  <40  *            Viral Infection
4  130**  <40  *            Cancer

Methods for k-anonymity Privacy Protection: k-Minimal Distortion Table
Now there are many k-minimal generalizations; which one is preferred? One natural choice is the one that creates the least distortion to the data. Distortion is measured by the ratio of each attribute's current level of generalization to the height of that attribute's hierarchy, averaged over all attributes:

D = \frac{1}{N} \sum_{i=1}^{N} \frac{h_i}{H_i}

where h_i is the current level of generalization of attribute i, H_i is the maximum level of generalization (the hierarchy height) of attribute i, and N is the number of attributes.
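A small sketch of this measure (Python; the hierarchy heights are taken from the example hierarchies above):

```python
def distortion(levels, heights):
    """Average ratio of current generalization level to hierarchy height.

    levels:  dict attribute -> current generalization level h_i
    heights: dict attribute -> hierarchy height H_i
    """
    return sum(levels[a] / heights[a] for a in levels) / len(levels)

# Per the example hierarchies: Zip has height 3, Age 3, Nationality 2.
heights = {"Zip": 3, "Age": 3, "Nationality": 2}
print(distortion({"Zip": 2, "Age": 1, "Nationality": 1}, heights))  # 0.5
print(distortion({"Zip": 2, "Age": 2, "Nationality": 2}, heights))  # ~0.78, more distorted
```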

2-minimal distortion (example figure)

Algorithm for finding a minimal generalization with minimal distortion
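The algorithm itself is presented in the paper; as an illustrative sketch only (an exhaustive search over level vectors, not the paper's exact procedure), one can enumerate all combinations of per-column generalization levels, keep the k-anonymous ones, and return one with the least distortion, reusing the helpers sketched above:

```python
from itertools import product

def min_distortion_generalization(table, qi, hierarchies, k):
    """Exhaustive sketch: enumerate every vector of per-column
    generalization levels and return the k-anonymous table with the
    least distortion. Such a table is also k-minimal: any strictly
    less general k-anonymous table would have lower distortion."""
    heights = {a: len(hierarchies[a]) for a in qi}
    best_table, best_d = None, float("inf")
    for combo in product(*(range(heights[a] + 1) for a in qi)):
        levels = dict(zip(qi, combo))
        t = table
        for a in qi:
            t = generalize_column(t, a, hierarchies[a], levels[a])
        if is_k_anonymous(t, qi, k):
            d = distortion(levels, heights)
            if d < best_d:
                best_table, best_d = t, d
    return best_table, best_d
```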

Real-world Results: the Datafly System
In the Datafly system, the data holder:
- declares specific attributes and tuples in the original private table (PT) as being eligible for release;
- groups a subset of attributes of PT into one or more quasi-identifiers (QI_i);
- assigns a weight from 0 to 1 to each attribute to specify the likelihood the attribute will be used for linking (0 means not likely, 1 means highly probable);
- specifies a minimum anonymity level that computes to a value for k.
Each attribute is also assigned a weight from 0 to 1 stating a preference for which attributes to distort (0 means the data recipient would prefer the values not be changed; 1 means maximum distortion could be tolerated).

For convenience, we consider a single quasi-identifier in which all attributes have equal preference and an equal likelihood of linking, so the weights can be treated as absent.

Datafly
The core Datafly algorithm:
- its solutions always satisfy k-anonymity;
- it does not necessarily provide k-minimal generalizations or k-minimal distortions.
The problem is that Datafly makes crude decisions: it generalizes all values associated with an attribute and suppresses all values within a tuple. A rough sketch of the core heuristic follows.
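A rough sketch of that heuristic (Python, reusing the helpers above; details such as the stopping threshold and tie-breaking are assumptions for illustration):

```python
from collections import Counter

def datafly(table, qi, hierarchies, k):
    """Keep the table at leaf values and rebuild the generalized view from
    the current per-column levels each round: while more than k tuples
    violate k-anonymity, generalize the whole column of the attribute with
    the most distinct values one level; finally suppress the tuples that
    still violate k-anonymity."""
    levels = {a: 0 for a in qi}

    def view():
        t = table
        for a in qi:
            t = generalize_column(t, a, hierarchies[a], levels[a])
        return t

    while True:
        v = view()
        counts = Counter(tuple(r[a] for a in qi) for r in v)
        if sum(c for c in counts.values() if c < k) <= k:
            break
        candidates = [a for a in qi if levels[a] < len(hierarchies[a])]
        if not candidates:  # nothing left to generalize
            break
        attr = max(candidates, key=lambda a: len({r[a] for r in v}))
        levels[attr] += 1

    # Tuple-level suppression of the remaining outliers.
    v = view()
    counts = Counter(tuple(r[a] for a in qi) for r in v)
    return [r for r in v if counts[tuple(r[a] for a in qi)] >= k]
```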

Real-world Results: the u-Argus System
In u-Argus, the data holder specifies how identifying each attribute is by assigning it a value between 0 and 3:
- 0: "not identifying"
- 1: "identifying"
- 2: "more identifying"
- 3: "most identifying"
The system then tests 2- and 3-combinations of attributes, eliminating unsafe combinations by generalizing attributes and by cell suppression (see the sketch below).
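A sketch of the combination test (Python; framing "unsafe" as occurring fewer than k times is an assumption for illustration):

```python
from collections import Counter
from itertools import combinations

def unsafe_combinations(table, identifying_attrs, k):
    """Test every 2- and 3-combination of identifying attributes and
    report those in which some value combination occurs fewer than k
    times, and is therefore at risk of re-identification."""
    unsafe = []
    for size in (2, 3):
        for combo in combinations(identifying_attrs, size):
            counts = Counter(tuple(r[a] for a in combo) for r in table)
            if any(c < k for c in counts.values()):
                unsafe.append(combo)
    return unsafe
```

As the next slide notes, stopping at 3-combinations is exactly what lets larger unique combinations slip through.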

Shortcomings of u-Argus:
- its generalizations may not always satisfy k-anonymity, because it does not examine all combinations of the attributes in the quasi-identifier;
- only 2- and 3-combinations are examined, so there may exist 4-combinations or larger that are unique.

Conclusions
- Definitions: quasi-identifier; k-anonymous table; generalization and suppression; k-minimal generalization of a table; k-minimal distortion of a table.
- Real-world results: the Datafly system; the u-Argus system.

The End. Thanks!