Privacy-Preserving Data Publishing
Donghui Zhang, Northeastern University
Acknowledgement: some slides come from Yufei Tao and Dimitris Sacharidis.



Motivation
Several agencies, institutions, bureaus, and organizations make (sensitive) data involving people publicly available
– termed microdata (vs. aggregated macrodata), used for analysis
– often required and imposed by law
To protect privacy, microdata are sanitized
– explicit identifiers (SSN, name, phone #) are removed
Is this sufficient for preserving privacy? No! Sanitized microdata are susceptible to link attacks
– publicly available databases (voter lists, city directories) can reveal the “hidden” identity

Link attack example
[Sweeney01] managed to re-identify the medical record of the governor of Massachusetts
– MA collects and publishes sanitized medical data for state employees (microdata): left circle
– voter registration list of MA (publicly available data): right circle
Looking for the governor’s record, join the tables:
– 6 people had his birth date
– 3 were men
– 1 in his zipcode
According to the 1990 US census data
– 87% of the population are unique based on (zipcode, gender, dob)
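The join behind a link attack can be sketched in a few lines of Python. This is only an illustration, not Sweeney's actual procedure: every record, every name, and the `link` helper are hypothetical, and only the join key (zipcode, gender, dob) comes from the slide.

```python
from collections import defaultdict

QI = ("zipcode", "gender", "dob")  # join key taken from the slide

# Hypothetical sanitized medical records (explicit identifiers already removed).
medical = [
    {"zipcode": "00001", "gender": "M", "dob": "1950-01-01", "diagnosis": "condition X"},
    {"zipcode": "00002", "gender": "F", "dob": "1960-02-02", "diagnosis": "condition Y"},
    {"zipcode": "00003", "gender": "M", "dob": "1970-03-03", "diagnosis": "condition Z"},
]

# Hypothetical public voter list (names present, no medical data).
voters = [
    {"name": "Voter A", "zipcode": "00001", "gender": "M", "dob": "1950-01-01"},
    {"name": "Voter B", "zipcode": "00002", "gender": "F", "dob": "1960-02-02"},
    {"name": "Voter C", "zipcode": "00002", "gender": "F", "dob": "1960-02-02"},
    {"name": "Voter D", "zipcode": "00009", "gender": "M", "dob": "1970-03-03"},
]

def link(medical, voters, keys=QI):
    """Return (name, diagnosis) pairs for records that match exactly one voter."""
    index = defaultdict(list)
    for voter in voters:
        index[tuple(voter[k] for k in keys)].append(voter["name"])
    matches = []
    for record in medical:
        names = index.get(tuple(record[k] for k in keys), [])
        if len(names) == 1:  # a unique match re-identifies the record
            matches.append((names[0], record["diagnosis"]))
    return matches

reidentified = link(medical, voters)
```

In this toy data only the first record is re-identified: the second matches two voters and stays ambiguous, and the third matches nobody.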

Microdata
Name    Age   Zipcode   Disease
Bob                     dyspepsia
Alice                   bronchitis
Andy                    flu
David                   gastritis
Gary                    flu
Helen                   gastritis
Jane                    dyspepsia
Ken                     flu
Linda                   gastritis
Paul                    dyspepsia
Steve                   gastritis

Inference Attack
Published table (Age and Zipcode are the quasi-identifier (QI) attributes):
Age   Zipcode   Disease
                dyspepsia
                bronchitis
                flu
                gastritis
                flu
                gastritis
                dyspepsia
                flu
                gastritis
                dyspepsia
                gastritis
An adversary knows:
Name   Age   Zipcode
Bob

k-anonymity [Samarati and Sweeney02]
Transform the QI values into less specific forms (generalize).
Original table:
Age   Zipcode   Disease
                dyspepsia
                bronchitis
                flu
                gastritis
                flu
                gastritis
                dyspepsia
                flu
                gastritis
                dyspepsia
                gastritis
Generalized table:
Age        Zipcode      Disease
[21, 22]   [12k, 14k]   dyspepsia
[21, 22]   [12k, 14k]   bronchitis
[23, 24]   [18k, 25k]   flu
[23, 24]   [18k, 25k]   gastritis
[36, 41]   [20k, 27k]   flu
[36, 41]   [20k, 27k]   gastritis
[37, 43]   [26k, 35k]   dyspepsia
[37, 43]   [26k, 35k]   flu
[37, 43]   [26k, 35k]   gastritis
[52, 56]   [33k, 34k]   dyspepsia
[52, 56]   [33k, 34k]   gastritis
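A table is k-anonymous when every combination of QI values that appears in it is shared by at least k tuples. The sketch below checks this for the generalized table on the slide; the function name `is_k_anonymous` and the tuple layout are choices made for this example only.

```python
from collections import Counter

# Generalized table from the slide: (Age, Zipcode, Disease).
generalized = [
    ("[21, 22]", "[12k, 14k]", "dyspepsia"),
    ("[21, 22]", "[12k, 14k]", "bronchitis"),
    ("[23, 24]", "[18k, 25k]", "flu"),
    ("[23, 24]", "[18k, 25k]", "gastritis"),
    ("[36, 41]", "[20k, 27k]", "flu"),
    ("[36, 41]", "[20k, 27k]", "gastritis"),
    ("[37, 43]", "[26k, 35k]", "dyspepsia"),
    ("[37, 43]", "[26k, 35k]", "flu"),
    ("[37, 43]", "[26k, 35k]", "gastritis"),
    ("[52, 56]", "[33k, 34k]", "dyspepsia"),
    ("[52, 56]", "[33k, 34k]", "gastritis"),
]

def is_k_anonymous(rows, k, qi_columns=(0, 1)):
    """True if every QI-value combination occurs in at least k rows."""
    group_sizes = Counter(tuple(row[c] for c in qi_columns) for row in rows)
    return all(size >= k for size in group_sizes.values())

two_anonymous = is_k_anonymous(generalized, 2)
three_anonymous = is_k_anonymous(generalized, 3)
```

The QI groups have sizes 2, 2, 2, 3 and 2, so the table is 2-anonymous but not 3-anonymous.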

Generalization
Transform each QI value into a less specific form.
An adversary knows:
Name   Age   Zipcode
Bob
A generalized table:
Age        Zipcode      Disease
[21, 22]   [12k, 14k]   dyspepsia
[21, 22]   [12k, 14k]   bronchitis
[23, 24]   [18k, 25k]   flu
[23, 24]   [18k, 25k]   gastritis
[36, 41]   [20k, 27k]   flu
[36, 41]   [20k, 27k]   gastritis
[37, 43]   [26k, 35k]   dyspepsia
[37, 43]   [26k, 35k]   flu
[37, 43]   [26k, 35k]   gastritis
[52, 56]   [33k, 34k]   dyspepsia
[52, 56]   [33k, 34k]   gastritis

Graphically…
[Figure; labels: Bob, Alice]

Why not…
How many people with age in [30, 50] contracted flu?

k-anonymity
How many people with age in [30, 50] contracted flu?
A generalization with high utility answers the query more accurately: 2.
Age        Zipcode      Disease
[21, 22]   [12k, 14k]   dyspepsia
[21, 22]   [12k, 14k]   bronchitis
[23, 24]   [18k, 25k]   flu
[23, 24]   [18k, 25k]   gastritis
[36, 41]   [20k, 27k]   flu
[36, 41]   [20k, 27k]   gastritis
[37, 43]   [26k, 35k]   dyspepsia
[37, 43]   [26k, 35k]   flu
[37, 43]   [26k, 35k]   gastritis
[52, 56]   [33k, 34k]   dyspepsia
[52, 56]   [33k, 34k]   gastritis
A generalization with low utility answers it less accurately: [0..3].
Age        Zipcode      Disease
[21, 56]   [12k, 35k]   dyspepsia
[21, 56]   [12k, 35k]   bronchitis
[21, 56]   [12k, 35k]   flu
[21, 56]   [12k, 35k]   gastritis
[21, 56]   [12k, 35k]   flu
[21, 56]   [12k, 35k]   gastritis
[21, 56]   [12k, 35k]   dyspepsia
[21, 56]   [12k, 35k]   flu
[21, 56]   [12k, 35k]   gastritis
[21, 56]   [12k, 35k]   dyspepsia
[21, 56]   [12k, 35k]   gastritis
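One way to see where the answers 2 and [0..3] come from is to bound the count from the generalized age intervals: a flu tuple whose interval lies entirely inside [30, 50] certainly qualifies, while one whose interval merely overlaps the range might. The sketch below assumes this interpretation of the slide; the helper `count_bounds` is a hypothetical name.

```python
# (age interval, disease) pairs from the high-utility generalization on the slide.
high_utility = [
    ((21, 22), "dyspepsia"), ((21, 22), "bronchitis"),
    ((23, 24), "flu"), ((23, 24), "gastritis"),
    ((36, 41), "flu"), ((36, 41), "gastritis"),
    ((37, 43), "dyspepsia"), ((37, 43), "flu"), ((37, 43), "gastritis"),
    ((52, 56), "dyspepsia"), ((52, 56), "gastritis"),
]
# The low-utility generalization puts every tuple in the interval [21, 56].
low_utility = [((21, 56), disease) for _, disease in high_utility]

def count_bounds(rows, lo, hi, disease):
    """Return (certain, possible) counts of `disease` tuples with age in [lo, hi]."""
    certain = possible = 0
    for (age_lo, age_hi), d in rows:
        if d != disease or age_hi < lo or age_lo > hi:
            continue  # wrong disease, or interval disjoint from the query range
        possible += 1
        if lo <= age_lo and age_hi <= hi:
            certain += 1  # interval fully inside the query range
    return certain, possible

high_answer = count_bounds(high_utility, 30, 50, "flu")
low_answer = count_bounds(low_utility, 30, 50, "flu")
```

With tight intervals the lower and upper bounds coincide at 2; with the single wide interval they spread to 0 and 3.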

k-anonymity with utility
Among all generalizations that enforce k-anonymity, we should maximize utility by minimizing the “rectangle” sizes!
Several measures exist, e.g., minimizing the maximal perimeter of the rectangles.

Mondrian [LDR06]
Recursive half-plane partitioning, alternating dimensions.
Let k=2.
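The recursive partitioning can be sketched as follows. This is a simplified illustration under stated assumptions, not the algorithm of [LDR06]: it alternates dimensions by recursion depth, cuts at the median position (so tied values may land on both sides), and stops when a cut would leave a side with fewer than k points. The sample points and the name `mondrian` are hypothetical.

```python
def mondrian(points, k, depth=0):
    """Split points recursively at the median, alternating dimensions.

    Returns a list of groups, each holding at least k points
    (provided the input itself has at least k points).
    """
    if len(points) < 2 * k:
        return [points]  # cannot cut without leaving a side below k points
    dim = depth % len(points[0])  # alternate dimensions by depth
    ordered = sorted(points, key=lambda p: p[dim])
    mid = len(ordered) // 2
    return (mondrian(ordered[:mid], k, depth + 1)
            + mondrian(ordered[mid:], k, depth + 1))

# Hypothetical (age, zipcode in thousands) points.
points = [(21, 12), (22, 14), (23, 18), (24, 25),
          (36, 20), (41, 27), (37, 26), (43, 35)]
groups = mondrian(points, k=2)
```

Each resulting group would then be published with the bounding rectangle of its points as the generalized QI values.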

Mondrian [LDR06]
Unbounded approximation ratio!
Let k=4.

Our contributions [DXT+07]
– Proved that finding the optimal partitioning is NP-hard.
– Proved that finding a partitioning with approximation ratio less than 1.25 is also NP-hard.
– Provided three algorithms with tradeoffs between complexity and approximation ratio.

Divide-And-Group (DAG)
– Divide the space into square cells of proper size
– Find a set of non-overlapping tiles of 2 x 2 cells to cover the points, such that each tile covers at least k points
– Assign the remaining (uncovered) points to the nearest tile

Min-MBR-Group (MMG)
– For each point p, find the smallest MBR which covers at least k points including p
– Find a set of non-overlapping MBRs from the result of the previous step
– Assign the points to the nearest MBR

Nearest-Neighbor-Group (NNG)
– For each point p, find the MBR which covers p and its k-1 nearest neighbors
– Find a set of non-overlapping MBRs from the result of the previous step
– Assign the points to the nearest MBR
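The first NNG step can be illustrated with a brute-force sketch. It covers only that step (the MBR of a point and its k-1 nearest neighbors), assumes Euclidean distance, and makes no attempt to match the complexity the authors report; `knn_mbr` and the sample points are hypothetical.

```python
import math

def knn_mbr(points, p, k):
    """Return (lows, highs) of the MBR covering p and its k-1 nearest neighbors.

    p must be one of `points`; it is its own nearest point (distance 0).
    """
    nearest = sorted(points, key=lambda q: math.dist(p, q))[:k]
    dims = range(len(p))
    lows = tuple(min(q[i] for q in nearest) for i in dims)
    highs = tuple(max(q[i] for q in nearest) for i in dims)
    return lows, highs

points = [(0, 0), (1, 0), (0, 2), (10, 10)]  # hypothetical data
mbr_k2 = knn_mbr(points, (0, 0), k=2)
mbr_k3 = knn_mbr(points, (0, 0), k=3)
```

Selecting a non-overlapping subset of these MBRs and assigning the leftover points, the remaining two steps, are not shown here.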

Analysis
Algorithm   Complexity             Approximation ratio
DAG         O(3^d d n log^2 n)     8d
MMG         O(d n^(2d+1))          2d+1
NNG         O(d n^2)               6d

Drawback of k-anonymity
In a QI group, if many records have the same sensitive attribute value...
Quasi-identifier (QI) attributes: Age, Sex, Zipcode. Sensitive attribute: Disease.
Age        Sex   Zipcode          Disease
[21, 40]   M     [10001, 60000]   pneumonia
[30, 60]   M     [10001, 60000]   dyspepsia
[30, 60]   M     [10001, 60000]   dyspepsia
[21, 40]   M     [10001, 60000]   pneumonia
[61, 65]   F     [10001, 60000]   flu
[63, 70]   F     [10001, 60000]   gastritis
[61, 65]   F     [10001, 60000]   flu
[63, 70]   F     [10001, 60000]   bronchitis
If Bob is in the [21, 40] group, he must have pneumonia.

l-diversity [ICDE06]
A QI-group with m tuples is l-diverse, iff each sensitive value appears no more than m / l times in the QI-group.
A table is l-diverse, iff all of its QI-groups are l-diverse.
Quasi-identifier (QI) attributes: Age, Sex, Zipcode. Sensitive attribute: Disease.
Age        Sex   Zipcode          Disease
[21, 60]   M     [10001, 60000]   pneumonia
[21, 60]   M     [10001, 60000]   dyspepsia
[21, 60]   M     [10001, 60000]   dyspepsia
[21, 60]   M     [10001, 60000]   pneumonia
[61, 70]   F     [10001, 60000]   flu
[61, 70]   F     [10001, 60000]   gastritis
[61, 70]   F     [10001, 60000]   flu
[61, 70]   F     [10001, 60000]   bronchitis
The above table is 2-diverse, with 2 QI-groups.
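The definition translates directly into a check: group the tuples by their QI values, then verify that the most frequent sensitive value in each group of m tuples occurs at most m / l times. The sketch below applies it to the table on the slide; `is_l_diverse` is a name chosen for this example.

```python
from collections import Counter, defaultdict

# Table from the slide: (Age, Sex, Zipcode, Disease); Disease is sensitive.
table = [
    ("[21, 60]", "M", "[10001, 60000]", "pneumonia"),
    ("[21, 60]", "M", "[10001, 60000]", "dyspepsia"),
    ("[21, 60]", "M", "[10001, 60000]", "dyspepsia"),
    ("[21, 60]", "M", "[10001, 60000]", "pneumonia"),
    ("[61, 70]", "F", "[10001, 60000]", "flu"),
    ("[61, 70]", "F", "[10001, 60000]", "gastritis"),
    ("[61, 70]", "F", "[10001, 60000]", "flu"),
    ("[61, 70]", "F", "[10001, 60000]", "bronchitis"),
]

def is_l_diverse(rows, l):
    """True if, in every QI-group of m tuples, no sensitive value occurs more than m / l times."""
    groups = defaultdict(list)
    for *qi, sensitive in rows:
        groups[tuple(qi)].append(sensitive)
    return all(
        max(Counter(values).values()) * l <= len(values)
        for values in groups.values()
    )

two_diverse = is_l_diverse(table, 2)
three_diverse = is_l_diverse(table, 3)
```

Both groups have m = 4 and a most frequent value that occurs twice, so the table is 2-diverse but not 3-diverse.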

What l-diversity guarantees
From an l-diverse generalized table, an adversary (without any prior knowledge) can infer the sensitive value of each individual with confidence at most 1/l.
A 2-diverse generalized table:
Age        Sex   Zipcode          Disease
[21, 60]   M     [10001, 60000]   pneumonia
[21, 60]   M     [10001, 60000]   dyspepsia
[21, 60]   M     [10001, 60000]   dyspepsia
[21, 60]   M     [10001, 60000]   pneumonia
[61, 70]   F     [10001, 60000]   flu
[61, 70]   F     [10001, 60000]   gastritis
[61, 70]   F     [10001, 60000]   flu
[61, 70]   F     [10001, 60000]   bronchitis
The adversary knows:
Name   Age   Sex   Zipcode
Bob    23    M     11000
A. Machanavajjhala et al. l-Diversity: Privacy Beyond k-Anonymity. ICDE 2006.

Problem with multi-publishing
A hospital keeps track of the medical records collected in the last three months.
The microdata table T(1), and its generalization T*(1), published in Apr
Microdata T(1):
Name    Age   Zipcode   Disease
Bob                     dyspepsia
Alice                   bronchitis
Andy                    flu
David                   gastritis
Gary                    flu
Helen                   gastritis
Jane                    dyspepsia
Ken                     flu
Linda                   gastritis
Paul                    dyspepsia
Steve                   gastritis
2-diverse Generalization T*(1):
G.ID   Age        Zipcode      Disease
1      [21, 22]   [12k, 14k]   dyspepsia
1      [21, 22]   [12k, 14k]   bronchitis
2      [23, 24]   [18k, 25k]   flu
2      [23, 24]   [18k, 25k]   gastritis
3      [36, 41]   [20k, 27k]   flu
3      [36, 41]   [20k, 27k]   gastritis
4      [37, 43]   [26k, 35k]   dyspepsia
4      [37, 43]   [26k, 35k]   flu
4      [37, 43]   [26k, 35k]   gastritis
5      [52, 56]   [33k, 34k]   dyspepsia
5      [52, 56]   [33k, 34k]   gastritis

Problem with multi-publishing
Bob was hospitalized in Mar
The adversary knows:
Name   Age   Zipcode
Bob
2-diverse Generalization T*(1):
G.ID   Age        Zipcode      Disease
1      [21, 22]   [12k, 14k]   dyspepsia
1      [21, 22]   [12k, 14k]   bronchitis
2      [23, 24]   [18k, 25k]   flu
2      [23, 24]   [18k, 25k]   gastritis
3      [36, 41]   [20k, 27k]   flu
3      [36, 41]   [20k, 27k]   gastritis
4      [37, 43]   [26k, 35k]   dyspepsia
4      [37, 43]   [26k, 35k]   flu
4      [37, 43]   [26k, 35k]   gastritis
5      [52, 56]   [33k, 34k]   dyspepsia
5      [52, 56]   [33k, 34k]   gastritis

Problem with multi-publishing
One month later, in May 2007
Some obsolete tuples are deleted from the microdata.
Microdata T(1):
Name    Age   Zipcode   Disease
Bob                     dyspepsia
Alice                   bronchitis
Andy                    flu
David                   gastritis
Gary                    flu
Helen                   gastritis
Jane                    dyspepsia
Ken                     flu
Linda                   gastritis
Paul                    dyspepsia
Steve                   gastritis

Problem with multi-publishing
Bob’s tuple stays.
Microdata T(1):
Name    Age   Zipcode   Disease
Bob                     dyspepsia
David                   gastritis
Gary                    flu
Jane                    dyspepsia
Linda                   gastritis
Steve                   gastritis

Problem with multi-publishing
Some new records are inserted.
Microdata T(2):
Name    Age   Zipcode   Disease
Bob                     dyspepsia
David                   gastritis
Emily                   flu
Jane                    dyspepsia
Linda                   gastritis
Gary                    flu
Mary                    gastritis
Ray                     dyspepsia
Steve                   gastritis
Tom                     gastritis
Vince                   flu

Problem with multi-publishing
The hospital published T*(2).
Microdata T(2):
Name    Age   Zipcode   Disease
Bob                     dyspepsia
David                   gastritis
Emily                   flu
Jane                    dyspepsia
Linda                   gastritis
Gary                    flu
Mary                    gastritis
Ray                     dyspepsia
Steve                   gastritis
Tom                     gastritis
Vince                   flu
2-diverse Generalization T*(2):
G.ID   Age        Zipcode      Disease
1      [21, 23]   [12k, 25k]   dyspepsia
1      [21, 23]   [12k, 25k]   gastritis
2      [25, 43]   [21k, 33k]   flu
2      [25, 43]   [21k, 33k]   dyspepsia
2      [25, 43]   [21k, 33k]   gastritis
3      [41, 46]   [20k, 30k]   flu
3      [41, 46]   [20k, 30k]   gastritis
4      [54, 56]   [31k, 34k]   dyspepsia
4      [54, 56]   [31k, 34k]   gastritis
5      [60, 65]   [36k, 44k]   gastritis
5      [60, 65]   [36k, 44k]   flu

Problem with multi-publishing
Consider the previous adversary, who knows:
Name   Age   Zipcode
Bob
2-diverse Generalization T*(2):
G.ID   Age        Zipcode      Disease
1      [21, 23]   [12k, 25k]   dyspepsia
1      [21, 23]   [12k, 25k]   gastritis
2      [25, 43]   [21k, 33k]   flu
2      [25, 43]   [21k, 33k]   dyspepsia
2      [25, 43]   [21k, 33k]   gastritis
3      [41, 46]   [20k, 30k]   flu
3      [41, 46]   [20k, 30k]   gastritis
4      [54, 56]   [31k, 34k]   dyspepsia
4      [54, 56]   [31k, 34k]   gastritis
5      [60, 65]   [36k, 44k]   gastritis
5      [60, 65]   [36k, 44k]   flu

Problem with multi-publishing
What the adversary (who knows Bob’s Name, Age, and Zipcode) learns from T*(1):
G.ID   Age        Zipcode      Disease
1      [21, 22]   [12k, 14k]   dyspepsia
1      [21, 22]   [12k, 14k]   bronchitis
……
What the adversary learns from T*(2):
G.ID   Age        Zipcode      Disease
1      [21, 23]   [12k, 25k]   dyspepsia
1      [21, 23]   [12k, 25k]   gastritis
……
So Bob must have contracted dyspepsia!
A new generalization principle is needed.
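The adversary's reasoning is a set intersection over the candidate diseases from each release. The sketch below assumes, as the slides do, that the adversary knows Bob appears in both releases with the same disease; `candidate_values` is a hypothetical helper name.

```python
def candidate_values(*groups):
    """Intersect the sensitive values of the QI-groups an individual falls into."""
    return set.intersection(*(set(group) for group in groups))

# Diseases in Bob's QI-group in each release, as shown on the slide.
bob_group_t1 = ["dyspepsia", "bronchitis"]  # from T*(1)
bob_group_t2 = ["dyspepsia", "gastritis"]   # from T*(2)

bob_candidates = candidate_values(bob_group_t1, bob_group_t2)
```

Each release alone leaves two candidates, yet the intersection has a single element, so both tables being 2-diverse does not protect Bob.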

m-invariance [SIGMOD07]
A sequence of generalized tables T*(1), …, T*(n) is m-invariant, if and only if
– T*(1), …, T*(n) are m-unique, and
– each individual has the same signature in every generalized table in which s/he is involved.
Explanation
– m-unique: every QI group contains at least m tuples, all with different sensitive values
– signature: the set of sensitive values in the individual’s QI group.

m-unique
A generalized table T*(j) is m-unique, if and only if
– each QI-group in T*(j) contains at least m tuples
– all tuples in the same QI-group have different sensitive values.
A 2-unique generalized table:
G.ID   Age        Zipcode      Disease
1      [21, 22]   [12k, 14k]   dyspepsia
1      [21, 22]   [12k, 14k]   bronchitis
2      [23, 24]   [18k, 25k]   flu
2      [23, 24]   [18k, 25k]   gastritis
3      [36, 41]   [20k, 27k]   flu
3      [36, 41]   [20k, 27k]   gastritis
4      [37, 43]   [26k, 35k]   dyspepsia
4      [37, 43]   [26k, 35k]   flu
4      [37, 43]   [26k, 35k]   gastritis
5      [52, 56]   [33k, 34k]   dyspepsia
5      [52, 56]   [33k, 34k]   gastritis
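The two conditions can be checked mechanically. The sketch below encodes the slide's table as (group ID, disease) pairs, since the QI intervals play no role in the test; `is_m_unique` is a name chosen for this example.

```python
from collections import defaultdict

# (G.ID, Disease) pairs from the 2-unique table on the slide.
t1 = [
    (1, "dyspepsia"), (1, "bronchitis"),
    (2, "flu"), (2, "gastritis"),
    (3, "flu"), (3, "gastritis"),
    (4, "dyspepsia"), (4, "flu"), (4, "gastritis"),
    (5, "dyspepsia"), (5, "gastritis"),
]

def is_m_unique(rows, m):
    """True if every QI-group has at least m tuples, all with distinct sensitive values."""
    groups = defaultdict(list)
    for group_id, disease in rows:
        groups[group_id].append(disease)
    return all(
        len(values) >= m and len(set(values)) == len(values)
        for values in groups.values()
    )

two_unique = is_m_unique(t1, 2)
three_unique = is_m_unique(t1, 3)
```

The table is 2-unique but not 3-unique, because four of its five groups hold only two tuples.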

Signature
The signature of Bob in T*(1) is {dyspepsia, bronchitis}.
The signature of Jane in T*(1) is {dyspepsia, flu, gastritis}.
T*(1):
Name    G.ID   Age        Zipcode      Disease
Bob     1      [21, 22]   [12k, 14k]   dyspepsia
Alice   1      [21, 22]   [12k, 14k]   bronchitis
…       …      …          …            …
Jane    4      [37, 43]   [26k, 35k]   dyspepsia
Ken     4      [37, 43]   [26k, 35k]   flu
Linda   4      [37, 43]   [26k, 35k]   gastritis
…       …      …          …            …

The m-invariance principle
Lemma: if a sequence of generalized tables {T*(1), …, T*(n)} is m-invariant, then for any individual o involved in any of these tables, we have risk(o) <= 1/m.

The m-invariance principle
Lemma: let {T*(1), …, T*(n-1)} be m-invariant. Then {T*(1), …, T*(n-1), T*(n)} is also m-invariant, if and only if {T*(n-1), T*(n)} is m-invariant.
Only T*(n-1) is needed for the generation of T*(n); T*(1), T*(2), …, T*(n-2) can be discarded.

Solution idea
Goal: given T(n) and T*(n-1), create T*(n) such that {T*(n-1), T*(n)} is m-invariant.
Idea: create counterfeits.
Optimization goal: impose as little generalization as possible.

Microdata T(2):
Name    Age   Zipcode   Disease
Bob                     dyspepsia
David                   gastritis
Emily                   flu
Jane                    dyspepsia
Linda                   gastritis
Gary                    flu
Mary                    gastritis
Ray                     dyspepsia
Steve                   gastritis
Tom                     gastritis
Vince                   flu
Counterfeited generalization T*(2) (c1 and c2 are counterfeits):
Name    Group-ID   Age        Zipcode      Disease
Bob     1          [21, 22]   [12k, 14k]   dyspepsia
c1      1          [21, 22]   [12k, 14k]   bronchitis
David   2          [23, 25]   [21k, 25k]   gastritis
Emily   2          [23, 25]   [21k, 25k]   flu
Jane    3          [37, 43]   [26k, 33k]   dyspepsia
c2      3          [37, 43]   [26k, 33k]   flu
Linda   3          [37, 43]   [26k, 33k]   gastritis
Gary    4          [41, 46]   [20k, 30k]   flu
Mary    4          [41, 46]   [20k, 30k]   gastritis
Ray     5          [54, 56]   [31k, 34k]   dyspepsia
Steve   5          [54, 56]   [31k, 34k]   gastritis
Tom     6          [60, 65]   [36k, 44k]   gastritis
Vince   6          [60, 65]   [36k, 44k]   flu
The auxiliary relation R(2) for T*(2):
Group-ID   Count

The adversary knows:
Name   Age   Zipcode
Bob
Generalization T*(1):
Name    G.ID   Age        Zipcode      Disease
Bob     1      [21, 22]   [12k, 14k]   dyspepsia
Alice   1      [21, 22]   [12k, 14k]   bronchitis
Andy    2      [23, 24]   [18k, 25k]   flu
David   2      [23, 24]   [18k, 25k]   gastritis
Gary    3      [36, 41]   [20k, 27k]   flu
Helen   3      [36, 41]   [20k, 27k]   gastritis
Jane    4      [37, 43]   [26k, 35k]   dyspepsia
Ken     4      [37, 43]   [26k, 35k]   flu
Linda   4      [37, 43]   [26k, 35k]   gastritis
Paul    5      [52, 56]   [33k, 34k]   dyspepsia
Steve   5      [52, 56]   [33k, 34k]   gastritis
Counterfeited Generalization T*(2):
Name    G.ID   Age        Zipcode      Disease
Bob     1      [21, 22]   [12k, 14k]   dyspepsia
c1      1      [21, 22]   [12k, 14k]   bronchitis
David   2      [23, 25]   [21k, 25k]   gastritis
Emily   2      [23, 25]   [21k, 25k]   flu
Jane    3      [37, 43]   [26k, 33k]   dyspepsia
c2      3      [37, 43]   [26k, 33k]   flu
Linda   3      [37, 43]   [26k, 33k]   gastritis
Gary    4      [41, 46]   [20k, 30k]   flu
Mary    4      [41, 46]   [20k, 30k]   gastritis
Ray     5      [54, 56]   [31k, 34k]   dyspepsia
Steve   5      [54, 56]   [31k, 34k]   gastritis
Tom     6      [60, 65]   [36k, 44k]   gastritis
Vince   6      [60, 65]   [36k, 44k]   flu
The auxiliary relation R(2) for T*(2):
Group-ID   Count

A sequence of generalized tables T*(1), …, T*(n) is m-invariant, if and only if
– T*(1), …, T*(n) are m-unique, and
– each individual has the same signature in every generalized table in which s/he is involved.
Generalization T*(1):
Name    G.ID   Age        Zipcode      Disease
Bob     1      [21, 22]   [12k, 14k]   dyspepsia
Alice   1      [21, 22]   [12k, 14k]   bronchitis
Andy    2      [23, 24]   [18k, 25k]   flu
David   2      [23, 24]   [18k, 25k]   gastritis
Gary    3      [36, 41]   [20k, 27k]   flu
Helen   3      [36, 41]   [20k, 27k]   gastritis
Jane    4      [37, 43]   [26k, 35k]   dyspepsia
Ken     4      [37, 43]   [26k, 35k]   flu
Linda   4      [37, 43]   [26k, 35k]   gastritis
Paul    5      [52, 56]   [33k, 34k]   dyspepsia
Steve   5      [52, 56]   [33k, 34k]   gastritis
Generalization T*(2):
Name    G.ID   Age        Zipcode      Disease
Bob     1      [21, 22]   [12k, 14k]   dyspepsia
c1      1      [21, 22]   [12k, 14k]   bronchitis
David   2      [23, 25]   [21k, 25k]   gastritis
Emily   2      [23, 25]   [21k, 25k]   flu
Jane    3      [37, 43]   [26k, 33k]   dyspepsia
c2      3      [37, 43]   [26k, 33k]   flu
Linda   3      [37, 43]   [26k, 33k]   gastritis
Gary    4      [41, 46]   [20k, 30k]   flu
Mary    4      [41, 46]   [20k, 30k]   gastritis
Ray     5      [54, 56]   [31k, 34k]   dyspepsia
Steve   5      [54, 56]   [31k, 34k]   gastritis
Tom     6      [60, 65]   [36k, 44k]   gastritis
Vince   6      [60, 65]   [36k, 44k]   flu
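The signature condition can be verified on the two releases shown here. The sketch below covers only that second condition of m-invariance (m-uniqueness is a separate check) and uses the names purely as keys for the comparison; the helpers `signature` and `same_signatures`, and the two-row `naive_t2` fragment, are assumptions made for this example.

```python
# name -> (group ID, disease), transcribed from T*(1) and the counterfeited T*(2).
t1 = {
    "Bob": (1, "dyspepsia"), "Alice": (1, "bronchitis"),
    "Andy": (2, "flu"), "David": (2, "gastritis"),
    "Gary": (3, "flu"), "Helen": (3, "gastritis"),
    "Jane": (4, "dyspepsia"), "Ken": (4, "flu"), "Linda": (4, "gastritis"),
    "Paul": (5, "dyspepsia"), "Steve": (5, "gastritis"),
}
t2 = {
    "Bob": (1, "dyspepsia"), "c1": (1, "bronchitis"),  # c1, c2 are counterfeits
    "David": (2, "gastritis"), "Emily": (2, "flu"),
    "Jane": (3, "dyspepsia"), "c2": (3, "flu"), "Linda": (3, "gastritis"),
    "Gary": (4, "flu"), "Mary": (4, "gastritis"),
    "Ray": (5, "dyspepsia"), "Steve": (5, "gastritis"),
    "Tom": (6, "gastritis"), "Vince": (6, "flu"),
}

def signature(release, name):
    """Set of sensitive values in the QI-group that contains `name`."""
    group_id, _ = release[name]
    return frozenset(d for g, d in release.values() if g == group_id)

def same_signatures(old, new):
    """True if everyone present in both releases keeps the same signature."""
    shared = old.keys() & new.keys()
    return all(signature(old, n) == signature(new, n) for n in shared)

# Hypothetical fragment of a release that groups Bob with a gastritis tuple.
naive_t2 = {"Bob": (1, "dyspepsia"), "David": (1, "gastritis")}

counterfeited_ok = same_signatures(t1, t2)
naive_ok = same_signatures(t1, naive_t2)
```

The counterfeits c1 and c2 are what keep the signatures of Bob and Jane unchanged after Alice and Ken leave the microdata.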


In case of corruption…
If an adversary knows from Alice that she has bronchitis, he can conclude that Bob has dyspepsia.
Microdata:
Name    Age   Zipcode   Disease
Bob                     dyspepsia
Alice                   bronchitis
Andy                    flu
David                   gastritis
Gary                    flu
Helen                   gastritis
Jane                    dyspepsia
Ken                     flu
Linda                   gastritis
Paul                    dyspepsia
Steve                   gastritis
2-diverse Generalization:
G.ID   Age        Zipcode      Disease
1      [21, 22]   [12k, 14k]   dyspepsia
1      [21, 22]   [12k, 14k]   bronchitis
2      [23, 24]   [18k, 25k]   flu
2      [23, 24]   [18k, 25k]   gastritis
3      [36, 41]   [20k, 27k]   flu
3      [36, 41]   [20k, 27k]   gastritis
4      [37, 43]   [26k, 35k]   dyspepsia
4      [37, 43]   [26k, 35k]   flu
4      [37, 43]   [26k, 35k]   gastritis
5      [52, 56]   [33k, 34k]   dyspepsia
5      [52, 56]   [33k, 34k]   gastritis
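The corruption attack amounts to removing the values the adversary already knows from the group's multiset of sensitive values. The sketch below assumes the adversary knows that both Bob and Alice fall into group 1; `remaining_candidates` is a hypothetical helper name.

```python
from collections import Counter

def remaining_candidates(group_values, known_values):
    """Sensitive values left in a QI-group after removing those the adversary already knows."""
    return Counter(group_values) - Counter(known_values)

group_1 = ["dyspepsia", "bronchitis"]  # group 1 of the 2-diverse generalization
alice_told = ["bronchitis"]            # learned directly from Alice

left_for_bob = remaining_candidates(group_1, alice_told)
```

Only one value remains, so the 2-diversity of the published table no longer protects Bob once Alice's value is known.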

Anti-corruption publishing [ICDE08]
We formalized anti-corruption publishing, by modeling the degree of privacy preservation as a function of an adversary’s background knowledge.
We proposed a solution, by integrating generalization with
– perturbation: switch selected records’ sensitive information.
– stratified sampling: sample some records from each QI group.

Summary
Introduced the problem of privacy-preserving publishing.
Two principles:
– k-anonymity
– l-diversity
Two extensions:
– multi-publishing
– corruption