Personalized Privacy Preservation: Beyond k-Anonymity and ℓ-Diversity. SIGMOD 2006. Presented by Hongwei Tian.


Outline What is Privacy Breaching? Drawbacks of k-Anonymity and ℓ-Diversity Personalized Anonymity How Does the Adversary Attack? How Does the Data Owner Defeat the Attacks? Experiments

What is Privacy Breaching? Mainly, there are two classes.  Compare prior belief and posterior belief: if prior belief < posterior belief, the release helps the adversary (e.g., 50% → 80% for "Bob has cancer"); if prior belief > posterior belief, is that really a breach? (e.g., 80% → 50% for "Bob has cancer" → "I don't think so, and many others think the same.")

What is Privacy Breaching?  Only consider the posterior belief. k-Anonymity: posterior belief ≤ 1/k (each QI group contains at least k tuples). ℓ-Diversity: posterior belief = p ≤ threshold (p is the fraction of tuples in the largest sub-group of a QI group). Personalized: posterior belief = P_breach ≤ threshold (P_breach: the breaching probability).
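The posterior-belief bound that k-anonymity fails to give can be seen with a small helper (illustrative code, not from the paper): the adversary's belief depends on how skewed the sensitive values in a QI group are, not on the group's size.

```python
from collections import Counter

def posterior_belief(group_sensitive_values, target_value):
    """Adversary's posterior belief that the target's sensitive value
    is `target_value`, given only that the target's tuple lies in this
    QI group (illustrative helper, not code from the paper)."""
    counts = Counter(group_sensitive_values)
    return counts[target_value] / len(group_sensitive_values)

# A QI group of 4 tuples satisfies 4-anonymity, yet the belief in
# "cancer" is 3/4, because 3 of the 4 tuples share that value.
group = ["cancer", "cancer", "cancer", "flu"]
print(posterior_belief(group, "cancer"))  # 0.75
```

This is exactly why ℓ-diversity bounds the fraction of the largest sub-group rather than the group size.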

Drawbacks of k-Anonymity and ℓ-Diversity: a k-anonymous table only prevents association between individuals and tuples.  ℓ-Diversity and the personalized method both prevent association between individuals and sensitive values.

Drawbacks of k-Anonymity and ℓ-Diversity: a k-anonymous table may lose considerable information.  ℓ-Diversity also has this problem.

Drawbacks of k-Anonymity and ℓ-Diversity: consider the following situation.  In one QI group, all tuples come from the same individual v.  The adversary knows from external datasets that only v is in this QI group. ("I am Bob, and I am unlucky to have so many diseases." "Bob must be here. Aha, I know Bob has four diseases.")

Drawbacks of k-Anonymity and ℓ-Diversity: they do not take personal anonymity requirements into account.

Personalized Anonymity: a person can specify the degree of privacy protection for her/his sensitive values. So far, the literature has focused on a universal approach that exerts the same amount of privacy preservation for all persons, without catering to their concrete needs.

Personalized Anonymity

BREACH PROBABILITY:  For a tuple t ∈ T, its breach probability P_breach(t) equals the probability that an adversary can infer from T* that any of the associations {o, v_1}, ..., {o, v_x} exists in T, where v_1, ..., v_x are the leaf values in SUBTR(t.GN).

Personalized Anonymity: BREACH PROBABILITY.  Both the data owner and the adversary can compute it.  The data owner wants P_breach(t) < threshold, so that the privacy of the individual corresponding to t holds.  The adversary hopes for P_breach(t) ≥ threshold, which breaches that individual's privacy. How does the adversary attack?

How the Adversary Attacks: the adversary knows that one individual has one tuple (primary case). Possible reconstructions: P(5,4) × 3 × 3 = 1080. Breaching reconstructions: 2 × P(4,3) × 3 × 3 = 432. P_breach(t) = 432/1080 = 2/5.
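The primary-case arithmetic on this slide can be checked directly; here P(n, k) is the number of k-permutations of n, and the factors are taken from the slide's example.

```python
from math import perm  # perm(n, k) = n! / (n - k)!, Python 3.8+

# Reproducing the slide's primary-case counts:
possible = perm(5, 4) * 3 * 3          # all reconstructions: 120 * 9 = 1080
breaching = 2 * perm(4, 3) * 3 * 3     # reconstructions exposing the target: 432
print(breaching / possible)            # 0.4, i.e. 2/5
```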

How the Adversary Attacks: the adversary knows that one individual has multiple tuples (non-primary case). Possible reconstructions: 5^4 × 3 × 3 = 5625. Breaching reconstructions: 2 × 5^3 × 3 × 3 − 5^2 × 3 × 3 = 2025. P_breach(t) = 2025/5625 = 9/25.
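The non-primary-case counts differ from the primary case in that an individual may own several tuples, so assignments are counted with replacement (5^4 rather than P(5,4)); the subtracted term is the inclusion-exclusion correction from the slide. The arithmetic checks out:

```python
# Reproducing the slide's non-primary-case counts:
possible = 5**4 * 3 * 3                      # 625 * 9 = 5625
breaching = 2 * 5**3 * 3 * 3 - 5**2 * 3 * 3  # 2250 - 225 = 2025
print(breaching / possible)                  # 0.36, i.e. 9/25
```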

How the Data Owner Defeats the Attacks: the formal computation of P_breach(t), for the primary case and the non-primary case. (Slide figure: an overlapping example with n = 5, b = 2, and a disjoint example with n = 2, b = 2, c = 1/3.)

How the Data Owner Defeats the Attacks: Utility Measure (Information Loss).
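The slide names the utility measure but not its formula. A common way to quantify generalization loss (a sketch, not necessarily the paper's exact metric; all names here are illustrative) is to charge each generalized value by the fraction of its domain it covers, weighted per attribute:

```python
def value_loss(covered_leaves, domain_leaves):
    """Loss of one generalized value: 0 when the value is exact
    (covers one leaf), 1 when generalized to the whole domain."""
    return (covered_leaves - 1) / (domain_leaves - 1)

def record_loss(record, weights):
    """Weighted loss of one record; `record` maps each attribute to
    (leaves covered by its generalized value, leaves in its domain)."""
    return sum(weights[a] * value_loss(c, d) for a, (c, d) in record.items())

# Age generalized to a 5-value range out of a 100-value domain,
# disease kept exact: only the age generalization costs anything.
rec = {"age": (5, 100), "disease": (1, 30)}
print(record_loss(rec, {"age": 1.0, "disease": 1.0}))  # 4/99 ≈ 0.0404
```

A metric of this shape explains the experiments' "all attribute weights = 1" setting: each attribute's loss simply contributes equally.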

How the Data Owner Defeats the Attacks: Algorithm Picture. (Figure: Table → Group 1 ... Group N → Split → SA-Generalization → New Table, which replaces the current table if it has more utility.)

How the Data Owner Defeats the Attacks: Algorithm.  Start with all QI values at their roots, with SA values generalized so that every P_breach(t) < threshold.  Split QI attributes top-down in order to increase utility.  A "single split" means that in each step only one attribute is split into its direct children.  SA-generalization guarantees that in every QI group every P_breach(t) < threshold, so the whole table preserves privacy.  In every iteration, the algorithm looks for a split that, after SA-generalization, increases utility; otherwise it quits.
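The top-down loop above can be sketched as follows. All four parameters are hypothetical interfaces, not the paper's code: `candidates` yields single-attribute splits, `enforce_privacy` stands in for SA-generalization (keeping every P_breach below the threshold), and `utility` stands in for the information-loss measure.

```python
def top_down_anonymize(state, candidates, utility, enforce_privacy):
    """Greedy top-down refinement: take any split that, after privacy
    enforcement, strictly improves utility; stop when none does."""
    improved = True
    while improved:
        improved = False
        for split in candidates(state):
            refined = enforce_privacy(split)
            if utility(refined) > utility(state):
                state, improved = refined, True
                break  # accept the improving split, then re-scan
    return state

# Toy demo: a state is the number of splits applied; utility improves
# up to 3 splits, after which no further split helps.
demo = top_down_anonymize(
    0,
    candidates=lambda s: [s + 1],
    utility=lambda s: min(s, 3),
    enforce_privacy=lambda s: s,
)
print(demo)  # 3
```

The key design point the sketch preserves is that privacy is enforced before utility is compared, so the algorithm never accepts a split whose privacy repair erases its utility gain.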

How the Data Owner Defeats the Attacks: Algorithm.  Generalize SA values bottom-up to improve privacy.  If the tuples in S_prob satisfy the privacy requirement, then all tuples in G satisfy the privacy requirement.  The SA values of all tuples that violate the privacy requirement are generalized to the parent of the guarding node that is closest to the root.  Finish when no tuple in S_prob violates the privacy requirement, or when no further generalization is possible.
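The bottom-up step can be sketched like this (hypothetical interfaces throughout: `parent` is a child-to-parent taxonomy map, and `breach` is a stand-in for the paper's P_breach computation): while any sensitive value is still breachable, lift every unsafe value to its parent, stopping when all tuples are safe or the unsafe values have reached the root.

```python
def generalize_until_safe(values, parent, breach, threshold):
    """Bottom-up SA-generalization sketch: repeatedly replace every
    unsafe sensitive value by its taxonomy parent until no tuple's
    breach probability reaches the threshold (or the root is hit)."""
    values = list(values)
    while True:
        unsafe = {values[i] for i in range(len(values))
                  if breach(values, i) >= threshold and values[i] in parent}
        if not unsafe:
            return values
        values = [parent[v] if v in unsafe else v for v in values]

# Toy taxonomy and breach model: a value counts as breached when too
# large a fraction of the group shares it (stand-in for P_breach).
parent = {"gastric-ulcer": "stomach-disease", "stomach-disease": "any-illness"}
frequency = lambda vals, i: vals.count(vals[i]) / len(vals)
print(generalize_until_safe(
    ["gastric-ulcer"] * 3 + ["flu"], parent, frequency, threshold=0.6))
# ['any-illness', 'any-illness', 'any-illness', 'flu']
```

Note how all occurrences of an unsafe value are lifted together in one pass, which keeps the group's sensitive values consistent between breach checks.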

Experiments Adult dataset (5 QI attributes, 1 SA attribute). Four settings: Pri-leaf, Pri-mixed, Nonpri-leaf, Nonpri-mixed. Breaching threshold = 0.25; all attribute weights = 1.

Experiments Breaching Probability

References
 Xiaokui Xiao and Yufei Tao. Personalized Privacy Preservation. In SIGMOD, 2006.
 A. Machanavajjhala, J. Gehrke, and D. Kifer. ℓ-Diversity: Privacy beyond k-Anonymity. In ICDE, 2006.
 K. LeFevre, D. J. DeWitt, and R. Ramakrishnan. Incognito: Efficient Full-Domain k-Anonymity. In SIGMOD, 2005.
 A. Evfimievski, J. Gehrke, and R. Srikant. Limiting Privacy Breaches in Privacy Preserving Data Mining. In ACM Symposium on Principles of Database Systems (PODS), 2003.