Anonymizing Data with Quasi-Sensitive Attribute Values
Pu Shi 1, Li Xiong 1, Benjamin C. M. Fung 2
1 Department of Mathematics and Computer Science, Emory University, Atlanta, GA, USA
2 CIISE, Concordia University, Montreal, QC, Canada

Problem Statement

We study the problem of anonymizing microdata with quasi-sensitive (QS) attributes, which are not sensitive by themselves but can be linked to external knowledge to reveal indirect sensitive information about an individual.

Figure 1. Anonymizing data with QS attributes: (a) original microdata with quasi-sensitive attribute Symptoms; (b) external knowledge that maps symptoms to diseases; (c) a generalized table that cannot prevent indirect disclosure of Disease through Symptoms.

Figure 2. Disclosure risks with QS attributes.

Contributions

- Formal notions of QS l-diversity and QS t-closeness that extend l-diversity and t-closeness to prevent indirect attribute disclosure caused by QS attribute values.
- A two-phase algorithm that combines generalization and value suppression to achieve QS l-diversity and QS t-closeness.

Definitions

The external knowledge table E consists of rows (L_i, S_i), i = 1, 2, ..., |E|, where L_i is a sensitive label and S_i is its corresponding set of QS values. The set of all sensitive labels that can be linked to the d tuples tp_1, ..., tp_d of a QI group G with quasi-identifier (QI) vector q is ∪_{i=1}^{d} K(tp_i), the sensitive label set of G, where K(tp_i) denotes the labels that E links to the QS values of tp_i. The attacker's prior belief α(q, L) and posterior belief β(q, L) are the probabilities that a target tp with QI vector q is linked to a label L before and after the data release, respectively.

Definition (QS (c,l)-diversity). A group G satisfies QS (c,l)-diversity if and only if p_1 ≤ c (p_l + p_{l+1} + ... + p_m), where p_1, p_2, ..., p_m are the values of β(q, L_i) in decreasing order and m = |∪_{i=1}^{d} K(tp_i)|. A table D* satisfies QS (c,l)-diversity if every group satisfies QS (c,l)-diversity. (A minimal sketch of this check appears after the Preliminary Results section.)

Definition (QS t-closeness). A group G satisfies QS t-closeness if and only if the distance between α(q, L) and β(q, L) is no more than a threshold t. A table D* satisfies QS t-closeness if every group satisfies QS t-closeness.

Algorithm

Phase 1 (QI generalization). Given D, an intermediate dataset Dg that satisfies k-anonymity is obtained.
Phase 2 (QS suppression). Given Dg, a suppression algorithm removes QS values (items) until every QI group satisfies QS (c,l)-diversity or QS t-closeness.

The suppression search uses:
- Greedy search heuristics with dynamic reordering of tail sets, which contain the candidate values to be removed in the next step, to enable a quick first result.
- Dynamic updates whenever a solution with lower cost is found, to enable continuous improvement of the result within a bounded time period.

(A simplified sketch of the greedy suppression loop also follows the Preliminary Results section.)

Figure 3. QS suppression search tree and algorithm features.

Preliminary Results

With the Mondrian generalization and our suppression algorithm implemented in C++, we conducted experiments with: 1) a dataset of 3000 tuples augmented from the Adult dataset, with 8 QI attributes and 9 synthesized QS terms per tuple, and 2) an external table of 3000 knowledge entries, each linking a label to random QS terms following a Poisson distribution.

Figure 4. QS suppression for QS (c,l)-diversity, showing that adaptive QS suppression significantly outperforms the baseline DFS search.

Figure 5. Two-phase algorithm for QS t-closeness, showing the trade-off between better privacy and smaller removal cost, and the benefit of the two-phase algorithm compared to a generalization-only approach.
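
As a concrete illustration of the QS (c,l)-diversity test from the Definitions section, the following is a minimal C++ sketch. It assumes the posterior beliefs β(q, L_i) for one QI group have already been computed from the released table and the external knowledge table E; the function name and the fail-closed handling of groups with fewer than l linkable labels are illustrative choices, not the authors' implementation.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Minimal sketch of the QS (c,l)-diversity test for one QI group.
// 'posteriors' holds the attacker's posterior beliefs beta(q, L_i),
// one per label in the group's sensitive label set.
// The definition (1-based indices) requires p_1 <= c * (p_l + ... + p_m).
bool satisfiesQSCLDiversity(std::vector<double> posteriors, double c, std::size_t l) {
    if (posteriors.empty()) return true;               // no linkable label, nothing to disclose
    std::sort(posteriors.begin(), posteriors.end(), std::greater<double>());
    if (l == 0 || l > posteriors.size()) return false; // assumption: fewer than l labels fails the test
    double tail = 0.0;
    for (std::size_t i = l - 1; i < posteriors.size(); ++i)
        tail += posteriors[i];                          // p_l + p_{l+1} + ... + p_m
    return posteriors.front() <= c * tail;              // p_1 <= c * tail
}
```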
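The Phase 2 suppression step can be pictured with the simplified sketch below. It is a single greedy pass over one QI group, assuming a caller-supplied predicate that re-evaluates QS (c,l)-diversity or QS t-closeness after each removal; the poster's actual algorithm explores a search tree with dynamically reordered tail sets and keeps updating the best (lowest-cost) solution found, which this sketch does not reproduce.

```cpp
#include <algorithm>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory representation: the QS values of one tuple.
struct Tuple { std::vector<std::string> qsValues; };
using Group = std::vector<Tuple>;   // one QI group of the generalized table Dg

// Simplified sketch of Phase 2 (QS suppression) on a single QI group.
// 'satisfies' encapsulates the QS (c,l)-diversity or QS t-closeness test; it
// would recompute the posteriors beta(q, L) from the group and the external
// knowledge table E after each removal. This is a one-pass greedy loop, not
// the tail-set search of the actual algorithm.
int suppressGroup(Group& g, const std::function<bool(const Group&)>& satisfies) {
    int removalCost = 0;
    while (!satisfies(g)) {
        // Count the occurrences of each remaining QS value in the group.
        std::map<std::string, int> freq;
        for (const auto& t : g)
            for (const auto& v : t.qsValues) ++freq[v];
        if (freq.empty()) break;    // nothing left to suppress
        // Greedy choice: suppress the most frequent QS value next.
        auto it = std::max_element(freq.begin(), freq.end(),
            [](const std::pair<const std::string, int>& a,
               const std::pair<const std::string, int>& b) { return a.second < b.second; });
        const std::string victim = it->first;
        removalCost += it->second;
        for (auto& t : g) {
            auto& vs = t.qsValues;
            vs.erase(std::remove(vs.begin(), vs.end(), victim), vs.end());
        }
    }
    return removalCost;             // total number of suppressed QS items
}
```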
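For the experimental setup, the synthetic external knowledge table could be generated along the lines of the sketch below, which links each sensitive label to a Poisson-distributed number of QS terms drawn uniformly from a fixed vocabulary. The mean number of links, the vocabulary size, and the naming scheme are assumptions for illustration only; the poster states just that the 3000 labels are linked to random QS terms with a Poisson distribution.

```cpp
#include <algorithm>
#include <random>
#include <string>
#include <utility>
#include <vector>

// Illustrative generator for a synthetic external knowledge table E:
// each sensitive label is linked to a Poisson-distributed number of QS
// terms drawn uniformly from a vocabulary of 'vocabSize' synthetic terms.
// meanLinks and vocabSize are assumptions, not values reported on the poster.
std::vector<std::pair<std::string, std::vector<std::string>>>
makeExternalTable(std::size_t numLabels, std::size_t vocabSize, double meanLinks, unsigned seed = 42) {
    std::mt19937 gen(seed);
    std::poisson_distribution<int> linkCount(meanLinks);
    std::uniform_int_distribution<std::size_t> pickTerm(0, vocabSize - 1);

    std::vector<std::pair<std::string, std::vector<std::string>>> table;
    table.reserve(numLabels);
    for (std::size_t i = 0; i < numLabels; ++i) {
        std::vector<std::string> terms;
        int k = std::max(1, linkCount(gen));   // ensure at least one linked QS term
        for (int j = 0; j < k; ++j)
            terms.push_back("qs_term_" + std::to_string(pickTerm(gen)));
        table.emplace_back("label_" + std::to_string(i), std::move(terms));
    }
    return table;
}

// Usage matching the experiment scale (vocabulary size and mean are assumed):
// auto E = makeExternalTable(3000, /*vocabSize=*/500, /*meanLinks=*/3.0);
```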