Presentation transcript:

Resisting Structural Re-identification in Anonymized Social Networks Michael Hay, Gerome Miklau, David Jensen, Don Towsley, Philipp Weis University of Massachusetts Amherst Session: Privacy & Authentication, VLDB 2008 Presented by Yongjin Kwon

Outline
- Introduction
- Adversary Knowledge Models
  – Vertex Refinement Queries
  – Subgraph Queries
  – Hub Fingerprint Queries
- Disclosure in Real Networks
- Anonymity in Random Graphs
- Graph Generalization for Anonymization
- Conclusion

Introduction
- Large amounts of data are collected and stored in many places:
  – Supermarket transactions
  – Web server logs
  – Sensor data
  – Interactions in social networks (e.g., Twitter)
- Data owners publish such sensitive data to facilitate research.
  – Goal: reveal as much useful information as possible while preserving the privacy of the individuals in the data.
  – In personal data, analysts may find valuable information.

Introduction (Cont’d)
- “A Face Is Exposed for AOL Searcher No. 4417749” [New York Times, August 9, 2006]
  – AOL collected 20 million Web search queries and published them.
  – Although the company naïvely anonymized the data, the identity of AOL user No. 4417749 was revealed: “Thelma Arnold, a 62-year-old widow who lives in Lilburn, Ga., frequently researches her friends’ medical ailments and loves her three dogs.”
  – A serious privacy risk!

Introduction (Cont’d)
- Potential privacy risks in network data
  – “Risk network structure in the early epidemic phase of HIV transmission in Colorado Springs” [Sexually Trans. Infections, 2002]: a social network representing a set of individuals related by sexual contacts and shared drug injections was published in order to analyze how HIV spreads.
  – Enron email dataset: the collection was released for the investigation, and because of privacy issues it remains essentially the only “real” collection of its kind.

Introduction (Cont’d)
- Attacks on naïvely anonymized network data [“Wherefore Art Thou R3579X? Anonymized Social Networks, Hidden Patterns, and Structural Steganography”, WWW 2007]
  – Active attack: an adversary chooses a set of targets, creates a small number of fake nodes with edges to those targets, and constructs a highly identifiable pattern of links among the new nodes. After the network is released, the adversary finds the pattern and the fake nodes, and thereby reveals sensitive information about the targets (a sketch of the idea follows).
  – Passive attack: most vertices in network data belong to a small, uniquely identifiable subgraph. An adversary may collude with friends to identify additional nodes connected to the coalition’s distinct subgraph.
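To make the active attack concrete, here is a minimal Python sketch of the general idea, not the exact construction from the WWW 2007 paper; the function name, node labels, and parameters are invented for illustration. It plants a small set of fake nodes whose random interconnection pattern is unlikely to occur elsewhere in the graph, and wires each target to a distinct subset of them:

```python
import random

def plant_sybils(adj, targets, num_fake=7, seed=0):
    """Illustrative active attack: insert fake nodes with a random
    (hence likely unique) internal link pattern, and attach each
    target to a distinct subset of them.
    adj: dict mapping node -> set of neighbors (modified in place)."""
    rng = random.Random(seed)
    fakes = [("sybil", i) for i in range(num_fake)]
    for f in fakes:
        adj[f] = set()
    # Flip a coin for every pair of fake nodes; with 2^(k(k-1)/2)
    # possible patterns, the planted subgraph is unlikely to have a
    # second match, so it can be located after the release.
    for i in range(num_fake):
        for j in range(i + 1, num_fake):
            if rng.random() < 0.5:
                adj[fakes[i]].add(fakes[j])
                adj[fakes[j]].add(fakes[i])
    # Give each target a distinct subset of fake neighbors, so the
    # targets are distinguishable once the planted pattern is found.
    for idx, t in enumerate(targets, start=1):
        for i in range(num_fake):
            if (idx >> i) & 1:
                adj[t].add(fakes[i])
                adj[fakes[i]].add(t)
    return fakes
```

Once the anonymized graph is published, the attacker searches for the planted pattern and reads each target off its distinct set of fake-node neighbors.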

Introduction (Cont’d)
- An adversary with some (structural) background knowledge may compromise the privacy of victims.
  – Naïve anonymization is NOT sufficient!
  – A new way of resisting malicious attempts to re-identify individuals in published network data is needed.
- Need to think about:
  – Types of adversary knowledge
  – A theoretical treatment of privacy risks
  – A way of preserving privacy while maintaining high data utility

Adversary Knowledge Models
- The adversary’s background knowledge is modeled as “correct” answers to a restricted knowledge query.
  – The adversary uses the query answers to refine the feasible candidate set.
- Three knowledge models:
  – Vertex Refinement Queries
  – Subgraph Queries
  – Hub Fingerprint Queries

Vertex Refinement Queries
- These queries report the local structure of the graph around the “target” node.
  – H_1(x) returns the degree of x, H_2(x) the multiset of the degrees of x’s neighbors, and in general H_i(x) the multiset of H_{i-1} values of x’s neighbors.
  [Figure: example node B, annotated with the degree of B and the degrees of B’s neighbors]

Vertex Refinement Queries (Cont’d)
- Relative equivalence: two nodes are equivalent relative to H_i if they have the same H_i value; the equivalence classes are the adversary’s candidate sets.
- If the adversary knows the answer to H_i for a target node, and that node’s candidate set is small, the target can be quickly re-identified in the anonymized graph!
  [Figure: example graph with nodes A–H]
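The vertex refinement queries translate directly into code. Below is a small plain-Python sketch over an adjacency-dict representation (the function names are ours): signatures are computed by iterated refinement, and nodes are grouped into candidate sets.

```python
from collections import defaultdict

def refine(adj, iterations):
    """Vertex refinement signatures: H_0 is empty, and each round
    replaces a node's signature with the sorted multiset of its
    neighbors' previous signatures, so one round distinguishes nodes
    exactly by degree, two rounds by neighbor degrees, and so on.
    adj: dict mapping node -> set of neighbors."""
    sig = {x: () for x in adj}  # H_0
    for _ in range(iterations):
        sig = {x: tuple(sorted(sig[z] for z in adj[x])) for x in adj}
    return sig

def candidate_sets(adj, iterations):
    """Group nodes into equivalence classes under H_i: each class is
    the candidate set an adversary with H_i knowledge must search."""
    groups = defaultdict(list)
    for x, s in refine(adj, iterations).items():
        groups[s].append(x)
    return list(groups.values())
```

A node whose candidate set has size 1 is exactly re-identified by an adversary with H_i knowledge.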

Subgraph Queries
- Two drawbacks of vertex refinement queries:
  – They always return “correct” (complete) information.
  – Their power depends on the degree of the target node.
- Subgraph queries instead assert the existence of a particular subgraph around the “target” node.
  – Assume the adversary knows some number of edge facts around the target node.
  [Figure: three subgraphs around node B, containing 3, 4, and 5 edge facts]
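A faithful implementation of subgraph queries requires subgraph isomorphism testing, which is expensive. As a simplified stand-in, the sketch below counts the edge facts visible in a node's immediate neighborhood and uses that count as a necessary condition for candidacy; the one-hop restriction and the function names are our simplifications, not the paper's method.

```python
def edge_facts_near(adj, x):
    """Edge facts visible in the target's immediate neighborhood:
    edges incident to x, plus edges among x's neighbors."""
    nbrs = adj[x]
    among = sum(1 for u in nbrs for v in adj[u] if v in nbrs) // 2
    return len(nbrs) + among

def subgraph_candidates(adj, num_facts):
    """Nodes that could satisfy a subgraph query asserting num_facts
    edge facts; a necessary (not sufficient) condition for candidacy."""
    return [x for x in adj if edge_facts_near(adj, x) >= num_facts]
```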

Hub Fingerprint Queries
- A hub is a node with high degree and high betweenness centrality.
  – Hubs are easily re-identified by an adversary.
- A hub fingerprint for a node is a vector of its distances to the observable hubs.
  – Closed world: a hub not reachable within distance i is known to be unconnected.
  – Open world: the adversary’s knowledge is incomplete, so a hub not listed may still be connected.
  [Figure: example graph with nodes A–H and a designated hub]
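Hub fingerprints reduce to bounded breadth-first search. A minimal sketch (our naming; the 0-means-unconnected encoding follows the closed-world reading on the slide):

```python
from collections import deque

def bfs_distances(adj, source, limit):
    """Breadth-first distances from source, truncated at `limit` hops."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == limit:
            continue
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def hub_fingerprint(adj, x, hubs, limit=1):
    """Vector of distances from x to each hub; 0 marks a hub that is
    not reachable within the limit (the closed-world reading)."""
    dist = bfs_distances(adj, x, limit)
    return tuple(dist.get(h, 0) for h in hubs)
```

Under the open-world interpretation, a 0 entry would instead be treated as "unknown" and match any distance when comparing fingerprints.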

Disclosure in Real Networks
- Experiments on the impact of external information
  – Three network datasets:
    Hep-Th: co-authorship graph taken from the arXiv archive
    Enron: “real” email dataset collected by the CALO project
    Net-trace: IP-level network trace collected at a major university
  – Consider each node in turn as a target and compute its candidate set (the smaller the candidate set, the more vulnerable the node).
  – Characterize how many nodes are protected and how many are re-identifiable (see the bucketing sketch below).
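This protocol — compute every node's candidate set and bucket the sizes — can be scripted directly on top of the `candidate_sets` helper above. The bucket boundaries here are illustrative choices meant to mirror the paper's figures, not quoted from them:

```python
def risk_profile(adj, iterations):
    """Fraction of nodes whose H_i candidate set falls into each size
    bucket (every node in a class shares that class's bucket)."""
    counts = {"1": 0, "2-4": 0, "5-10": 0, "11-20": 0, ">20": 0}
    n = len(adj)
    for group in candidate_sets(adj, iterations):
        size = len(group)
        key = ("1" if size == 1 else
               "2-4" if size <= 4 else
               "5-10" if size <= 10 else
               "11-20" if size <= 20 else ">20")
        counts[key] += size
    return {k: v / n for k, v in counts.items()}
```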

Disclosure in Real Networks (Cont’d)
- Vertex refinement queries
  [Results figure: candidate set sizes under vertex refinement queries]

Disclosure in Real Networks (Cont’d)
- Subgraph queries: two strategies to build subgraphs
  – Sampled subgraph
  – Degree subgraph
  [Results figure]

Disclosure in Real Networks (Cont’d)
- Hub fingerprint queries
  – Hubs: the five highest-degree nodes (Enron); the ten highest-degree nodes (Hep-Th, Net-trace)
  [Results figure]

Anonymity in Random Graphs
- Theoretical treatment of privacy risk using random graphs
  – Erdős–Rényi (ER) model: n nodes, each edge present independently with probability p
- Asymptotic analysis of robustness against knowledge attacks:
  – Sparse ER graphs (p = c/n): robust against H_i for any i
  – Dense ER graphs (p = c log n / n): robust against H_1, but vulnerable against H_2
  – Super-dense ER graphs: vulnerable even against H_1
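These asymptotic claims can be checked empirically on sampled ER graphs. A small pure-Python experiment (the values of n, c, and the regimes tested are arbitrary choices for illustration; it reuses `candidate_sets` from the vertex-refinement sketch above):

```python
import math
import random

def gnp(n, p, seed=0):
    """Sample an Erdos-Renyi graph G(n, p) as an adjacency dict."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj

# Fraction of nodes uniquely identified by H_2 in two density regimes.
n = 1000
for label, p in [("sparse", 10 / n), ("dense", 5 * math.log(n) / n)]:
    adj = gnp(n, p)
    unique = sum(len(g) for g in candidate_sets(adj, 2) if len(g) == 1)
    print(label, unique / n)
```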

Anonymity in Random Graphs (Cont’d)
- Anonymity against subgraph queries
  – Depends on the number of nodes in the largest clique: if the subgraph asserted by a query fits inside a clique of m nodes, every member of that clique is a candidate, so the candidate set has size at least m.
  – The clique number is thus a useful lower bound on the resistance to disclosure.
- Random graphs with attributes

Graph Generalization for Anonymization
- Generalize the naïvely anonymized graph into supernodes and superedges.
  – This introduces much uncertainty, measured by the number of possible worlds.
  – Find the partitioning that maximizes the likelihood of the original graph while keeping every supernode of size at least k.
  – Apply simulated annealing to search for such a partitioning (a toy version is sketched below).
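The objective can be made concrete: maximizing the likelihood of the original graph is equivalent to minimizing the number of possible worlds consistent with the generalized graph. Below is a toy sketch with our own names and a drastically simplified annealing schedule, not the authors' implementation; `partition` is a list of sets of nodes and `edges` a list of node pairs.

```python
import math
import random

def log_binom(n, k):
    """log C(n, k) via lgamma."""
    if k < 0 or k > n:
        return float("-inf")
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

def log_num_worlds(partition, edges):
    """log of the number of graphs consistent with the generalized
    graph: each supernode pair (X, Y) publishes only its edge count
    d(X, Y), contributing C(#slots, d) possible worlds."""
    node_to_sn = {v: i for i, grp in enumerate(partition) for v in grp}
    d = {}
    for u, v in edges:
        key = tuple(sorted((node_to_sn[u], node_to_sn[v])))
        d[key] = d.get(key, 0) + 1
    total = 0.0
    for i, X in enumerate(partition):
        for j in range(i, len(partition)):
            Y = partition[j]
            slots = (len(X) * (len(X) - 1) // 2) if i == j else len(X) * len(Y)
            total += log_binom(slots, d.get((i, j), 0))
    return total

def anneal(partition, edges, k, steps=10000, temp=1.0, cooling=0.999, seed=0):
    """Toy simulated annealing: move random nodes between supernodes,
    keeping every supernode of size >= k, and prefer partitions with
    fewer possible worlds (higher likelihood of the original graph)."""
    rng = random.Random(seed)
    cost = log_num_worlds(partition, edges)
    for _ in range(steps):
        src, dst = rng.randrange(len(partition)), rng.randrange(len(partition))
        if src == dst or len(partition[src]) <= k:
            continue
        node = rng.choice(list(partition[src]))
        partition[src].remove(node)
        partition[dst].add(node)
        new_cost = log_num_worlds(partition, edges)
        if new_cost < cost or rng.random() < math.exp((cost - new_cost) / temp):
            cost = new_cost
        else:  # revert the move
            partition[dst].remove(node)
            partition[src].add(node)
        temp *= cooling
    return partition, cost
```

The published output would then consist of the supernode sizes and the pairwise edge counts d(X, Y); that is all the analysis on the next slide gets to see.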

Graph Generalization for Anonymization (Cont’d)
- How is the generalized graph analyzed?
  – Construct a synthetic graph consistent with the published (tagged) information.
  – Perform standard graph analyses on this synthetic graph (a sampling sketch follows).
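Constructing a synthetic graph amounts to sampling one of the possible worlds uniformly. A sketch, reusing `partition` and the supernode-pair edge counts `d` in the form computed by the previous sketch:

```python
import random

def sample_world(partition, d, seed=None):
    """Draw one synthetic graph consistent with the generalized graph:
    for each supernode pair (i, j), place the published number of
    edges d[(i, j)] uniformly at random among the eligible node pairs."""
    rng = random.Random(seed)
    edges = []
    for (i, j), count in d.items():
        if i == j:
            X = list(partition[i])
            slots = [(X[a], X[b])
                     for a in range(len(X)) for b in range(a + 1, len(X))]
        else:
            slots = [(u, v) for u in partition[i] for v in partition[j]]
        edges.extend(rng.sample(slots, count))
    return edges
```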

Graph Generalization for Anonymization (Cont’d)
- How does graph generalization affect network properties?
  – Examine five properties on the three real-world networks: degree, path length, transitivity (clustering coefficient), network resilience, and infectiousness.
  – Perform the experiments on 200 sampled synthetic graphs and average, repeating for each setting (see the estimation sketch below).
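Estimating a property of the original graph then means averaging it over many sampled worlds. A sketch for one of the five properties, the clustering coefficient (the average-local-clustering variant is our choice; helper names are ours):

```python
import statistics

def average_clustering(edges):
    """Average local clustering coefficient of an edge list (one
    common variant of the slide's transitivity measure)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    coeffs = []
    for x, nbrs in adj.items():
        deg = len(nbrs)
        if deg < 2:
            continue
        links = sum(1 for u in nbrs for v in adj[u] if v in nbrs) // 2
        coeffs.append(2 * links / (deg * (deg - 1)))
    return statistics.mean(coeffs) if coeffs else 0.0

# Average the property over 200 sampled worlds, as on the slide:
# estimates = [average_clustering(sample_world(partition, d, seed=s))
#              for s in range(200)]
# print(statistics.mean(estimates))
```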

Graph Generalization for Anonymization (Cont’d)
[Results figure: the five network properties measured on generalized graphs]

Conclusion
- Three contributions:
  – Formalized models of adversary knowledge.
  – Provided a starting point for the theoretical study of privacy risks in network data.
  – Introduced a new anonymization technique that generalizes the original graph.