Privacy Issues in Graph Data Publishing. Summer intern: Qing Zhang (NC State University). Mentors: Graham Cormode and Divesh Srivastava.

Outline: Privacy in graph data publishing; Apply existing microdata anonymization techniques; A simple graph anonymization technique; Understanding attacks; Plan of future work.

Microdata publishing. Data publishing covers macrodata, i.e. pre-aggregated statistics (N.R. Adam et al., ACM Computing Surveys, 1989), and microdata, i.e. individual records. Concerns in microdata release: privacy of individual tuples; privacy of atomic values (e.g. SSN); association between a tuple's attributes; accuracy of aggregate query answering.

Graph data. Relationships among entities. No sensitive attributes: the private information is the association. Many graphs of interest are sparse. Examples: general graphs (social networks, etc.: who talks to whom); bipartite graphs, the focus of our work (customer shopping records, etc.: who bought what).

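Example graph data

Author table:
Author ID   Name
A1          Andy
A2          Bob
A3          Cathy

Paper table:
Paper ID   Title   Conference   Year
P1         A       SIGMOD       2006
P2         B       SIGMOD       2007
P3         C       VLDB         2007
P4         D       ICDE         2007

(author, paper) association table:
Author ID   Paper ID
A1          P1
A1          P2
A2          P2
A2          P3
A2          P4
A3          P1
A3          P4

[Figure: the same data drawn as a bipartite graph, with author nodes A1, A2, A3 on one side, paper nodes P1, P2, P3, P4 on the other, and one edge per (author, paper) association.]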

Privacy-preserving microdata sharing. Current status: the focus is on protecting the association between quasi-identifiers and a single sensitive attribute (disease, salary, etc.). Related work: k-anonymity (Sweeney, International Journal on Uncertainty, Fuzziness and Knowledge-based Systems '02); l-diversity (A. Machanavajjhala et al., ICDE '06); t-closeness (N. Li et al., ICDE '07); (k,e)-anonymity (Q. Zhang, N. Koudas, D. Srivastava, T. Yu, ICDE '07).

Related work. Attacking anonymized social networks (L. Backstrom et al., WWW '07): active attack (insert nodes/links) and passive attack (collude and observe the graph). Privacy risks of public mentions (D. Frankowski et al., SIGIR '07): link the movie score and movie review databases. How to break anonymity of the Netflix Prize dataset (A. Narayanan et al., UT Austin): attack through background information. How to assemble pieces of a graph privately (K. Frikken et al., WPES '06): distributed graph construction via multi-party computation.

Focus of our work: privacy protection in bipartite graphs. Protect individual link information across the two parties, e.g. the (author, paper) association. Maintain aggregate graph statistics, e.g. average number of coauthors, diameter of the graph, shortest path distribution, etc. Not considered by previous work.

Dataset: DBLP (conference data only). 402023 distinct authors, 541243 distinct papers, 1401349 author-paper pairs. Largest number of papers by one author: 290. Largest number of authors on one paper: 115. Graph statistics we are looking at: 1st-order statistics (node degree), i.e. number of papers of each author and number of authors of each paper; 2nd-order statistics, i.e. coauthors of each author and copapers of each paper; higher-order statistics, i.e. walking more steps along the bipartite graph.
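As a rough illustration of the 1st- and 2nd-order statistics listed above, the sketch below computes them on the three-author example graph from the earlier slide rather than on DBLP. The function name `graph_stats` and the dictionary layout are assumptions made for this illustration, not part of the original work.

```python
from collections import defaultdict

# (author, paper) pairs from the example graph slide, not the DBLP data.
PAIRS = [("A1", "P1"), ("A1", "P2"), ("A2", "P2"), ("A2", "P3"),
         ("A2", "P4"), ("A3", "P1"), ("A3", "P4")]

def graph_stats(pairs):
    """Return (number of papers per author, coauthor set per author)."""
    papers_of = defaultdict(set)
    authors_of = defaultdict(set)
    for author, paper in pairs:
        papers_of[author].add(paper)
        authors_of[paper].add(author)
    # 1st-order statistic: node degree on the author side.
    degree = {a: len(ps) for a, ps in papers_of.items()}
    # 2nd-order statistic: authors reachable in two steps (author -> paper -> author).
    coauthors = {}
    for a, ps in papers_of.items():
        others = set()
        for p in ps:
            others |= authors_of[p]
        coauthors[a] = others - {a}
    return degree, coauthors

degree, coauthors = graph_stats(PAIRS)
print(degree)
print({a: sorted(c) for a, c in coauthors.items()})
```

The copaper statistics follow by swapping the roles of the two columns.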

Outline: Privacy in graph data sharing; Apply existing microdata anonymization techniques; A simple graph anonymization technique; Understanding attacks; Plan of future work.

Anonymization by permutation. Publish the (author, paper) relation with paper permuted w.r.t. author, using either a global permutation or various partition mechanisms. Study the graph statistics. 2nd-order statistics studied: coauthor times (avg number of papers coauthored by each coauthor pair); coauthor distribution (avg number of coauthors of each author); copaper times (avg number of authors shared by each copaper pair); copaper distribution (avg number of copapers of each paper).
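A minimal sketch of the global permutation, assuming it simply shuffles the paper column against the author column of the (author, paper) relation. It runs on the small example graph from the earlier slide; `global_permutation`, `coauthor_times` and the `seed` parameter are hypothetical names introduced here.

```python
import random
from collections import Counter
from itertools import combinations

# (author, paper) pairs from the example graph slide.
PAIRS = [("A1", "P1"), ("A1", "P2"), ("A2", "P2"), ("A2", "P3"),
         ("A2", "P4"), ("A3", "P1"), ("A3", "P4")]

def global_permutation(pairs, seed=0):
    """Shuffle the paper column against the (unchanged) author column."""
    rng = random.Random(seed)
    authors = [a for a, _ in pairs]
    papers = [p for _, p in pairs]
    rng.shuffle(papers)
    return list(zip(authors, papers))

def coauthor_times(pairs):
    """Map each coauthor pair to the number of papers the pair shares."""
    authors_of = {}
    for author, paper in pairs:
        authors_of.setdefault(paper, set()).add(author)
    times = Counter()
    for authors in authors_of.values():
        for pair in combinations(sorted(authors), 2):
            times[pair] += 1
    return times

permuted = global_permutation(PAIRS)
# Row counts per author and per paper are unchanged by the shuffle.
print(Counter(a for a, _ in permuted) == Counter(a for a, _ in PAIRS))
print(Counter(p for _, p in permuted) == Counter(p for _, p in PAIRS))
# The coauthor structure is not protected in the same way.
print(dict(coauthor_times(PAIRS)))
print(dict(coauthor_times(permuted)))
```

Note that a plain shuffle can produce the same (author, paper) row twice; the degree counts here are row counts, so they are preserved either way.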

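coauthor times: source statistics vs. after global permutation. [Figure/table not fully preserved; the surviving values, read as a two-column table after global permutation, are:]

Number of papers coauthored   Number of coauthor pairs
1                             3411970
2                             1377
3                             8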

coauthor distribution

More bad news. Source distribution: the author with the most coauthors (363) has 247 publications (7th most); the author with the most publications (290) has only 44 coauthors; the correlation is weak. After global permutation: the largest number of coauthors is 779, and the corresponding author has 287 papers (2nd most); the author with the most papers (290) has 722 coauthors. A false correlation is created!

Other experiments. Results are the same for copaper statistics and for other partitioning mechanisms (on authorCt, year, conference, etc.).

Observations. Permutation of the (author, paper) relation guarantees preservation of 1st-order statistics, since degrees are just counts. It cannot maintain even 2nd-order graph statistics: it breaks the clustering properties by removing links within clusters and introducing fake links among clusters. Other anonymization techniques are needed to maintain graph statistics.

Publish tuple-level statistics. Publish two tables: AuthorDegree(authorID, degree) and coAuthor(authorID, coAuthorID). From these two tables, by joining them, we can get the 1st-order degree D1 of any author and the set of 2nd-order degrees {D2}. We may leak more information! D1 and the set {D2} can serve as signatures to identify entities.
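The join can be sketched as follows, under the assumption that {D2} means the multiset of degrees of an author's coauthors (the slide does not spell this out). The two tables are filled in from the small example graph on the earlier slide; `signatures` is a hypothetical helper name.

```python
from collections import defaultdict

# AuthorDegree(authorID, degree), derived from the example graph slide.
AUTHOR_DEGREE = {"A1": 2, "A2": 3, "A3": 2}
# coAuthor(authorID, coAuthorID), one row per ordered coauthor pair.
CO_AUTHOR = [("A1", "A2"), ("A1", "A3"), ("A2", "A1"),
             ("A2", "A3"), ("A3", "A1"), ("A3", "A2")]

def signatures(author_degree, co_author):
    """Join the two tables: author -> (D1, sorted degrees of coauthors)."""
    d2 = defaultdict(list)
    for author, other in co_author:
        d2[author].append(author_degree[other])
    # Assumption: an author with no coauthor rows gets an empty {D2}.
    return {a: (d1, tuple(sorted(d2[a]))) for a, d1 in author_degree.items()}

sigs = signatures(AUTHOR_DEGREE, CO_AUTHOR)
for author, sig in sigs.items():
    print(author, sig)
```

In this toy example A1 and A3 end up with the same signature while A2's is unique, which is the kind of leakage the slide warns about.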

Privacy risk: k-identifiable. An entity is k-identifiable if it shares the same signature with k-1 other entities; 1-identifiable means uniquely identifiable. Count the number of authors who have the same signature: the maximum is k=20015, coming from the authors who have D1=1 and D2=0 (single author, single paper pairs).
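Counting k for each entity amounts to grouping entities by signature. The sketch below assumes signatures are hashable values; the three sample signatures are made up from the small example graph, not taken from DBLP, and `k_identifiability` is a hypothetical name.

```python
from collections import Counter

# Illustrative signatures (D1, sorted {D2}) for the three example authors.
SIGNATURES = {"A1": (2, (2, 3)), "A2": (3, (2, 2)), "A3": (2, (2, 3))}

def k_identifiability(signatures):
    """Map each entity to k, the number of entities sharing its signature."""
    group_size = Counter(signatures.values())
    return {entity: group_size[sig] for entity, sig in signatures.items()}

k = k_identifiability(SIGNATURES)
uniquely_identifiable = sorted(e for e, size in k.items() if size == 1)
print(k)
print(uniquely_identifiable)
```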

Attack simulation-author

Observations. Too many entities can be uniquely identified: 37 by D1 only, 134426 by {D1, {D2}}. In order to protect your privacy, you'd better not publish any paper. Or just publish one paper, without collaborating with anyone else, because many others are doing the same.

Understanding attacks. Given an anonymization scheme that preserves statistics, explore the attacker's ability: what background information is available, what strategy to take, how much knowledge the attacker can gain, and what the cost of the attack is. Starting point: publishing complete statistics, i.e. publish the complete author set of each paper and the complete paper set of each author.

Publishing complete statistics. Example: author sets {{a1,a4}, {a1,a2}, {a2,a3}, {a3,a4}}; paper sets {{p1,p2}, {p2,p3}, {p3,p4}, {p1,p4}}. [Figure: bipartite graph on authors a1, a2, a3, a4 and papers p1, p2, p3, p4.]

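Graph theoretic analysis. Publishing complete statistics can be seen as publishing two isomorphic bipartite graphs, each with the labels removed on one side. This is a bipartite graph isomorphism problem. [Figure: bipartite graph on authors a1, a2, a3, a4 and papers p1, p2, p3, p4.]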

Solution to the problem. The hardness of bipartite isomorphism is unknown; effective solutions may exist for graphs with specific properties. The previous nth-order signature can serve as a greedy solution, where n is bounded by the diameter of the graph. More information leaks when background information is available: node information or edge information.

Attacker with background information: node information. Suppose node a3 is known in the previous example, i.e. {a3, p3} and {a3, p4} are known to the attacker. Then a1 can be uniquely identified: it is the only node at distance 4 from a3. a1 can further help label the isomorphic matching. [Figure: the example bipartite graph with a3 and a1 highlighted.]
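The distance claim can be checked with a breadth-first search on the example from the "Publishing complete statistics" slide. For brevity this sketch runs on the fully labelled graph; an attacker would run the same search on the published, partly unlabelled structure. `distances_from` is a hypothetical helper name.

```python
from collections import deque

# Author set of each paper, from the "Publishing complete statistics" example.
AUTHOR_SETS = {"p1": {"a1", "a4"}, "p2": {"a1", "a2"},
               "p3": {"a2", "a3"}, "p4": {"a3", "a4"}}

def distances_from(source, author_sets):
    """Breadth-first search distances in the bipartite author-paper graph."""
    adjacency = {}
    for paper, authors in author_sets.items():
        for author in authors:
            adjacency.setdefault(paper, set()).add(author)
            adjacency.setdefault(author, set()).add(paper)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist

dist = distances_from("a3", AUTHOR_SETS)
at_distance_4 = sorted(n for n, d in dist.items() if d == 4)
print(at_distance_4)
```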

Attacker with background information: edge information. Suppose edge {a3, p3} is known. Then {a1, p1} can be recovered by enumerating all possible worlds (disjunctive reasoning). It's a finer-grained attack model. [Figure: the example bipartite graph with candidate positions a3', a3'' and a1', a1''.]
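One way to make the possible-worlds argument concrete, as a sketch under a modelling assumption of my own: a "possible world" is any assignment of the published (unlabelled) author sets to the papers p1..p4 whose induced paper sets match the published paper sets. The data is the example from the "Publishing complete statistics" slide; `possible_worlds` is a hypothetical name.

```python
from itertools import permutations

AUTHORS = ["a1", "a2", "a3", "a4"]
PAPERS = ["p1", "p2", "p3", "p4"]
# Published author sets (paper labels removed) and paper sets (author labels removed).
AUTHOR_SETS = [{"a1", "a4"}, {"a1", "a2"}, {"a2", "a3"}, {"a3", "a4"}]
PAPER_SETS = [{"p1", "p2"}, {"p2", "p3"}, {"p3", "p4"}, {"p1", "p4"}]

def possible_worlds(known_edges=()):
    """Enumerate edge sets consistent with both published tables and the known edges."""
    target = sorted(sorted(s) for s in PAPER_SETS)
    worlds = []
    for assignment in permutations(AUTHOR_SETS):
        # Candidate world: paper PAPERS[i] gets the author set assignment[i].
        world = {(a, p) for p, s in zip(PAPERS, assignment) for a in s}
        induced = sorted(sorted(p for a, p in world if a == author)
                         for author in AUTHORS)
        if induced == target and all(e in world for e in known_edges):
            worlds.append(world)
    return worlds

all_worlds = possible_worlds()
consistent = possible_worlds(known_edges=[("a3", "p3")])
# Edges present in every remaining world are certain to the attacker.
certain = set.intersection(*consistent)
print(len(all_worlds), len(consistent))
print(sorted(certain))
```

Under this model, knowing the single edge {a3, p3} narrows the worlds enough that {a1, p1} appears in all of them, which matches the claim on the slide.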

Plan of future work (1). Detailed study of the "set of all isomorphisms" problem: algorithm and hardness; how different background information helps. Publish other statistics: binary/triple/... sets of authors/papers, e.g. for {a1, a2, a3}, publish {a1, a2}, {a2, a3}, {a1, a3}. This maintains more statistics than permutation and more privacy than publishing complete author sets. How to evaluate it quantitatively?
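The binary/triple subsets mentioned above can be generated directly; the sketch below is only an illustration of that enumeration, with `k_subsets` a hypothetical helper name and `k` the assumed subset size.

```python
from itertools import combinations

def k_subsets(author_set, k=2):
    """All size-k subsets of one paper's author set (k=2 gives the binary sets)."""
    return [set(c) for c in combinations(sorted(author_set), k)]

pairs = k_subsets({"a1", "a2", "a3"}, k=2)
print(pairs)
```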

Plan of future work (2). Other possible signatures: shortest path to other nodes (compute pairwise shortest paths, then sort the vector as the signature). Other datasets: IMDB data.
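A possible reading of the shortest-path signature, sketched on the small example graph from the earlier slide: for each node, compute shortest-path distances to all other nodes and sort them. It assumes a connected, unweighted graph; `shortest_path_signatures` is a hypothetical name.

```python
from collections import deque

# (author, paper) pairs from the example graph slide.
PAIRS = [("A1", "P1"), ("A1", "P2"), ("A2", "P2"), ("A2", "P3"),
         ("A2", "P4"), ("A3", "P1"), ("A3", "P4")]

def shortest_path_signatures(pairs):
    """Map each node to the sorted vector of its distances to all other nodes."""
    adjacency = {}
    for author, paper in pairs:
        adjacency.setdefault(author, set()).add(paper)
        adjacency.setdefault(paper, set()).add(author)
    signatures = {}
    for source in adjacency:
        # Breadth-first search from this node.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in dist:
                    dist[neighbour] = dist[node] + 1
                    queue.append(neighbour)
        # Drop the node's own zero distance, then sort.
        signatures[source] = tuple(sorted(d for n, d in dist.items() if n != source))
    return signatures

sigs = shortest_path_signatures(PAIRS)
for node in ("A1", "A2", "A3"):
    print(node, sigs[node])
```

Running one search per node is fine for a toy graph; on a dataset of DBLP's size the all-pairs computation would be the main cost to consider.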

Thanks!
