UT DALLAS Erik Jonsson School of Engineering & Computer Science FEARLESS engineering Analyzing and Securing Social Media Security and Privacy in Online.

Presentation transcript:

UT DALLAS Erik Jonsson School of Engineering & Computer Science FEARLESS engineering Analyzing and Securing Social Media Security and Privacy in Online Social Networks Murat Kantarcioglu Bhavani Thuraisingham Thanks to Raymond Heatherly and Barbara Carminati for helping in slide preparations

Outline
• Introduction to Social Networks
• Properties of Social Networks
• Social Network Analysis Basics
• Data Privacy Basics
• Privacy and Social Networks
• Access control issues for Online Social Networks

Social Networks
• Social networks have important implications for our daily lives.
  – Spread of information
  – Spread of disease
  – Economics
  – Marketing
• Social network analysis could be used for many activities related to information and security informatics.
  – Terrorist network analysis

Enron Social Graph (figure)

Romantic Relations at “Jefferson High School” (figure)

Emergence of Online Social Networks
• Online social networks are becoming increasingly popular.
• Example: Facebook
  – Facebook has more than 200 million active users.
  – More than 100 million users log on to Facebook at least once each day.
  – More than two-thirds of Facebook users are outside of college.
  – The fastest-growing demographic is those 35 years old and older.

Properties of Social Networks
• “Small-world” phenomenon
  – Milgram asked participants to pass a letter to one of their close contacts in order to get it to an assigned individual.
  – Most of the letters were lost (~75% of the letters).
  – The letters that reached their destination passed through only about six people.
  – This is the origin of the “six degrees” idea.
  – The mean geodesic distance l of a graph grows logarithmically, or even more slowly, with the network size (d_ij is the shortest distance between nodes i and j).
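The mean geodesic distance mentioned above can be sketched in a few lines. This is a minimal illustration, assuming an unweighted, undirected graph stored as an adjacency dictionary and averaging only over connected pairs of distinct nodes; the function names and the toy graph are hypothetical and do not come from the slides.

```python
from collections import deque


def bfs_distances(adj, source):
    """Return shortest-path hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def mean_geodesic_distance(adj):
    """Average d_ij over all unordered pairs of distinct, connected nodes."""
    total, pairs = 0, 0
    for i in adj:
        for j, d in bfs_distances(adj, i).items():
            if i < j:  # count each unordered pair exactly once
                total += d
                pairs += 1
    return total / pairs


# Hypothetical toy graph: a path a - b - c - d.
toy = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(mean_geodesic_distance(toy))
```

On a small-world network this average stays small even as the number of nodes grows, which is the property the slide describes.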

“Small-World” Example: Six Degrees of Kevin Bacon

Properties of Social Networks
• Degree distribution
• Clustering
• Other important properties
  – Community structure
  – Assortativity
  – Clustering patterns
  – Homophily
  – …
• Many of these properties could be used for analyzing social networks.

Social Network Mining
• Social network data is represented as a graph.
  – Individuals are represented as nodes.
    Nodes may have attributes to represent personal traits.
  – Relationships are represented as edges.
    Edges may have attributes to represent relationship types.
    Edges may be directed.
• Common social network mining tasks
  – Node classification
  – Link prediction
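One possible in-memory form of the attributed graph described above is sketched here. The class name, method names, and sample attributes are assumptions made for illustration; the slides do not prescribe a data structure.

```python
from dataclasses import dataclass, field


@dataclass
class SocialGraph:
    # node id -> dictionary of personal traits
    nodes: dict = field(default_factory=dict)
    # (source, target) -> dictionary of relationship attributes (directed)
    edges: dict = field(default_factory=dict)

    def add_node(self, node_id, **attributes):
        self.nodes[node_id] = attributes

    def add_edge(self, source, target, **attributes):
        self.edges[(source, target)] = attributes

    def neighbors(self, node_id):
        """Targets of all directed edges that start at node_id."""
        return [t for (s, t) in self.edges if s == node_id]


# Hypothetical example data.
g = SocialGraph()
g.add_node("alice", hometown="dallas")
g.add_node("bob", hometown="plano")
g.add_edge("alice", "bob", relationship="friend")
print(g.neighbors("alice"))
```

Node classification would then predict a missing entry in a node's attribute dictionary, and link prediction would predict a missing key in `edges`.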

Data Privacy Basics
• How to share data without violating privacy?
• Meaning of privacy?
  – Identity disclosure
  – Sensitive attribute disclosure
• Current techniques for structured data
  – k-anonymity
  – l-diversity
  – Differential privacy
  – Secure multi-party computation
• Problem: publishing private data while, at the same time, protecting individual privacy
• Challenges:
  – How to quantify privacy protection?
  – How to maximize the usefulness of published data?
  – How to minimize the risk of disclosure?
  – …

Sanitization and Anonymization
• Automated de-identification of private data with certain privacy guarantees
  – As opposed to the “formal determination by statisticians” requirement of HIPAA
• Two major research directions
  1. Perturbation (e.g., random noise addition)
  2. Anonymization (e.g., k-anonymization)
• Removing unique identifiers is not sufficient.
• Quasi-identifier (QI)
  – Maximal set of attributes that could help identify individuals
  – Assumed to be publicly available (e.g., voter registration lists)
• As a process
  1. Remove all unique identifiers.
  2. Identify QI attributes and model the adversary’s background knowledge.
  3. Enforce some privacy definition (e.g., k-anonymity).

Re-identifying “anonymous” data (Sweeney ’01)
• 37 US states mandate collection of information.
• She purchased the voter registration list for Cambridge, Massachusetts.
  – 54,805 people
• 69% unique on postal code and birth date
• 87% US-wide with all three (postal code, birth date, and gender)
• Solution: k-anonymity
  – Any combination of values appears at least k times.
• Developed systems that guarantee k-anonymity
  – Minimize distortion of results

k-Anonymity
• Each released record should be indistinguishable from at least (k-1) others on its QI attributes.
• Alternatively: the cardinality of any query result on released data should be at least k.
• k-anonymity is the first of many privacy definitions in this line of work.
  – l-diversity, t-closeness, m-invariance, delta-presence, ...
• Complementary release attack
  – Different releases can be linked together to compromise k-anonymity.
  – Solution: consider all of the released tables before releasing a new one, and try to avoid linking.
  – Other data holders may release data that can be used in this kind of attack, so in general it is hard to prevent completely.
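The k-anonymity condition above can be checked mechanically. The sketch below assumes a released table stored as a list of dictionaries; the column names and the generalized values are hypothetical sample data, not taken from the slides.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every combination of QI values occurs in at least k records."""
    counts = Counter(tuple(r[a] for a in quasi_identifiers) for r in records)
    return all(c >= k for c in counts.values())


# Hypothetical released table with generalized zip codes and age ranges.
released = [
    {"zip": "750**", "age": "20-29", "condition": "flu"},
    {"zip": "750**", "age": "20-29", "condition": "asthma"},
    {"zip": "752**", "age": "30-39", "condition": "flu"},
    {"zip": "752**", "age": "30-39", "condition": "flu"},
]

print(is_k_anonymous(released, ["zip", "age"], 2))
print(is_k_anonymous(released, ["zip", "age"], 3))
```

The check only verifies a given release. It does not perform the generalization or suppression needed to reach k-anonymity, and it does not account for complementary releases.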

L-diversity principles
• l-diversity principle: a q-block is l-diverse if it contains at least l “well-represented” values for the sensitive attribute S. A table is l-diverse if every q-block is l-diverse.
• l-diversity may be difficult and unnecessary to achieve. Example: a single sensitive attribute with two values, HIV positive (1%) and HIV negative (99%), which have very different degrees of sensitivity.
• l-diversity is unnecessary to achieve.
  – 2-diversity is unnecessary for an equivalence class that contains only negative records.
• l-diversity is difficult to achieve.
  – Suppose there are 10,000 records in total.
  – To have distinct 2-diversity, there can be at most 10000 * 1% = 100 equivalence classes.
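A minimal sketch of a distinct l-diversity check, together with the counting argument from the example above. It assumes the simplest reading of “well-represented”, namely at least l distinct sensitive values per q-block; the records and column names are hypothetical.

```python
from collections import defaultdict


def is_distinct_l_diverse(records, quasi_identifiers, sensitive, l):
    """True if every q-block holds at least l distinct sensitive values."""
    blocks = defaultdict(set)
    for r in records:
        blocks[tuple(r[a] for a in quasi_identifiers)].add(r[sensitive])
    return all(len(values) >= l for values in blocks.values())


# Hypothetical table: the second q-block contains a single sensitive value.
table = [
    {"zip": "750**", "status": "negative"},
    {"zip": "750**", "status": "positive"},
    {"zip": "752**", "status": "negative"},
    {"zip": "752**", "status": "negative"},
]
print(is_distinct_l_diverse(table, ["zip"], "status", 2))

# Counting argument: every 2-diverse class needs at least one positive
# record, so the number of classes is bounded by the number of positives.
total_records = 10_000
positive_percent = 1
max_equivalence_classes = total_records * positive_percent // 100
print(max_equivalence_classes)
```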

Privacy Preserving Distributed Data Mining
• Goal of data mining is summary results
  – Association rules
  – Classifiers
  – Clusters
• The results alone need not violate privacy
  – Contain no individually identifiable values
  – Reflect overall results, not individual organizations
• The problem is computing the results without access to the data!
• Data needed for data mining may be distributed among parties
  – Credit card fraud data
• Inability to share data due to privacy reasons
  – HIPAA
• Even partial results may need to be kept private

Secure Multi-Party Computation (SMC)
• The goal is computing a function f(x_1, ..., x_n) without revealing the individual inputs x_i.
• Semi-honest model: parties follow the protocol.
• Malicious model: parties may or may not follow the protocol.
• We cannot do better than the situation in which a trusted third party exists.
• Generic SMC is too inefficient for PPDDM.
  – Enhancements are being explored.
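To make the SMC goal concrete, here is a sketch of one standard building block, a secure sum based on additive secret sharing. It is not a protocol described in the slides. It assumes the semi-honest model, simulates all parties inside one process, and uses an arbitrarily chosen modulus.

```python
import secrets

# Assumed public modulus; every input and the true sum must be below it.
MODULUS = 2**61 - 1


def share(value, n_parties):
    """Split value into n random-looking shares that sum to it mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def secure_sum(private_inputs):
    """Simulate the protocol: party i sends its j-th share to party j."""
    n = len(private_inputs)
    all_shares = [share(x, n) for x in private_inputs]
    # Each party j adds up the shares it received and publishes only that.
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % MODULUS for j in range(n)
    ]
    return sum(partial_sums) % MODULUS


print(secure_sum([12, 30, 7]))
```

Each individual share is uniformly random, so a single party learns nothing about another party's input beyond what the final sum reveals. A malicious party could still submit a false input or a wrong partial sum, which is why the malicious model needs stronger protocols.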

Preventing Private Information Inference Attacks on Social Networks
Raymond Heatherly, Murat Kantarcioglu, and Bhavani Thuraisingham, The University of Texas at Dallas
Jack Lindamood, Facebook

Graph Model
• Graph represented by a set of homogeneous vertices and a set of homogeneous edges
• Each node also has a set of Details, one of which is considered private.
Lindamood et al. 09 & Heatherly et al. 09

Naïve Bayes Classification
• Classification based only on specified attributes in the node
Lindamood et al. 09 & Heatherly et al. 09
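A details-only classifier of this kind can be sketched as follows. This is a simplified illustration, not the formulation from the cited papers: it scores only the details that are present on a profile, uses add-one smoothing, and runs on invented placeholder details and labels.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(profiles):
    """profiles: list of (set_of_details, label) pairs."""
    class_counts = Counter(label for _, label in profiles)
    detail_counts = defaultdict(Counter)
    vocabulary = set()
    for details, label in profiles:
        detail_counts[label].update(details)
        vocabulary |= details
    return class_counts, detail_counts, vocabulary


def predict(model, details):
    """Return the label with the highest smoothed log-posterior score."""
    class_counts, detail_counts, vocabulary = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)  # class prior
        for d in details & vocabulary:
            # Add-one smoothing keeps unseen (detail, label) pairs non-zero.
            score += math.log((detail_counts[label][d] + 1) / (n + 2))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training profiles with placeholder details and labels.
training = [
    ({"group:a", "music:x"}, "label_1"),
    ({"group:a"}, "label_1"),
    ({"group:b", "music:x"}, "label_2"),
    ({"group:b"}, "label_2"),
]
model = train_naive_bayes(training)
print(predict(model, {"group:a"}))
```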

Naïve Bayes with Links
• Rather than calculate the probability of a link from person n_x to n_y, we calculate the probability of a link from n_x to a person with n_y’s traits.
Lindamood et al. 09 & Heatherly et al. 09

Link Weights
• Links also have associated weights.
• A weight represents how “close” a friendship is suspected to be, using the following formula:
Lindamood et al. 09 & Heatherly et al. 09

Collective Inference
• Collection of techniques that use node attributes and the link structure to refine classifications
• Uses local classifiers to establish a set of priors for each node
• Uses traditional relational classifiers as the iterative step in classification
Lindamood et al. 09 & Heatherly et al. 09
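The two-stage idea above (local priors, then an iterative relational step) can be sketched with a weighted-vote relational neighbor update applied in simultaneous rounds. This is a simplified, assumption-laden illustration: it handles a binary class only, omits refinements such as decaying update weights, and uses hypothetical names and data.

```python
def relaxation_labeling(adj, priors, known, iterations=20):
    """
    adj:    node -> {neighbor: link weight}
    priors: node -> P(class = 1) from a local classifier (unlabeled nodes)
    known:  node -> 0 or 1 for nodes whose class is given
    Returns node -> estimated P(class = 1).
    """
    probs = {v: float(known[v]) if v in known else priors[v] for v in adj}
    for _ in range(iterations):
        updated = {}
        for v in adj:
            if v in known:
                updated[v] = probs[v]  # labeled nodes never change
                continue
            weight_sum = sum(adj[v].values())
            if weight_sum == 0:
                updated[v] = probs[v]  # isolated node keeps its prior
            else:
                # Weighted vote over neighbors' estimates from the last round.
                updated[v] = (
                    sum(w * probs[n] for n, w in adj[v].items()) / weight_sum
                )
        probs = updated  # all nodes are updated simultaneously
    return probs


# Hypothetical graph: b is unlabeled and linked more strongly to a than to c.
adj = {"a": {"b": 3.0}, "b": {"a": 3.0, "c": 1.0}, "c": {"b": 1.0}}
result = relaxation_labeling(adj, priors={"b": 0.5}, known={"a": 1, "c": 0})
print(result["b"])
```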

Relational Classifiers
• Class Distribution Relational Neighbor
• Weighted-Vote Relational Neighbor
• Network-only Bayes Classifier
• Network-only Link-based Classification
Lindamood et al. 09 & Heatherly et al. 09

Experimental Data
• 167,000 profiles from the Facebook online social network
• Restricted to public profiles in the Dallas/Fort Worth network
• Over 3 million links
Lindamood et al. 09 & Heatherly et al. 09

General Data Properties
Diameter of the largest component | 16
Number of nodes | 167,390
Number of friendship links | 3,342,009
Total number of listed traits | 4,493,436
Total number of unique traits | 110,407
Number of components | 18
Probability Liberal | 0.45
Probability Conservative | 0.55
Lindamood et al. 09 & Heatherly et al. 09

Inference Methods
• Details only: uses a Naïve Bayes classifier to predict the attribute
• Links only: uses only the link structure to predict the attribute
• Average: classifies based on an average of the probabilities computed by Details and Links
Lindamood et al. 09 & Heatherly et al. 09

Predicting Private Details
• Attempt to predict the value of the political affiliation attribute
• Three inference methods used as the local classifier
• Relaxation labeling used as the collective inference method
Lindamood et al. 09 & Heatherly et al. 09

Removing Details
• Ensures that no “false” information is added to the network; all details in the released graph were entered by the user
• Details that have the highest global probability of indicating political affiliation are removed from the network
Lindamood et al. 09 & Heatherly et al. 09

Removing Links
• Ensures that the link structure of the released graph is a subset of the original graph
• Removes, for each node, the links to the nodes that are most like it
Lindamood et al. 09 & Heatherly et al. 09

Most Liberal Traits
Trait Name | Trait Value | Weight Liberal
Group | legalize same sex marriage |
Group | every time i find out a cute boy is conservative a little part of me dies |
Group | equal rights for gays |
Group | the democratic party |
Group | not a bush fan |
Group | people who cannot understand people who voted for bush |
Group | government religion disaster |
Lindamood et al. 09 & Heatherly et al. 09

Most Conservative Traits
Trait Name | Trait Value | Weight Conservative
Group | george w bush is my homeboy |
Group | college republicans |
Group | texas conservatives |
Group | bears for bush |
Group | kerry is a fairy |
Group | aggie republicans |
Group | keep facebook clean |
Group | i voted for bush |
Group | protect marriage one man one woman |
Lindamood et al. 09 & Heatherly et al. 09

Most Liberal Traits per Trait Name
Trait Name | Trait Value | Weight Liberal
activities | amnesty international |
Employer | hot topic |
favorite tv shows | queer as folk |
grad school | computer science |
hometown | mumbai |
Relationship Status | in an open relationship |
religious views | agnostic |
looking for | whatever i can get |
Lindamood et al. 09 & Heatherly et al. 09

Experiments
• Conducted on 35,000 nodes which recorded political affiliation
• Tests removing 0 details and 0 links, 10 details and 0 links, 0 details and 10 links, and 10 details and 10 links
• Varied training set size from 10% of available nodes to 90%
Lindamood et al. 09 & Heatherly et al. 09

Local Classifier Results (figure)
Lindamood et al. 09 & Heatherly et al. 09

Collective Inference Results (figure)
Lindamood et al. 09 & Heatherly et al. 09

Access Control for Social Networks
Murat Kantarcioglu

Online Social Networks Access Control Issues
• Current access control systems for online social networks are either too restrictive or too loose.
  – “selected friends”: Bebo, Facebook, and Multiply
  – “neighbors” (i.e., the set of users having musical preferences and tastes similar to mine): Last.fm
  – “friends of friends”: Facebook, Friendster, Orkut
  – “contacts of my contacts” (2nd degree contacts), “3rd” and “4th degree contacts”: Xing

Challenges
• I want only my family and close friends to see this picture.

Requirements
• Many different online social networks with different terminology
  – Facebook vs. LinkedIn
• We need to have flexible models that can represent
  – Users’ profiles
  – Relationships among users (e.g., Bob is Alice’s close friend)
  – Resources (e.g., online photo albums)
  – Relationships among users and resources (e.g., Bob is the owner of the photo album and Alice is tagged in this photo)
  – Actions (e.g., post a message on someone’s wall)

Overview of the Solution
• We use semantic web technologies (e.g., OWL) to represent the social network knowledge base.
• We use the Semantic Web Rule Language (SWRL) to represent various security, admin, and filter policies.

Modeling User Profiles and Resources
• Existing ontologies such as FOAF could be extended to capture user profiles.
• Relationships among resources could be captured by using OWL concepts.
  – PhotoAlbum rdfs:subClassOf Resource
  – PhotoAlbum consistsOf Photos

Modeling Relationships Among Users
• We model relationships among users by defining an N-ary relationship.
    :Christine a :Person ;
        :has_friend _:Friendship_Relation_1 .
    _:Friendship_Relation_1 a :Friendship_Relation ;
        :Friendship_trust :HIGH ;
        :Friendship_value :Mike .
• OWL reasoners cannot be used to infer some relationships, such as “Christine is a third-degree friend of John”.
  – Such computations need to be done separately and represented by using a new class.
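The separate computation mentioned above could, for example, be a breadth-first search over the friendship links, with the result then asserted back into the knowledge base. The sketch below assumes undirected friendships held in a plain dictionary; the intermediate person "anna" and the function name are hypothetical.

```python
from collections import deque


def friend_degree(friends, start, target):
    """Return the smallest number of friendship hops, or None if unreachable."""
    if start == target:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, depth = queue.popleft()
        for friend in friends.get(person, ()):
            if friend == target:
                return depth + 1
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, depth + 1))
    return None


# Hypothetical friendship chain: christine - mike - anna - john.
friends = {
    "christine": ["mike"],
    "mike": ["christine", "anna"],
    "anna": ["mike", "john"],
    "john": ["anna"],
}
print(friend_degree(friends, "christine", "john"))
```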

Specifying Policies Using OSN Knowledge Base
• Most of the OSN information could be captured using OWL to represent a rich set of concepts.
• This makes it possible to specify very flexible access control policies.
  – “Photos could be accessed by friends only” automatically implies that a closeFriend can access the photos too.
  – Policies could easily be defined based on user-resource relationships.

Security Policies for OSNs
• Access control policies
• Filtering policies
  – Could be specified by the user
  – Could be specified by an authorized user
• Admin policies
  – The security admin specifies who is authorized to specify filtering and access control policies.
  – Example: if U1 isParentOf U2 and U2 is a child, then U1 can specify filtering policies for U2.

Security Policy Specification (using semantic web technologies)
• Semantic Web Rule Language (SWRL) is used for specifying access control, filtering, and authorization policies.
• SWRL is based on OWL:
  – All rules are expressed in terms of OWL concepts (classes, properties, individuals, literals, …).
• Using SWRL, subjects, objects, and actions are specified.
• Rules can have different authorizations that state the subject’s rights on the target object.

Knowledge Base for Authorizations and Prohibitions
• Authorizations/prohibitions need to be specified using OWL.
  – A different object property for each action supported by the OSN
  – Authorizations/prohibitions could automatically propagate based on action hierarchies.
    Assume “post” is a subproperty of “write”.
    If a user is given “post” permission, then the user will have “write” permission as well.
• Admin prohibitions need to be specified slightly differently: (Supervisor, Target, Object, Privilege)
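The propagation along an action hierarchy can be illustrated without an OWL reasoner. The sketch below mimics subproperty semantics with a plain dictionary: a grant on a sub-action also counts as a grant on every super-action. The grant tuples, user names, and function names are assumptions made for illustration.

```python
def implied_actions(action, sub_property_of):
    """Return the action plus every super-action it implies."""
    implied = {action}
    stack = [action]
    while stack:
        current = stack.pop()
        for parent in sub_property_of.get(current, ()):
            if parent not in implied:
                implied.add(parent)
                stack.append(parent)
    return implied


def is_authorized(grants, user, action, resource, sub_property_of):
    """True if some grant for this user and resource implies the action."""
    return any(
        u == user and r == resource
        and action in implied_actions(a, sub_property_of)
        for (u, a, r) in grants
    )


# "post" is a subproperty of "write", as in the slide's example.
hierarchy = {"post": ["write"]}
grants = {("bob", "post", "alice_wall"), ("carol", "write", "alice_wall")}

print(is_authorized(grants, "bob", "write", "alice_wall", hierarchy))
print(is_authorized(grants, "carol", "post", "alice_wall", hierarchy))
```

Propagation runs in one direction only: Bob's “post” grant yields “write”, while Carol's “write” grant does not yield “post”.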

Security Rule Examples
• SWRL rule specification depends on the authorization and OSN knowledge bases.
  – It is not possible to specify generic rules.
• Examples:

Security Rule Enforcement
• A reference monitor evaluates the requests.
• Admin requests for access control could be evaluated by rule rewriting.
  – Example: assume Bob submits the following admin request
  – Rewrite as the following rule

Security Rule Enforcement
• Admin requests for prohibitions could be rewritten as well.
  – Example: Bob issues the following prohibition request
  – Rewritten version
• Access control requests need to consider both filter and access control policies.

Framework Architecture (diagram)
• Components: Social Network Application, Reference Monitor, Semantic Web Reasoning Engine, Policy Store, SN Knowledge Base
• Labeled flows: Access request, Access decision, Modified access request, Policy retrieval, Reasoning result, Knowledge base queries

Conclusions
• Various attacks exist to
  – Identify nodes in anonymized data
  – Infer private details
• There have been recent attempts to increase social network access control to limit some of the attacks.
• Balancing privacy, security, and usability on online social networks will be an important challenge.
• Directions
  – Scalability
    We are currently implementing such a system to test its scalability.
  – Usability
    Create techniques to automatically learn rules.
    Create simple user interfaces so that users can easily specify these rules.