What you want is not what you get: Predicting sharing policies for text-based content on Facebook Arunesh Sinha*, Yan Li †, Lujo Bauer* *Carnegie Mellon.

What you want is not what you get: Predicting sharing policies for text-based content on Facebook
Arunesh Sinha*, Yan Li†, Lujo Bauer*
*Carnegie Mellon University  †Singapore Management University

Motivation

Problem for Social Networks
o Report in dailymail.co.uk†
† quitting-site-droves-privacy-addiction-fears.html

More User Control ⇏ Better Privacy
o Users fail to comprehend the controls
o Users fail to comprehend the consequences
o Though concerned, users often make no effort to use the controls better

Our goal: Help users pick the correct policy for new Facebook posts

Facebook’s Strategy
[Figure: Facebook wall with earlier posts n-2, n-1 and n (policies Friends, Public); the new post n+1 gets Default: Public]

Our Goal and Approach
[Figure: the same Facebook wall; instead of a fixed default (Default: ?), ML predicts the policy for the new post n+1]

Outline
o Data collection methodology
o Survey results
o Machine learning approach
o Results and analysis
o Limitations / Conclusion

Data Collection Method
Survey Methodology
o Created an online survey
o Advertised on Craigslist and at CMU
Advertisement text: “Participate in a Carnegie Mellon research study on Facebook sharing. Earn $5 for participating in a ~20 minute online study. We’re looking for English speaking adults, who have used Facebook for at least 4 months, update their Facebook status or post on Facebook at least every other day, and have used more than one privacy setting for their posts. Please click on the following link to start the online study: Upon completion of the study, you will receive a $5 Amazon gift card.”

Filtering Users

Survey Questions
o Collected demographic data
  – Age, gender, country, level of education
o Degree of agreement with the statements:
  – “I have a strong set of privacy rules.”
  – “I find Facebook's privacy controls confusing.”
o “Have you ever posted something on a social network and then regretted doing it? If so, what happened?”

o Fetched 4 months of users’ posts via a Facebook app
[Figure: Facebook app view showing, for each post, its policy and the text in the post]

Survey Results: Demographics
o 42 participants (on average, 146 posts and 4.6 policies per participant)
o Age: 18 to 65, average 29.1
o 35 female, 7 male
o 39 from the USA

Survey Results: Sentiment

ML Usage Plan
[Figure: Facebook wall with posts n-2, n-1 and n (policies Friends, Public); ML supplies the default policy for the new post n+1 (Default: ?)]

Machine Learning Approach
Machine Learning
o We use MaxEnt (maximum entropy classification) as the ML tool
  – Used the Stanford NLP software
o MaxEnt provides good generalization
  – I.e., it resists overfitting
  – Learns a probabilistic hypothesis h that outputs a probability distribution over labels given data x
  – Chooses the hypothesis h that maximizes entropy, subject to a form of agreement with the training data (typically, the model's expected feature counts must match the observed ones)
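The slides use the Stanford NLP software for MaxEnt. As a self-contained illustration only (not the Stanford implementation), the sketch below relies on the standard duality: the maximum-entropy distribution whose expected feature counts match the training data is the log-linear (multinomial logistic) model that maximizes likelihood, so it fits that model directly. An L2 penalty relaxes the matching constraints, one common way MaxEnt tools curb overfitting. The feature-string format and the hyperparameters `l2`, `lr` and `epochs` are arbitrary assumptions.

```python
import math
from collections import defaultdict


class MaxEnt:
    """Multinomial logistic regression, the dual form of a MaxEnt classifier.

    Each example is a set of binary feature strings. Weights are indexed by
    (feature, label) and fitted by batch gradient ascent on the L2-penalized
    average log-likelihood (the penalty acts like a Gaussian prior).
    """

    def __init__(self, l2=0.1, lr=0.5, epochs=200):
        self.l2, self.lr, self.epochs = l2, lr, epochs
        self.w = defaultdict(float)  # (feature, label) -> weight
        self.labels = []

    def predict_proba(self, feats):
        scores = {y: sum(self.w.get((f, y), 0.0) for f in feats) for y in self.labels}
        top = max(scores.values())  # subtract the max score for numerical stability
        exp = {y: math.exp(s - top) for y, s in scores.items()}
        z = sum(exp.values())
        return {y: e / z for y, e in exp.items()}

    def predict(self, feats):
        probs = self.predict_proba(feats)
        return max(probs, key=probs.get)

    def fit(self, X, Y):
        self.labels = sorted(set(Y))
        n = len(X)
        for _ in range(self.epochs):
            grad = defaultdict(float)
            for feats, y in zip(X, Y):
                probs = self.predict_proba(feats)
                for f in feats:
                    grad[(f, y)] += 1.0  # empirical feature count
                    for lab in self.labels:
                        grad[(f, lab)] -= probs[lab]  # count expected by the model
            for key in set(grad) | set(self.w):
                self.w[key] += self.lr * (grad[key] / n - self.l2 * self.w[key])
        return self
```

Usage: `MaxEnt().fit(X, Y).predict({"w=party"})`, where each element of `X` is a set of feature strings and `Y` holds policy labels such as `"Friends"` or `"Public"`.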

Features Considered
o Words and 2-grams in the Facebook post
o Presence of multimedia
o Time of day: morning, evening, night
o Previous post’s policy
o Model (feature set) chosen using cross-validation
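A minimal sketch of how one post could be turned into the binary features listed above. The whitespace tokenizer, the `prefix=value` feature naming and the hour boundaries for morning, evening and night are all hypothetical; the slides name the feature types but not these details.

```python
from datetime import datetime


def time_of_day(ts):
    """Bucket a timestamp into the three periods named on the slide.

    The hour boundaries are hypothetical; the slides do not give them.
    """
    if 5 <= ts.hour < 12:
        return "morning"
    if 12 <= ts.hour < 20:
        return "evening"
    return "night"


def extract_features(text, has_media, ts, prev_policy):
    """Turn one post into a set of binary feature strings (prefix=value)."""
    tokens = text.lower().split()  # naive whitespace tokenizer (assumption)
    feats = {"w=" + t for t in tokens}  # words
    feats |= {"bi=" + a + "_" + b for a, b in zip(tokens, tokens[1:])}  # 2-grams
    feats.add("media=" + ("yes" if has_media else "no"))
    feats.add("tod=" + time_of_day(ts))
    feats.add("prev=" + (prev_policy or "none"))  # first post has no predecessor
    return feats


print(sorted(extract_features("Party tonight", False, datetime(2014, 5, 3, 22, 15), "Friends")))
# ['bi=party_tonight', 'media=no', 'prev=Friends', 'tod=night', 'w=party', 'w=tonight']
```

Keeping the feature type as a prefix makes it easy to switch whole groups of features on or off when choosing the feature set by cross-validation.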

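Temporal Testing
o The data is temporal: posts are ordered in time
o Picked 10 posts at random as test data
o We simulate a real-world scenario: each test post is predicted by a model trained on the posts that precede it
[Figure: timeline of posts with test points marked; for each test point, the model is trained on the earlier posts ("Train to predict")]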
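The temporal protocol can be sketched as a generator of (training set, test post) pairs. This is an illustration under stated assumptions: the function name `temporal_splits` and the `seed` parameter are hypothetical, and letting earlier test posts also serve as training data for later ones is our reading of the real-world scenario, since a user's past posts are all known at posting time.

```python
import random


def temporal_splits(posts, n_test, seed=0):
    """Yield (training_posts, test_post) pairs in time order.

    `posts` must already be sorted by time. Each randomly chosen test post is
    paired with only the posts that precede it, so the model never sees the
    future, as in deployment.
    """
    rng = random.Random(seed)
    # Index 0 has no history to train on, so it is never a test point.
    test_indices = sorted(rng.sample(range(1, len(posts)), n_test))
    for i in test_indices:
        yield posts[:i], posts[i]


posts = [f"post {k}" for k in range(146)]  # e.g. one participant's posts, oldest first
for train, test in temporal_splits(posts, n_test=10):
    print(len(train), "->", test)
```

Unlike a random k-fold split, this never lets a model use posts written after the one it predicts.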

Training
o Cross-validation to choose features
o May have a different model for each test point
[Figure: the same timeline; each test point is predicted by a model trained on the earlier posts]
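Because the feature set is chosen by cross-validation on each test point's own training prefix, different test points can end up with different models. A minimal sketch of that selection step, assuming features are strings of the form `group=value` (for example `w=...` for words or `prev=...` for the previous policy) and that `train_fn(X, Y)` returns a prediction function; both conventions, and the function names, are hypothetical.

```python
def restrict(feats, groups):
    """Keep only features whose group (the part before '=') is in `groups`."""
    return {f for f in feats if f.split("=", 1)[0] in groups}


def cv_accuracy(data, groups, train_fn, k=5):
    """k-fold cross-validated accuracy using only the feature groups in `groups`.

    data: list of (feature_set, label) pairs.
    train_fn(X, Y) must return a function mapping a feature set to a label.
    """
    folds = [data[j::k] for j in range(k)]
    correct = total = 0
    for j, held_out in enumerate(folds):
        train = [ex for m, fold in enumerate(folds) if m != j for ex in fold]
        predict = train_fn([restrict(x, groups) for x, _ in train], [y for _, y in train])
        for x, y in held_out:
            correct += predict(restrict(x, groups)) == y
            total += 1
    return correct / total


def select_feature_groups(data, candidates, train_fn, k=5):
    """Return the candidate feature-group set with the best CV accuracy."""
    return max(candidates, key=lambda groups: cv_accuracy(data, groups, train_fn, k))
```

In the temporal setting, `data` would be only the posts preceding the current test post, so the chosen feature set is re-estimated for every test point.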

Results and analysis
Baseline Approach
o Previous policy (Facebook’s approach)
  – Use the policy of the last post as the prediction
o Surprisingly good accuracy
  – 0.85 on average
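The previous-policy baseline fits in a few lines. This sketch scores every post after the first rather than only the randomly chosen test posts, so its numbers are illustrative and not comparable to the 0.85 reported above; the function name is hypothetical.

```python
def previous_policy_baseline(policies):
    """Predict each post's policy as the policy of the post just before it.

    `policies` lists one participant's implemented policies, oldest first
    (at least two posts). Returns (predictions, accuracy) for posts 1..n-1,
    since post 0 has no predecessor.
    """
    predictions = policies[:-1]
    actual = policies[1:]
    correct = sum(p == a for p, a in zip(predictions, actual))
    return predictions, correct / len(actual)


preds, acc = previous_policy_baseline(["Friends", "Friends", "Public", "Public", "Public"])
print(acc)  # 0.75: only the switch from Friends to Public is mispredicted
```

The baseline is strong whenever users rarely switch policies, which is exactly the situation in which it also hides mistakes: a wrong policy simply keeps being repeated.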

MaxEnt Accuracy
Technique                    Accuracy
Baseline (previous policy)   0.85
MaxEnt                       0.86

Prediction Mismatch
o Problem: we are not predicting the intended policy
  – Instead, we are predicting the implemented policy
o Conjecture:
  – The implemented policy is often incorrect
  – Users just use Facebook’s default policy

Ground Truth Collection
o Feedback on 20 randomly chosen posts
  – Provides ground truth (intended policy)
[Figure: feedback form showing the text of the post and all policies ever used]

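Datasets
o Original data: posts labeled with the implemented policy
o Clean data: original data with the 20 posts corrected based on feedback
o Pruned clean data: clean data with 80% of the posts still labeled only by the implemented policy removed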
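A sketch of how the three datasets could be derived from one participant's posts. We read "Remove 80% Implemented Policy" as randomly dropping 80% of the posts whose label comes only from the implemented policy while keeping every feedback post; that reading, the random choice of which posts to drop, and the function name `build_datasets` are assumptions.

```python
import random


def build_datasets(posts, intended, prune_frac=0.8, seed=0):
    """Build the three datasets from one participant's posts.

    posts:    list of (text, implemented_policy), oldest first
    intended: dict index -> intended policy, for posts that got feedback
    Returns (original, clean, pruned_clean) as lists of (text, policy).
    """
    original = list(posts)
    # Clean data: replace the label of every post that received feedback.
    clean = [(text, intended.get(i, policy)) for i, (text, policy) in enumerate(posts)]
    # Pruned clean data: keep every feedback post, but randomly drop a fraction
    # of the posts whose label is only the (possibly wrong) implemented policy.
    unverified = [i for i in range(len(posts)) if i not in intended]
    rng = random.Random(seed)
    dropped = set(rng.sample(unverified, round(prune_frac * len(unverified))))
    pruned_clean = [ex for i, ex in enumerate(clean) if i not in dropped]
    return original, clean, pruned_clean
```

With 100 posts of which 20 received feedback, the pruned clean set keeps the 20 verified posts plus 16 of the other 80, so verified labels make up a much larger share of the training data.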

Temporal Testing
o Intended policy known for 20 posts
o Picked 8 of these at random as test data
o We simulate a real-world scenario
[Figure: timeline of test points, each predicted by a model trained on the earlier posts]

Baseline
o Same previous-policy approach as before
o Measure intended accuracy
  – Predict only for posts with a known intended policy
  – A better measure of performance
o Baseline intended accuracy: 0.67
  – Compared with 0.85 obtained previously on implemented policies
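Intended accuracy differs from ordinary accuracy only in which posts are scored: predictions are compared against the user's stated intention, and only posts with feedback count. A minimal sketch; the dictionary layout and function name are assumptions for illustration.

```python
def intended_accuracy(predictions, intended):
    """Accuracy over posts whose intended policy is known.

    predictions: dict post_id -> predicted policy
    intended:    dict post_id -> policy the user said they intended
    Posts without feedback are ignored, even if a prediction exists.
    """
    scored = [pid for pid in intended if pid in predictions]
    if not scored:
        raise ValueError("no predictions for posts with a known intended policy")
    return sum(predictions[pid] == intended[pid] for pid in scored) / len(scored)


preds = {1: "Friends", 2: "Public", 3: "Public", 4: "Friends"}
print(intended_accuracy(preds, {1: "Friends", 2: "Friends", 4: "Friends"}))  # 2 of 3 correct
```

The same predictor can therefore look good on implemented policies and worse on intended ones, which is the gap between 0.85 and 0.67 reported for the baseline.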

MaxEnt Intended Accuracy
Technique               Intended accuracy
Baseline                67%
MaxEnt (clean)          71%
MaxEnt (pruned clean)   81%

Confidence About Policy
o Confidence Factor (CF): fraction of posts for which the intended policy matched the implemented policy
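A sketch of the Confidence Factor computed per participant over that participant's feedback posts. Computing CF per participant is our reading of the later "Accuracy for CF>0.5" result; the function name and dictionary layout are hypothetical.

```python
def confidence_factor(implemented, intended):
    """Fraction of a participant's feedback posts whose intended policy
    matched the policy actually implemented on Facebook.

    implemented: dict post_id -> policy set on Facebook
    intended:    dict post_id -> policy the participant said they intended
    """
    ids = [pid for pid in intended if pid in implemented]
    if not ids:
        raise ValueError("no feedback posts for this participant")
    return sum(implemented[pid] == intended[pid] for pid in ids) / len(ids)


implemented = {1: "Public", 2: "Public", 3: "Friends", 4: "Public"}
intended = {1: "Public", 2: "Friends", 3: "Friends", 4: "Friends"}
print(confidence_factor(implemented, intended))  # posts 1 and 3 match: 2 of 4
```

A high CF means the participant's implemented policies are a trustworthy label source; a low CF means that training on implemented policies teaches the model the user's mistakes.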

Analysis of Improvement

Limitations
o Intended policy available for only 20 posts
o 42 participants is not a large number
  – Other studies have used similar numbers
o A richer feature space is possible
  – By processing the attachments of the post
o Could use more sophisticated ML techniques

Conclusion
o Accuracy: 67% → 81%
o Accuracy for CF > 0.5: 78% → 94%
o An approach demonstrating the feasibility of learning the intended privacy policy of Facebook posts

Discarding “Bad” Data Helps

[Figure: plot of Improvement against #Participants]