
What you want is not what you get: Predicting sharing policies for text-based content on Facebook
Arunesh Sinha*, Yan Li†, Lujo Bauer*
*Carnegie Mellon University
†Singapore Management University

Motivation

Problem for Social Networks
o Report in dailymail.co.uk†
† http://www.dailymail.co.uk/sciencetech/article-2423713/Facebook-users-committing-virtual-identity-suicide-quitting-site-droves-privacy-addiction-fears.html

More User Control ⇏ Better Privacy
o Users fail to comprehend controls
o Users fail to comprehend consequences
o Though concerned, users often make no effort towards better use of controls

Our goal: Help users pick the correct policy for new Facebook posts

Facebook’s Strategy
[Diagram: Facebook wall showing posts n-2, n-1, n, and n+1, with policy labels Friends and Public. Default: Public]

Our Goal and Approach
[Diagram: Facebook wall showing posts n-2, n-1, n, and n+1, with policy labels Friends and Public. Default: ? (ML)]

Outline
o Data collection methodology
o Survey results
o Machine learning approach
o Results and analysis
o Limitations / Conclusion

Data Collection Method

Survey Methodology
o Created an online survey
o Advertised on Craigslist and at CMU

Participate in a Carnegie Mellon research study on Facebook sharing. Earn $5 for participating in a ~20 minute online study. We’re looking for English speaking adults, who have used Facebook for at least 4 months, update their Facebook status or post on Facebook at least every other day, and have used more than one privacy setting for their posts. Please click on the following link to start the online study: http://greyw1.ece.cmu.edu/survey/survey.php Upon completion of the study, you will receive a $5 Amazon gift card.

Filtering Users

Survey Questions
o Collected demographic data
  – Age, gender, country, level of education
o Degree of agreement with the statements:
  – I have a strong set of privacy rules.
  – I find Facebook's privacy controls confusing.
o Have you ever posted something on a social network and then regretted doing it? If so, what happened?

Facebook App
o Fetched 4 months of users’ posts
[Screenshot labels: Policy; Text in post]

Survey Results: Demographics
o 42 participants (avg. 146 posts and 4.6 policies)
o Age: 18 to 65, with an average of 29.1
o 35 female, 7 male
o 39 from the USA

Survey Results: Sentiment

ML Usage Plan
[Diagram: Facebook wall showing posts n-2, n-1, n, and n+1, with policy labels Friends and Public. Default: ? (ML)]

Machine Learning Approach

Machine Learning
o We use MaxEnt as the ML tool
  – Used Stanford NLP software
o MaxEnt: provides good generalization
  – I.e., helps prevent overfitting
  – Learns a probabilistic hypothesis h that outputs a probability distribution over labels given data x
  – Chooses the hypothesis h that maximizes entropy, subject to a form of agreement with the training data
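To make the MaxEnt bullet concrete, here is a minimal pure-Python sketch of a conditional MaxEnt classifier. It relies on the standard result that the entropy-maximizing model under feature-expectation constraints has a log-linear (softmax) form, so it can be fit by maximizing the training log-likelihood. This is not the Stanford NLP software used in the study; the function names, the L2 penalty, the learning rate, and the toy feature strings are all assumptions made for illustration.

```python
import math
from collections import defaultdict

def predict_proba(weights, features, labels):
    """Return P(label | features) under a log-linear (MaxEnt) model."""
    scores = {y: sum(weights[(f, y)] for f in features) for y in labels}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    z = sum(exps.values())
    return {y: e / z for y, e in exps.items()}

def train_maxent(examples, labels, epochs=200, lr=0.5, l2=0.01):
    """Fit weights by gradient ascent on the L2-regularized log-likelihood.

    examples: list of (set_of_feature_strings, label) pairs.
    epochs, lr, and l2 are illustrative values, not taken from the slides.
    """
    weights = defaultdict(float)
    for _ in range(epochs):
        grad = defaultdict(float)
        for features, gold in examples:
            probs = predict_proba(weights, features, labels)
            for f in features:
                for y in labels:
                    # Observed minus expected count of feature (f, y).
                    grad[(f, y)] += (1.0 if y == gold else 0.0) - probs[y]
        for key, g in grad.items():
            weights[key] += lr * (g / len(examples) - l2 * weights[key])
    return weights

# Toy usage with made-up posts and policies.
labels = ["Friends", "Public"]
examples = [
    ({"w=party"}, "Friends"),
    ({"w=party", "w=tonight"}, "Friends"),
    ({"w=news"}, "Public"),
    ({"w=news", "w=election"}, "Public"),
]
model = train_maxent(examples, labels)
probs = predict_proba(model, {"w=party"}, labels)
```

The output is a probability for each policy label, which is what lets a predictor either suggest the most likely policy or fall back to a default when no label is clearly favored.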

Features Considered
o Words and 2-grams in the Facebook post
o Presence of multimedia
o Time of day
  – morning, evening, night
o Previous post’s policy
o Model (feature set) chosen using cross-validation
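The feature list above could be turned into a feature set roughly as follows. This is only a sketch: the whitespace tokenization, the feature-name prefixes, and especially the hour boundaries for morning, evening, and night are assumptions, since the slides name the buckets but not their cutoffs.

```python
from datetime import datetime

def time_bucket(hour):
    """Map an hour (0-23) to a coarse time-of-day bucket.

    The cutoffs are hypothetical; the slides do not specify them.
    """
    if 5 <= hour < 12:
        return "morning"
    if 12 <= hour < 20:
        return "evening"
    return "night"

def extract_features(text, has_multimedia, posted_at, previous_policy):
    """Build a set of string features for one post."""
    tokens = text.lower().split()
    feats = {"w=" + t for t in tokens}                      # single words
    feats |= {"bg=" + a + "_" + b                           # 2-grams
              for a, b in zip(tokens, tokens[1:])}
    if has_multimedia:
        feats.add("has_multimedia")
    feats.add("time=" + time_bucket(posted_at.hour))
    feats.add("prev=" + str(previous_policy))
    return feats

# Toy usage with a made-up post.
feats = extract_features(
    "Happy birthday Sam", True, datetime(2013, 5, 1, 9, 30), "Friends"
)
```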

Temporal Testing
o The data is temporal
o Picked 10 posts randomly as test data
o We simulate a real-world scenario
[Diagram: timeline of posts with labels Test, Train, and “to predict”; axis: Time]
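One plausible reading of this setup is that each randomly chosen test post is predicted by a model trained only on the posts that came before it. The sketch below implements that reading; the post layout (a dict with a "time" key), the function name, and the fixed random seed are assumptions, and the study's exact procedure may differ.

```python
import random

def temporal_splits(posts, num_test=10, seed=0):
    """Yield (training_posts, test_post) pairs in chronological order.

    posts: list of dicts with a sortable "time" key (an assumed layout).
    Each test post is paired with all posts strictly earlier than it.
    """
    ordered = sorted(posts, key=lambda p: p["time"])
    rng = random.Random(seed)
    # Skip index 0: the earliest post has no history to train on.
    test_indices = sorted(rng.sample(range(1, len(ordered)), num_test))
    for i in test_indices:
        yield ordered[:i], ordered[i]

# Toy usage: 50 made-up posts with integer timestamps.
posts = [{"time": t, "policy": "Friends"} for t in range(50)]
splits = list(temporal_splits(posts))
```

Training only on earlier posts keeps information from the future out of each prediction, which is what makes the evaluation resemble real use.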

Training
o Cross-validation to choose features
o May have different model for different test point
[Diagram: timeline of posts with labels Test, Train, and “to predict”; axis: Time]
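Choosing a feature set by cross-validation can be sketched as below. The number of folds, the contiguous fold layout, and the `evaluate` callback (which would train a model on one part of the data and return its accuracy on the held-out part) are all hypothetical; the slides only say that cross-validation selects the features. Running this separately on the training data of each test point is what would produce a different model per test point.

```python
def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def choose_feature_set(train_posts, candidate_sets, evaluate, k=5):
    """Return the candidate feature set with the best mean held-out accuracy.

    evaluate(fit_posts, held_out_posts, feature_set) -> accuracy is a
    caller-supplied (hypothetical) function that trains and scores a model.
    """
    best_set, best_score = None, -1.0
    for feature_set in candidate_sets:
        scores = []
        for fold in k_fold_indices(len(train_posts), k):
            held = set(fold)
            fit = [p for i, p in enumerate(train_posts) if i not in held]
            held_out = [train_posts[i] for i in fold]
            scores.append(evaluate(fit, held_out, feature_set))
        mean = sum(scores) / len(scores)
        if mean > best_score:
            best_set, best_score = feature_set, mean
    return best_set

# Toy usage with a stand-in evaluator that favors sets containing "prev".
def fake_evaluate(fit_posts, held_out_posts, feature_set):
    return 0.9 if "prev" in feature_set else 0.5

candidates = [("words",), ("words", "prev")]
chosen = choose_feature_set(list(range(20)), candidates, fake_evaluate)
```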

Results and analysis

Baseline Approach
o Previous policy (Facebook’s approach)
  – Use the policy of the last post as the prediction
o Surprisingly good accuracy
  – 0.85 on average
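The previous-policy baseline is simple enough to sketch in a few lines. For brevity this version scores every post after the first, whereas the study scored randomly picked test posts; the function name and the toy data are assumptions.

```python
def previous_policy_accuracy(policies):
    """Accuracy of predicting each post's policy as the previous post's policy.

    policies: implemented policies of one user's posts, in chronological order.
    """
    if len(policies) < 2:
        return 0.0
    hits = sum(1 for prev, cur in zip(policies, policies[1:]) if prev == cur)
    return hits / (len(policies) - 1)

# Toy usage: three of the four predictions match.
acc = previous_policy_accuracy(
    ["Friends", "Friends", "Public", "Public", "Public"]
)
```

The baseline scores well whenever users rarely change their setting, which is consistent with the conjecture that many posts simply inherit the default.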

MaxEnt Accuracy

Technique                    Accuracy
Baseline (previous policy)   0.85
MaxEnt                       0.86

Prediction Mismatch
o Problem: We are not predicting intended policy
  – Instead, predicting implemented policy
o Conjecture:
  – Implemented policy is often incorrect
  – Users just use Facebook’s default policy

Ground Truth Collection
o Feedback on 20 randomly chosen posts
  – Provides ground truth (intended policy)
[Screenshot labels: All policies ever used; Text of post]

Datasets
[Diagram: Original data; Clean data (correct 20 posts based on feedback); Pruned clean data (remove 80% implemented policy)]

Temporal Testing
o Intended policy known for 20 posts
o Picked 8 of these randomly as test data
o We simulate a real-world scenario
[Diagram: timeline of posts with labels Test, Train, and “to predict”]

Baseline
o Same previous policy approach as before
o Measure intended accuracy
  – Predict only for posts with known intended policy
  – Better measure of performance
o Baseline intended accuracy: 0.67
  – 0.85 obtained previously on implemented policies
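Intended accuracy, as described above, scores predictions only on posts whose intended policy is known from feedback. A minimal sketch, assuming predictions and feedback are stored as dictionaries keyed by a post identifier (a hypothetical layout):

```python
def intended_accuracy(predictions, intended):
    """Fraction of feedback posts whose prediction equals the intended policy.

    predictions: dict post_id -> predicted policy.
    intended: dict post_id -> intended policy, present only for posts
              that received user feedback.
    Returns None when no post can be scored.
    """
    scored = [pid for pid in intended if pid in predictions]
    if not scored:
        return None
    return sum(predictions[p] == intended[p] for p in scored) / len(scored)

# Toy usage: post 2 has no feedback, so it is ignored.
result = intended_accuracy(
    {1: "Friends", 2: "Public", 3: "Friends"},
    {1: "Friends", 3: "Public"},
)
```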

MaxEnt Intended Accuracy

Technique               Intended accuracy
Baseline                67%
MaxEnt (clean)          71%
MaxEnt (pruned clean)   81%

Confidence About Policy
Confidence Factor (CF): Fraction of posts for which intended policy matched implemented policy
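The Confidence Factor definition translates directly into code. In this sketch the input layout (a list of implemented and intended policy pairs for the posts that received feedback) and the function name are assumptions.

```python
def confidence_factor(feedback):
    """Fraction of feedback posts whose implemented policy was the intended one.

    feedback: list of (implemented_policy, intended_policy) pairs for one user.
    Returns None when the user gave no feedback.
    """
    if not feedback:
        return None
    matches = sum(implemented == intended for implemented, intended in feedback)
    return matches / len(feedback)

# Toy usage: three of four implemented policies match what the user intended.
cf = confidence_factor([
    ("Friends", "Friends"),
    ("Public", "Friends"),
    ("Public", "Public"),
    ("Friends", "Friends"),
])
```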

Analysis of Improvement

Limitations
o Intended policy available for only 20 posts
o 42 participants is not a large number
  – Other studies have used similar numbers
o Richer feature space possible
  – By processing the attachments of the post
o Could use more sophisticated ML techniques

Conclusion
o Accuracy: 67% → 81%
o Accuracy for CF>0.5: 78% → 94%
An approach demonstrating the feasibility of learning the intended privacy policy of Facebook posts

Discarding “Bad” Data Helps

[Plot axes: Improvement; #Participants]
