Fabricio Benevenuto, Gabriel Magno, Tiago Rodrigues, and Virgilio Almeida. Universidade Federal de Minas Gerais, Belo Horizonte, Brazil. ACSAC 2010.


A Presentation at Advanced Defense Lab

Outline
- Introduction
- Background
- Dataset and Labeled Collection
- Identifying User Attributes
- Detecting Spammers
- Related Work
- Conclusion

Introduction
- Twitter has recently emerged as a popular social system with a simple interface in which only 140-character messages can be posted.
- Such services open opportunities for new forms of spam.

Introduction: 4-step approach
1. Crawled a near-complete dataset from Twitter.
2. Created a labeled collection of users manually classified as spammers and non-spammers.
3. Studied the characteristics of tweet content and user behavior.
4. Used a supervised machine learning method to identify spammers.

Background
- Relationship links are directional.
- A re-tweeted message usually starts with "RT @username".
- Twitter users usually use hashtags (#) to identify certain topics.
- Trending topics, e.g. #musicmonday

Background: examples of spam
- A URL to a website containing advertisements completely unrelated to the hashtag in the tweet.
- Re-tweets in which legitimate links are changed to illegitimate ones.

Dataset and Labeled Collection
- We asked Twitter to allow us to collect such data, and they white-listed 58 servers located at MPI-SWS.
- Twitter assigns each user a numeric ID that uniquely identifies the user's profile.
- We launched our crawler in August 2009 to collect all user IDs ranging from 0 to 80 million.
- In total:
  - 54,981,152 used accounts
  - 1,963,263,821 social links
  - 1,755,925,520 tweets
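The slide does not say how the ID range was divided among the 58 white-listed servers. As a minimal sketch, assuming a simple contiguous split (the function name `split_id_range` and the half-open range convention are illustrative assumptions, not the authors' crawler), the work could be partitioned like this:

```python
def split_id_range(start: int, stop: int, workers: int) -> list[range]:
    """Split the half-open ID range [start, stop) into `workers`
    contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(stop - start, workers)
    chunks = []
    lo = start
    for i in range(workers):
        # The first `extra` workers take one additional ID each.
        size = base + (1 if i < extra else 0)
        chunks.append(range(lo, lo + size))
        lo += size
    return chunks


# Hypothetical assignment: one chunk of user IDs per crawling server.
chunks = split_id_range(0, 80_000_000, 58)
```

Each server would then probe only the IDs in its own chunk, so no ID is requested twice.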

Building a labeled collection
Three desired properties for a collection of users labeled as spammers and non-spammers:
1. The collection needs to have a significant number of spammers and non-spammers.
2. The labeled collection needs to include spammers who are aggressive in their strategies and mostly affect the system.
3. The users are chosen randomly and not based on their characteristics.

Building a labeled collection
Three trending topics:
- Michael Jackson's death
- Susan Boyle's emergence
- The hashtag "#musicmonday"

Building a labeled collection
- We developed a website to help volunteers manually label users as spammers or non-spammers based on their tweets containing #keywords related to the trending topics.
- In total, 8,207 users were labeled: 355 spammers and 7,852 non-spammers.
- We selected only 710 of the legitimate users to include in our collection.
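The 710 legitimate users kept are exactly twice the 355 spammers. A minimal sketch of that kind of undersampling, assuming the kept users are drawn uniformly at random (the function `undersample`, the 2:1 `ratio` parameter, and the placeholder user names are illustrative assumptions):

```python
import random


def undersample(majority, minority, ratio=2, seed=0):
    """Randomly keep at most `ratio` majority items per minority item."""
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    k = min(len(majority), ratio * len(minority))
    return rng.sample(majority, k)


# Placeholder identifiers standing in for the labeled users.
spammers = [f"spammer_{i}" for i in range(355)]
legitimate = [f"user_{i}" for i in range(7852)]

kept = undersample(legitimate, spammers)
# Label 1 = spammer, 0 = non-spammer.
collection = [(u, 1) for u in spammers] + [(u, 0) for u in kept]
```

The resulting collection has 1,065 users, which keeps the classifier from being dominated by the much larger non-spammer class.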

Identifying User Attributes: Content Attributes
- The maximum, minimum, average, and median of the following metrics:
  - number of hashtags per number of words in each tweet
  - number of URLs per number of words in each tweet
  - number of words in each tweet
  - number of characters in each tweet
  - number of URLs in each tweet
  - number of hashtags in each tweet
  - number of numeric characters that appear in the text
  - number of users mentioned in each tweet
  - number of times the tweet has been re-tweeted
- The fraction of tweets with at least one word from a popular list of spam words
- The fraction of tweets that are reply messages
- The fraction of the user's tweets containing URLs
- Total: 39 content attributes
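As a rough illustration of how such per-user content attributes could be computed from raw tweet text, here is a minimal sketch. The `SPAM_WORDS` set, the whitespace tokenization, and the rule that a reply starts with "@" are all assumptions made for the example; the re-tweet count is omitted because it needs metadata beyond the text.

```python
from statistics import mean, median

SPAM_WORDS = {"free", "win", "click"}  # hypothetical spam-word list


def tweet_metrics(text):
    """Per-tweet counts computed from whitespace-separated tokens."""
    words = text.split()
    n_words = len(words)
    n_hashtags = sum(w.startswith("#") for w in words)
    n_urls = sum(w.startswith(("http://", "https://")) for w in words)
    return {
        "words": n_words,
        "chars": len(text),
        "hashtags": n_hashtags,
        "urls": n_urls,
        "hashtags_per_word": n_hashtags / n_words if n_words else 0.0,
        "urls_per_word": n_urls / n_words if n_words else 0.0,
        "numeric_chars": sum(c.isdigit() for c in text),
        "mentions": sum(w.startswith("@") for w in words),
    }


def summarize(values):
    """Maximum, minimum, average, and median of one metric."""
    return {"max": max(values), "min": min(values),
            "avg": mean(values), "median": median(values)}


def content_attributes(tweets):
    """Aggregate per-tweet metrics into one attribute dict per user."""
    per_tweet = [tweet_metrics(t) for t in tweets]
    attrs = {}
    for name in per_tweet[0]:
        for stat, value in summarize([m[name] for m in per_tweet]).items():
            attrs[f"{name}_{stat}"] = value
    has_spam = [any(w.lower().strip("#.,!?") in SPAM_WORDS for w in t.split())
                for t in tweets]
    attrs["frac_spam_words"] = sum(has_spam) / len(tweets)
    attrs["frac_with_url"] = sum(m["urls"] > 0 for m in per_tweet) / len(tweets)
    attrs["frac_replies"] = sum(t.startswith("@") for t in tweets) / len(tweets)
    return attrs


tweets = [
    "Win a free phone http://spam.example #musicmonday",
    "@bob see you at 5",
    "listening to #musicmonday playlist",
]
attrs = content_attributes(tweets)
```

With eight text-based metrics and three fractions, this sketch yields 35 attributes per user rather than the full set listed on the slide.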

Identifying User Attributes
- Total: 1,065 users.
- For 39% of the spammers, every tweet contains spam words, whereas non-spammers typically have spam words in no more than 4% of their tweets.

Identifying User Attributes: User Behavior Attributes
- The maximum, minimum, average, and median of the following metrics:
  - the time between tweets
  - number of tweets posted per day
  - number of tweets posted per week
- Number of followers
- Number of followees
- Fraction of followers per followees
- Number of tweets
- Age of the user account
- Number of times the user was mentioned
- Number of times the user was replied to
- Number of times the user replied to someone
- Number of followees of the user's followers
- Number of tweets received from followees
- Existence of spam words in the user's screen name
- Total: 23 user behavior attributes
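A few of these behavior attributes can be sketched from a list of tweet timestamps and the follower counts. This is a minimal example under assumed inputs (timestamps in seconds, and the hypothetical function names `inter_tweet_gaps` and `behavior_attributes`); it covers only a subset of the listed attributes.

```python
from statistics import mean, median


def inter_tweet_gaps(timestamps):
    """Time differences between consecutive tweets, in input units."""
    ts = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ts, ts[1:])]


def behavior_attributes(timestamps, followers, followees):
    """A subset of the user behavior attributes for one user."""
    attrs = {
        "followers": followers,
        "followees": followees,
        # Guard against users who follow nobody.
        "followers_per_followee": followers / followees if followees else 0.0,
        "num_tweets": len(timestamps),
    }
    gaps = inter_tweet_gaps(timestamps)
    if gaps:  # at least two tweets are needed for a gap
        attrs.update(gap_max=max(gaps), gap_min=min(gaps),
                     gap_avg=mean(gaps), gap_median=median(gaps))
    return attrs


# Hypothetical user: four tweets, 50 followers, 200 followees.
attrs = behavior_attributes([0, 60, 180, 600], followers=50, followees=200)
```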

Identifying User Attributes
(a) Spammers have a high ratio of followers per followees.
(b) Spammers usually have new accounts, probably because they are constantly being blocked by other users and reported to Twitter.
(c) Non-spammers receive a much larger number of tweets from their followees than spammers do.

Detecting Spammers
- SVM-light
- 5-fold cross-validation: in each test, the original sample is partitioned into 5 sub-samples, of which four are used as training data and the remaining one is used for testing.
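The partitioning described above can be sketched in a few lines. This is a generic illustration of k-fold splitting, not the authors' experimental code; the function name `k_fold_splits`, the shuffling seed, and the use of index lists are assumptions, and the split is not stratified by class.

```python
import random


def k_fold_splits(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 1,065 labeled users (355 spammers + 710 non-spammers).
splits = list(k_fold_splits(1065, k=5))
```

Each of the five rounds trains on four folds and tests on the remaining one, so every user is used for testing exactly once.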

Detecting Spammers
- χ² (slide content not preserved in the transcript)
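The χ² (X2) on this slide presumably refers to the chi-squared statistic, which is commonly used to measure how strongly an attribute is associated with a class. A minimal sketch, assuming a binary attribute and a 2x2 contingency table (the function name and the example counts are illustrative; continuous attributes would first have to be discretized):

```python
def chi_squared_2x2(a, b, c, d):
    """Chi-squared statistic for a 2x2 contingency table.

    a: spammers with the attribute      b: spammers without it
    c: non-spammers with the attribute  d: non-spammers without it
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a row or column is empty; no association measurable
    return n * (a * d - b * c) ** 2 / denom


# Made-up counts: an attribute that splits the classes perfectly,
# and one that is independent of the class.
perfect = chi_squared_2x2(20, 0, 0, 20)
independent = chi_squared_2x2(10, 10, 10, 10)
```

A larger value indicates a stronger association between the attribute and the spammer label.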

Detecting Spam
Consider the following attributes for each tweet:
- number of words from a list of spam words
- number of hashtags per words
- number of URLs per words
- number of words
- number of numeric characters in the text
- number of characters that are numbers
- number of URLs
- number of hashtags
- number of mentions
- number of times the tweet has been replied to

Related Work
- Spam has been observed in various applications, including e-mail, web search engines, blogs, videos, and opinions.
- RE: each user specifies a list of users from whom they are willing to receive content.

Conclusions
- Crawled the Twitter site to obtain more than 54 million user profiles.
- Investigated different tradeoffs for our classification approach and the impact of different attribute sets.