DETECTING SPAMMERS AND CONTENT PROMOTERS IN ONLINE VIDEO SOCIAL NETWORKS Fabrício Benevenuto ∗, Tiago Rodrigues, Virgílio Almeida, Jussara Almeida, and.


1 DETECTING SPAMMERS AND CONTENT PROMOTERS IN ONLINE VIDEO SOCIAL NETWORKS Fabrício Benevenuto, Tiago Rodrigues, Virgílio Almeida, Jussara Almeida, and Marcos Gonçalves. Computer Science Department, Federal University of Minas Gerais, Belo Horizonte, Brazil (SIGIR’09). Speaker: Yi-Ling Tai. Date: 2009/09/28

2 OUTLINE Introduction; User test collection; Analyzing user behavior attributes; Detecting spammers and promoters; Evaluation metrics; Experimental setup; Classification; Reducing attribute set; Conclusions

3 INTRODUCTION YouTube is the most popular online video social network. It allows users to post a video as a response to a discussion topic. Such features open opportunities for users to introduce polluted content into the system. Pollution can be used to spread advertising to generate sales, disseminate pornography, or compromise the system's reputation.

4 INTRODUCTION Users cannot easily identify pollution before watching it, and polluted content also consumes system resources, especially bandwidth. This paper addresses the problem of detecting video spammers and promoters. Spammers post an unrelated video as a response to a popular video topic to increase the likelihood of the response being viewed. Promoters post a large number of responses to boost the rank of the video topic.

5 INTRODUCTION Toward this end, the authors: crawl a large user data set from YouTube; manually classify users as legitimate users, spammers, or promoters; study attributes that distinguish the different types of polluters; use a supervised classification algorithm to detect spammers and promoters.

6 USER TEST COLLECTION A YouTube video is a responded video or a video topic if it has at least one video response. A YouTube user is a responsive user if she has posted at least one video response. A responded user is someone who posted at least one responded video. Polluter is used to refer to either a spammer or a promoter.

7 CRAWLING YOUTUBE User interactions can be represented by a video response user graph G = (X, Y). X is the union of all users who posted or received video responses. (x1, x2) is a directed arc in Y if user x1 has responded to a video contributed by user x2. To build the graph, the authors built a crawler that implements Algorithm 1.
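
As a rough illustration of the graph definition above, the following Python sketch builds the node set X and arc set Y from (responder, owner) pairs and derives in- and out-degrees. The sample pairs and the names `build_response_graph` and `degrees` are hypothetical; this is not the paper's crawler (Algorithm 1), only one way the resulting graph could be represented.

```python
from collections import defaultdict

def build_response_graph(responses):
    """Build the video response user graph G = (X, Y).

    `responses` is an iterable of (responder, owner) pairs, meaning the
    responder posted a video response to a video contributed by the owner.
    """
    nodes = set()
    arcs = set()
    for responder, owner in responses:
        nodes.update((responder, owner))
        arcs.add((responder, owner))  # repeated responses collapse into one arc
    return nodes, arcs

def degrees(arcs):
    """Return (in_degree, out_degree) dictionaries for the directed arcs."""
    in_deg = defaultdict(int)
    out_deg = defaultdict(int)
    for x1, x2 in arcs:
        out_deg[x1] += 1
        in_deg[x2] += 1
    return in_deg, out_deg

# Hypothetical sample data, for illustration only.
sample = [("ana", "bob"), ("ana", "bob"), ("carl", "bob"), ("bob", "ana")]
X, Y = build_response_graph(sample)
in_deg, out_deg = degrees(Y)
```

Storing Y as a set of pairs keeps the arc definition literal; a weighted variant could count repeated responses instead of collapsing them.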

8 CRAWLING YOUTUBE The sampling starts from a set of 88 seeds, consisting of the owners of the top-100 most responded videos of all time. The crawler follows links, gathering information on a number of different attributes. Data gathered: 264,460 users; 381,616 responded videos; 701,950 video responses.

9 CRAWLING YOUTUBE

10 BUILDING A TEST COLLECTION The main goal is to study the patterns and characteristics of each class of users. The collection should have the following properties: a significant number of users of all three categories; inclusion of, but not restriction to, large amounts of pollution; a large number of legitimate users with different behavior. Random sampling may not achieve these properties.

11 BUILDING A TEST COLLECTION Three strategies for user selection: 1. Different levels of interaction: four groups of users were formed based on their in- and out-degrees, and 100 users were randomly selected from each group. 2. Aiming to populate the test collection with polluters: the responses to the top 100 most responded videos were browsed to select suspect users. 3. Random selection of 300 users who posted video responses to the top 100 most responded videos, to minimize a possible bias introduced by strategy 2.

12 BUILDING A TEST COLLECTION Each selected user was then manually classified. Three volunteers analyzed all video responses of each user to classify her into one of the three categories. Volunteers were instructed to favor legitimate users.

13 ANALYZING USER BEHAVIOR ATTRIBUTES We considered three attribute sets. Video attributes: duration, number of views, number of comments received, rating, number of times selected as favorite, number of honors, and number of external links. These are computed over three video groups for each user: all videos uploaded by the user, the user's video responses, and the responded videos to which the user posted responses. This adds up to 42 video attributes for each user.

14 ANALYZING USER BEHAVIOR ATTRIBUTES User attributes: number of friends, number of videos uploaded, number of videos watched, number of videos added as favorite, numbers of video responses posted and received, numbers of subscriptions and subscribers, average time between video uploads, and maximum number of videos uploaded in 24 hours.

15 ANALYZING USER BEHAVIOR ATTRIBUTES Social network attributes: clustering coefficient cc(i), the ratio of the number of existing edges between i’s neighbors to the maximum possible number; betweenness; reciprocity; assortativity, the ratio between the node (in/out) degree and the average (in/out) degree of its neighbors; UserRank.
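
The two attributes that the slide defines explicitly can be sketched in a few lines of Python. This is a minimal sketch under a simplifying assumption: the graph is treated as undirected and stored as a dictionary of neighbor sets, whereas the slide's assortativity distinguishes in- and out-degree. The function names and the toy graph are hypothetical.

```python
from itertools import combinations

def clustering_coefficient(adj, i):
    """cc(i): existing edges among i's neighbors / maximum possible number."""
    neighbors = adj[i]
    k = len(neighbors)
    if k < 2:
        return 0.0  # no neighbor pairs; treated as 0 here by convention
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return links / (k * (k - 1) / 2)

def degree_ratio(adj, i):
    """Node degree divided by the average degree of its neighbors."""
    neighbors = adj[i]
    if not neighbors:
        return 0.0
    avg_neighbor_degree = sum(len(adj[n]) for n in neighbors) / len(neighbors)
    return len(neighbors) / avg_neighbor_degree

# Toy undirected graph (symmetric adjacency sets), for illustration only.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
cc_a = clustering_coefficient(adj, "a")   # one of three neighbor pairs is linked
ratio_a = degree_ratio(adj, "a")          # 3 / ((2 + 2 + 1) / 3)
```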

16 ANALYZING USER BEHAVIOR ATTRIBUTES Two well-known feature selection methods: information gain and χ² (Chi Squared).

17 ANALYZING USER BEHAVIOR ATTRIBUTES

18 EVALUATION METRICS The standard information retrieval metrics are used: recall, precision, Micro-F1, and Macro-F1. Micro-F1 is obtained by first computing global precision and recall values for all classes and then calculating F1. Macro-F1 is obtained by first calculating F1 values for each class in isolation and then averaging over all classes.
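
To make the difference between the two averages concrete, here is a small Python sketch that computes both from a confusion matrix of counts. The matrix values are made up for illustration and are not the paper's results; the layout `confusion[t][p]` (true class t, predicted class p) is an assumption of this sketch.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def micro_macro_f1(confusion):
    """confusion[t][p] = number of users of true class t predicted as class p."""
    n = len(confusion)
    tp = [confusion[c][c] for c in range(n)]
    fn = [sum(confusion[c]) - tp[c] for c in range(n)]
    fp = [sum(confusion[t][c] for t in range(n)) - tp[c] for c in range(n)]

    # Macro-F1: F1 per class in isolation, then a plain average over classes.
    per_class = []
    for c in range(n):
        p = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        r = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        per_class.append(f1(p, r))
    macro = sum(per_class) / n

    # Micro-F1: global precision and recall over all classes, then F1.
    global_p = sum(tp) / (sum(tp) + sum(fp))
    global_r = sum(tp) / (sum(tp) + sum(fn))
    micro = f1(global_p, global_r)
    return micro, macro, per_class

# Made-up counts for three classes; the middle class is small and hard.
toy = [[8, 2, 0],
       [1, 3, 6],
       [0, 1, 79]]
micro, macro, per_class = micro_macro_f1(toy)
```

Because Macro-F1 weights every class equally, a small, poorly recognized class pulls it well below Micro-F1, which is dominated by the large classes.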

19 EVALUATION METRICS confusion matrix

20 EXPERIMENTAL SETUP libSVM, an open source SVM package, is used. It allows searching for the best classifier parameters using the training data and provides a series of optimizations, including normalization of all numerical attributes. Experiments use 5-fold cross-validation, repeated 5 times with different seeds used to shuffle the original data set, producing 25 different results for each test.
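
The 5 x 5 protocol can be sketched with the standard library alone. This is an assumption-laden sketch: the generator name `repeated_kfold`, the seed scheme (`base_seed + repetition`), and the unstratified fold assignment are illustrative choices, not details taken from the paper or from libSVM.

```python
import random

def repeated_kfold(n_samples, k=5, repeats=5, base_seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV."""
    for rep in range(repeats):
        idx = list(range(n_samples))
        # A different seed per repetition gives a different shuffle.
        random.Random(base_seed + rep).shuffle(idx)
        folds = [idx[f::k] for f in range(k)]  # k near-equal folds
        for f in range(k):
            test = folds[f]
            train = [i for g in range(k) if g != f for i in folds[g]]
            yield rep, f, train, test

# 5 repetitions x 5 folds = 25 train/test splits, one result per split.
splits = list(repeated_kfold(n_samples=20))
```

Each split would be used to train on `train` and evaluate on `test`; averaging the 25 scores gives the reported figure for a test.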

21 TWO CLASSIFICATION STRATEGIES Flat classification: users are classified directly as promoters (P), spammers (S), or legitimate users (L). Hierarchical strategy: first separate promoters (P) from non-promoters (NP); then split promoters into heavy (HP) and light promoters (LP), and non-promoters into legitimate users (L) and spammers (S).
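
The hierarchical strategy amounts to chaining three binary decisions. The Python sketch below shows only that routing logic; the three predicate arguments stand in for trained binary classifiers, and the placeholder rules, feature names, and thresholds are entirely hypothetical.

```python
def classify_hierarchical(user, is_promoter, is_heavy_promoter, is_spammer):
    """Route a user through the two-level hierarchy and return a class label."""
    if is_promoter(user):  # level 1: promoters (P) vs. non-promoters (NP)
        return "HP" if is_heavy_promoter(user) else "LP"  # level 2: HP vs. LP
    return "S" if is_spammer(user) else "L"  # level 2: S vs. L

# Placeholder rules standing in for trained classifiers (made-up thresholds).
rules = dict(
    is_promoter=lambda u: u["promoter_score"] > 0.5,
    is_heavy_promoter=lambda u: u["aggressiveness"] > 50,
    is_spammer=lambda u: u["spam_score"] > 0.5,
)

label = classify_hierarchical(
    {"promoter_score": 0.9, "aggressiveness": 120, "spam_score": 0.0}, **rules
)
```

Keeping each level as a separate classifier is what lets each decision be tuned independently.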

22 FLAT CLASSIFICATION Confusion matrix obtained: the numbers presented are percentages relative to the total number of users in each class, so the diagonal indicates the recall in each class. No promoter was classified as a legitimate user. 3.87% of promoters were classified as spammers: the videos they promoted actually acquired popularity, which makes them harder to distinguish from spammers.

23 FLAT CLASSIFICATION 41.91% of spammers were classified as legitimate users: legitimate users also post their video responses to popular responded videos (a typical behavior of spammers). Micro-F1 = 87.5, with per-class F1 values of 90.8, 63.7, and 92.3. Macro-F1 = 82.2.

24 HIERARCHICAL CLASSIFICATION Each level is a binary classification. With the J parameter, one can give priority to one class (e.g., spammers) over the other (e.g., legitimate users). Promoters vs. non-promoters: Macro-F1 = 93.44, Micro-F1 = 99.17.

25 NON-PROMOTERS We trained the classifier with the original training data without promoters, with J = 1. Varying J (spammers correctly classified vs. legitimate users misclassified): J = 0.1 gives 24% vs. 1%; J = 3.0 gives 71% vs. 9%. The best solution depends on the system administrator’s objectives.
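
The effect of a cost factor such as J can be illustrated without an SVM. The sketch below swaps in a much simpler technique, cost-weighted threshold selection on a single score: errors on spammers cost `j`, errors on legitimate users cost 1, and the threshold with the lowest total cost is chosen. The scores, labels, and function names are hypothetical, and this is not how libSVM applies the parameter internally.

```python
def weighted_cost(scores, labels, threshold, j):
    """Errors on spammers (label 1) cost j; errors on legitimate users cost 1."""
    cost = 0.0
    for s, y in zip(scores, labels):
        predicted = 1 if s >= threshold else 0
        if y == 1 and predicted == 0:
            cost += j    # missed spammer
        elif y == 0 and predicted == 1:
            cost += 1.0  # legitimate user flagged as spammer
    return cost

def best_threshold(scores, labels, j):
    """Pick the candidate threshold with the lowest weighted cost."""
    candidates = sorted(set(scores)) + [max(scores) + 1.0]
    return min(candidates, key=lambda t: weighted_cost(scores, labels, t, j))

# Made-up spam scores; label 1 = spammer, 0 = legitimate user.
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
labels = [0, 0, 1, 0, 1, 0, 1, 1]

cautious = best_threshold(scores, labels, j=0.1)    # protects legitimate users
aggressive = best_threshold(scores, labels, j=3.0)  # catches more spammers
```

With a small `j` the chosen threshold is high, so few legitimate users are flagged but many spammers slip through; a large `j` lowers the threshold and reverses the trade-off.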

26 HEAVY AND LIGHT PROMOTERS Aggressiveness: the maximum number of video responses posted in a 24-hour period. The k-means clustering algorithm was used to separate promoters into two clusters. Average aggressiveness: light promoters = 15.78 (CV = 0.63); heavy promoters = 107.54 (CV = 0.61).
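
A one-dimensional, two-cluster k-means on aggressiveness can be sketched as follows. The aggressiveness values are invented for illustration, the centroids are initialized at the minimum and maximum (an assumption of this sketch), and CV is taken to be the coefficient of variation computed with the population standard deviation.

```python
from statistics import mean, pstdev

def kmeans_1d_two_clusters(values, max_iter=100):
    """Split values into a low ("light") and a high ("heavy") cluster."""
    lo, hi = float(min(values)), float(max(values))  # initial centroids
    light, heavy = list(values), []
    for _ in range(max_iter):
        # Assignment step: each value goes to the nearest centroid.
        light = [v for v in values if abs(v - lo) <= abs(v - hi)]
        heavy = [v for v in values if abs(v - lo) > abs(v - hi)]
        # Update step: centroids move to the cluster means.
        new_lo = mean(light)
        new_hi = mean(heavy) if heavy else hi
        if (new_lo, new_hi) == (lo, hi):
            break  # converged
        lo, hi = new_lo, new_hi
    return light, heavy

def coefficient_of_variation(values):
    """Standard deviation divided by the mean."""
    return pstdev(values) / mean(values)

# Hypothetical aggressiveness values (max responses posted in 24 hours).
aggressiveness = [10, 12, 20, 18, 95, 110, 120]
light, heavy = kmeans_1d_two_clusters(aggressiveness)
cv_light = coefficient_of_variation(light)
```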

27 HEAVY AND LIGHT PROMOTERS Binary classification: the classifier was retrained with the original training data containing only promoters.

28 REDUCING THE ATTRIBUTE SET Two scenarios: (1) considering attributes in decreasing order of position in the χ² ranking; (2) evaluating classification with subsets of 10 attributes occupying contiguous positions in the ranking.

29 CONCLUSIONS The paper presents an effective solution to detect spammers and promoters in online video social networks. The flat classification approach provides an alternative to simply considering all users as legitimate. The hierarchical approach explores different classification tradeoffs and provides more flexibility for the application. Finally, significant benefits can be obtained with only a small subset of less expensive attributes. Spammers and promoters will evolve and adapt to anti-pollution strategies, so periodic assessment of the classification process may be necessary.

