
Presentation transcript:

Collective Spammer Detection in Evolving Multi-Relational Social Networks
Shobeir Fakhraei 1,2, James Foulds 2, Madhusudana Shashanka 3, and Lise Getoor 2
1 University of Maryland, College Park, MD, USA
2 University of California, Santa Cruz, CA, USA
3 if(we) Inc. (Currently Niara Inc., CA, USA)

Problem Statement
We have a time-stamped multi-relational social network with legitimate users and spammers. Links are actions at time t (e.g., profile view, message, or poke).
Task: Given a snapshot of the social network and the labels of already identified spammers, find other spammers in the network.

Motivation
Spam is pervasive in social networks. Traditional approaches don't work well:
Spammers can manipulate content-based approaches, e.g., change patterns or split malicious content across messages.
Content may not be available due to privacy reasons.
Spammers have more ways to interact with users in social networks compared to email and the web.

Data
A data sample from Tagged.com, including all active users and their activities in a specific timeframe. Tagged is a social network for meeting new people with multiple methods for users to interact. It was founded in 2004 and has over 300 million registered members.

Contribution and Proposed Solution
Use only the multi-relational meta-data for spammer detection:
Graph Structure.
Action Sequences.
Collectively refine user generated abuse reports.

Graph Structure Features
In each relation graph we compute:
PageRank: Score for each node based on number and quality of links to it.
Degree: Total degree, in-degree, and out-degree of each node.
k-Core: Centrality measure via recursive pruning of the least connected vertices.
Graph Coloring: Assignment of colors to vertices, where no two adjacent vertices share the same color.
Connected Components: Group of vertices with a path between each.
Triangle Count: Number of triangles the vertex participates in.

Sequence-Based Features
Sequential k-gram Features: Short sequence segment of k consecutive actions, to capture the order of events.
Mixture of Markov Models: Also called chain-augmented or tree-augmented naive Bayes model, to capture longer sequences.

HL-MRFs and Probabilistic Soft Logic
Hinge-loss Markov random fields (HL-MRFs) are a general class of conditional, continuous probabilistic models. Probabilistic soft logic (PSL) uses a first-order logical syntax as a templating language for HL-MRFs.
General rules: [equation omitted in transcript]
Rule satisfaction: [equation omitted in transcript]
Predicates have soft truth values in [0,1].
Distance from satisfaction: [equation omitted in transcript]
Most probable explanation (MPE) by optimizing: [equation omitted in transcript]

Collective Classification with Reports
Users can report abusive behavior, but the reports contain a lot of noise.
Model using only reports: [rules omitted in transcript]
Model using reports and credibility of the reporter: [rules omitted in transcript]
Model using reports, credibility of the reporter, and collective reasoning: [rules omitted in transcript]

Results of Classification Using Reports
[results omitted in transcript]

Graph Structure and Sequence-Based Results
Complete framework includes graph structure and sequence features, and three demographic features (i.e., age, gender, and time since registration). We used Graphlab Create for feature extraction and classification with Gradient-Boosted Decision Trees.
[Plots: Precision-Recall, ROC]
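
Several of the graph structure features named in the transcript (degree, PageRank, k-core, triangle count) can be illustrated on a toy edge list. The following is a minimal pure-Python sketch, not the authors' Graphlab Create pipeline: the function names, the damping factor of 0.85, the fixed iteration count, and the choice to treat edges as undirected for k-core and triangle counting are all assumptions made for illustration.

```python
from collections import defaultdict


def degree_features(edges):
    """Return {node: (total_degree, in_degree, out_degree)} for directed edges."""
    indeg, outdeg = defaultdict(int), defaultdict(int)
    nodes = set()
    for u, v in edges:
        outdeg[u] += 1
        indeg[v] += 1
        nodes.update((u, v))
    return {x: (indeg[x] + outdeg[x], indeg[x], outdeg[x]) for x in nodes}


def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank; dangling mass is spread uniformly (assumed)."""
    nodes = sorted({x for edge in edges for x in edge})
    out = defaultdict(list)
    for u, v in edges:
        out[u].append(v)
    total = len(nodes)
    rank = {x: 1.0 / total for x in nodes}
    for _ in range(iterations):
        dangling = sum(rank[x] for x in nodes if not out[x])
        base = (1.0 - damping) / total + damping * dangling / total
        new_rank = {x: base for x in nodes}
        for u in nodes:
            for v in out[u]:
                new_rank[v] += damping * rank[u] / len(out[u])
        rank = new_rank
    return rank


def undirected_adjacency(edges):
    """Neighbor sets ignoring edge direction and self-loops."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def core_numbers(edges):
    """k-core number per node by repeatedly pruning the least connected vertex."""
    adj = undirected_adjacency(edges)
    degree = {x: len(neighbors) for x, neighbors in adj.items()}
    remaining, core, k = set(adj), {}, 0
    while remaining:
        x = min(remaining, key=degree.get)
        k = max(k, degree[x])
        core[x] = k
        remaining.remove(x)
        for y in adj[x]:
            if y in remaining:
                degree[y] -= 1
    return core


def triangle_counts(edges):
    """Number of triangles each vertex participates in (undirected view)."""
    adj = undirected_adjacency(edges)
    # Each triangle at x is found twice, once through each of its other corners.
    return {x: sum(len(adj[x] & adj[y]) for y in adj[x]) // 2 for x in adj}
```

In a multi-relational setting, one would presumably run these functions once per relation (e.g., once on the message graph and once on the profile-view graph) and concatenate the per-node outputs into a feature vector.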
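
The sequential k-gram features described in the transcript amount to counting every window of k consecutive actions in a user's time-ordered action log. A minimal sketch follows; the function name `kgram_counts` and the action labels are hypothetical, chosen only to mirror the examples of actions given in the problem statement.

```python
from collections import Counter


def kgram_counts(actions, k=2):
    """Count each run of k consecutive actions in a time-ordered action list."""
    windows = (tuple(actions[i:i + k]) for i in range(len(actions) - k + 1))
    return Counter(windows)


# Hypothetical action log for one user, ordered by timestamp.
user_actions = ["view", "message", "view", "message", "poke"]
bigrams = kgram_counts(user_actions, k=2)
```

Each distinct k-gram can then serve as one column of a sparse feature vector for the user; sequences shorter than k simply yield no k-grams.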
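
To give a feel for how a Markov model over action sequences can yield a classification feature, the sketch below fits one first-order Markov chain per class and scores a new sequence by its log-likelihood ratio. This is a deliberately simplified stand-in, not the mixture model used on the poster: the single chain per class, the add-one smoothing, the omission of the initial-state distribution, and all names are assumptions.

```python
import math
from collections import defaultdict


def fit_chain(sequences, alphabet, smoothing=1.0):
    """Estimate smoothed transition probabilities P(current | previous)."""
    counts = defaultdict(lambda: defaultdict(float))
    for seq in sequences:
        for prev, cur in zip(seq, seq[1:]):
            counts[prev][cur] += 1.0
    probs = {}
    for prev in alphabet:
        total = sum(counts[prev].values()) + smoothing * len(alphabet)
        probs[prev] = {cur: (counts[prev][cur] + smoothing) / total
                       for cur in alphabet}
    return probs


def log_likelihood(seq, probs):
    """Log-probability of the transitions in seq (initial state ignored)."""
    return sum(math.log(probs[prev][cur]) for prev, cur in zip(seq, seq[1:]))


def log_odds(seq, spam_chain, legit_chain):
    """Positive values mean the sequence looks more like the spammer chain."""
    return log_likelihood(seq, spam_chain) - log_likelihood(seq, legit_chain)
```

The resulting log-odds value could be appended to the k-gram and graph features as one more input to a downstream classifier.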
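
The transcript does not preserve the HL-MRF formulas, so the sketch below falls back on the standard PSL formulation rather than the poster's exact notation: soft truth values in [0,1] are combined with the Lukasiewicz conjunction, a ground rule "body implies head" is satisfied when the body's truth value does not exceed the head's, and the distance from satisfaction is the hinge max(0, body - head). The rule shape in the usage example (a credible reporter reporting a user implies that user is a spammer) and the weights are illustrative assumptions.

```python
def lukasiewicz_and(*values):
    """Lukasiewicz conjunction of soft truth values in [0, 1]."""
    return max(0.0, sum(values) - (len(values) - 1))


def distance_to_satisfaction(body, head):
    """Hinge distance max(0, truth(body) - truth(head)) for one ground rule."""
    return max(0.0, lukasiewicz_and(*body) - head)


def weighted_penalty(ground_rules, squared=False):
    """Sum of weighted (optionally squared) distances over ground rules.

    ground_rules is an iterable of (weight, body_values, head_value).
    MPE inference would minimize this quantity over the unobserved truth
    values; that convex optimization step is not shown here.
    """
    total = 0.0
    for weight, body, head in ground_rules:
        d = distance_to_satisfaction(body, head)
        total += weight * (d * d if squared else d)
    return total


# Hypothetical grounding: CREDIBLE(a)=0.9 and REPORTED(a, b)=0.8 imply SPAMMER(b)=0.3.
example_distance = distance_to_satisfaction([0.9, 0.8], 0.3)
```

Raising the soft truth value of the head toward that of the body drives the distance to zero, which is how collective reasoning over many such ground rules pushes spammer scores up or down.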