GANG: Detecting Fraudulent Users in OSNs

GANG: Detecting Fraudulent Users in OSNs via Guilt-by-Association on Directed Graphs
Binghui Wang, Neil Zhenqiang Gong (Iowa State University, United States)
Hao Fu (Microsoft Research Asia, China)

Outline: Background, Algorithm, Evaluation, Conclusion

Online Social Networks (OSNs) Are Popular
Leading OSNs report 2.06 billion, 340 million, and 328 million monthly active users, respectively.

OSNs Have Many Fraudulent Users

Threats of Fraudulent Users
Fraudulent users can be used to perform various malicious activities: distributing spam and phishing attacks, harvesting private user data, influencing financial markets, disrupting democratic elections, etc. Fraudulent-user detection is therefore an urgent research problem.

Existing Fraudulent-User-Detection Methods
Various methods have been proposed by multiple research communities, including networking, security, and data mining.
Local feature-based methods: feature extraction plus a machine-learning classifier.
Features: side information (e.g., IP addresses, content, behavior), local structure (e.g., clustering coefficient, common neighbors), etc.
Classifiers: support vector machines, logistic regression, etc.
Fundamental limitation: not adversarially robust.
Global structure-based methods: guilt-by-association.
They leverage the graph structure to propagate label information: a user is likely to be fraudulent (normal) if it is linked with other fraudulent (normal) users.
They are more adversarially robust.
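Guilt-by-association can be illustrated with a minimal sketch of harmonic-function-style label propagation on an undirected graph. It is in the spirit of SSL-type methods, not a reproduction of any specific system named here: labeled nodes are clamped to +1 (fraudulent) or -1 (normal), and each unlabeled node repeatedly takes the average score of its neighbors. The function name and iteration count are hypothetical.

```python
from typing import Dict, List


def propagate_labels(adj: Dict[int, List[int]], seeds: Dict[int, float],
                     iters: int = 50) -> Dict[int, float]:
    """Guilt-by-association by neighbor averaging on an undirected graph.

    adj: node -> list of neighbors (each undirected edge listed both ways).
    seeds: labeled nodes, +1.0 for fraudulent and -1.0 for normal.
    Returns a score per node; > 0 leans fraudulent, < 0 leans normal.
    """
    score = {u: seeds.get(u, 0.0) for u in adj}
    for _ in range(iters):
        new = {}
        for u, nbrs in adj.items():
            if u in seeds:
                new[u] = seeds[u]  # labeled nodes stay clamped to their label
            elif nbrs:
                new[u] = sum(score[v] for v in nbrs) / len(nbrs)
            else:
                new[u] = 0.0  # isolated node: no evidence either way
        score = new
    return score
```

On the path 0-1-2-3-4 with node 0 fraudulent and node 4 normal, the scores of nodes 1, 2, and 3 settle at 0.5, 0, and -0.5: the closer a user is to a fraudulent seed, the more suspicious it looks.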

Cons of Global Structure-Based Methods
They assume symmetric (i.e., undirected) social links.
Random walk: SSL (ICML’03, NIPS’04), SybilRank (NSDI’12), SybilWalk (DSN’17), etc.
Belief propagation: SybilBelief (TIFS’14), SybilSCAR (INFOCOM’17), FraudEagle (ICWSM’13), SpEagle (KDD’15), etc.
However, real-world OSNs are asymmetric (i.e., directed).
They leverage either labeled fraudulent users or labeled normal users, but not both.
Labeled normal: TrustRank (VLDB’04), CatchSync (KDD’14), etc.
Labeled fraudulent: DistrustRank (MTW’06), CIA (WWW’12), etc.
However, both types of labels are available in practice.

Our Contribution: GANG
GANG is a novel global structure-based guilt-by-association method on directed graphs. It captures the unique characteristics of the fraudulent-user-detection problem in directed OSNs, leverages both labeled fraudulent users and labeled normal users, and is convergent and scalable.

Problem Definition
Input: a directed social graph and a training set consisting of labeled fraudulent nodes and labeled normal nodes.
Output: the label of each remaining node.

Notation
Associate a binary random variable $x_u$ with each node $u$.
$x_u = 1$ ($x_u = -1$): $u$ is fraudulent (normal).
$\Pr(x_u = 1)$: probability that $u$ is fraudulent.
$\Gamma_b(u)$, $\Gamma_i(u)$, $\Gamma_o(u)$: bidirectional, unidirectional incoming, and unidirectional outgoing neighbors of $u$.
$\Gamma(u) = \Gamma_b(u) \cup \Gamma_i(u) \cup \Gamma_o(u)$: all neighbors of $u$.
$x_u$ and $x_{\Gamma(u)}$: observed labels of $u$ and of $u$'s neighbors.
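The three neighbor sets can be computed from a directed edge list as in the sketch below. It assumes that an edge (v, u) means v links to (e.g., follows) u and ignores self-loops; the function and variable names are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple


def neighbor_sets(edges: Iterable[Tuple[int, int]]):
    """Split each node's neighbors by edge direction.

    edges: directed edges (v, u), read as "v links to u".
    Returns three dicts mapping node -> set of neighbors:
      gamma_b: bidirectional neighbors (both v->u and u->v exist)
      gamma_i: unidirectional incoming neighbors (only v->u exists)
      gamma_o: unidirectional outgoing neighbors (only u->v exists)
    """
    out: Dict[int, Set[int]] = {}
    inc: Dict[int, Set[int]] = {}
    for v, u in edges:
        if v == u:
            continue  # ignore self-loops
        out.setdefault(v, set()).add(u)
        inc.setdefault(u, set()).add(v)

    gamma_b: Dict[int, Set[int]] = {}
    gamma_i: Dict[int, Set[int]] = {}
    gamma_o: Dict[int, Set[int]] = {}
    for u in set(out) | set(inc):
        o, i = out.get(u, set()), inc.get(u, set())
        gamma_b[u] = o & i  # linked in both directions
        gamma_i[u] = i - o  # they link to u, u does not link back
        gamma_o[u] = o - i  # u links to them, they do not link back
    return gamma_b, gamma_i, gamma_o
```

All neighbors of u are then gamma_b[u] | gamma_i[u] | gamma_o[u], matching the definition of $\Gamma(u)$.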

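Intuitions (1/3)
Intuition I: Bidirectional neighbors. If $v$ and $u$ are bidirectional neighbors, they tend to share a label: a fraudulent (F) $v$ suggests a fraudulent $u$, and a normal (N) $v$ suggests a normal $u$. $J_{vu} > 0$: coupling strength.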

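Intuitions (2/3)
Intuition II: Unidirectional incoming neighbors ($v \to u$). If $v$ is normal (N), $u$ is likely normal; if $v$ is fraudulent (F), $u$'s label is uncertain (?).
Intuition III: Unidirectional outgoing neighbors ($u \to v$). If $v$ is fraudulent (F), $u$ is likely fraudulent; if $v$ is normal (N), $u$'s label is uncertain (?).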

Intuitions (3/3)
Intuition IV: Model prior knowledge about $u$'s label: $h_u > 0$ ($h_u < 0$) if $u$ is labeled fraudulent (normal), and $h_u = 0$ if $u$ is unlabeled.
Finally, unify the neighbor influences and the prior knowledge.

Design of GANG (1/3)
Capture the intuitions via a pairwise Markov random field (pMRF). A pMRF models the joint probability distribution of the binary random variables $x_u$ for all nodes $u \in V$ via an energy function $H$. GANG uses a customized pMRF whose energy function encodes Intuitions I to IV.
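For reference, a generic (Ising-style) pMRF over labels $x_u \in \{-1, 1\}$ takes the form below, written with the prior strength $h_u$ and coupling strength $J_{uv}$ from the intuitions; $E_b$ (the set of bidirectional edges) is notation introduced here. This is a sketch of the standard form only: GANG's customized energy also contains terms for unidirectional edges, which are not reproduced here.

```latex
\begin{aligned}
\Pr(\mathbf{x}_V) &= \frac{1}{Z}\exp\bigl(-H(\mathbf{x}_V)\bigr),
\qquad
Z = \sum_{\mathbf{x}_V \in \{-1,1\}^{|V|}} \exp\bigl(-H(\mathbf{x}_V)\bigr), \\
H(\mathbf{x}_V) &= -\sum_{u \in V} h_u x_u \;-\; \sum_{(u,v) \in E_b} J_{uv}\, x_u x_v .
\end{aligned}
```

With $J_{uv} > 0$, configurations in which $u$ and $v$ share a label have lower energy and hence higher probability (Intuition I), and $h_u > 0$ favors $x_u = 1$, i.e., fraudulent (Intuition IV).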

Design of GANG (2/3)
Perform inference on the pMRF via loopy belief propagation (LBP). First, transform the joint probability distribution of the pMRF into a product of node potentials and edge potentials, with separate edge potentials for bidirectional and unidirectional edges. Set the homophily strength $w_{uv} = w > 0.5$ for all edges.
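The bidirectional-edge potential and its link to the coupling and prior strengths can be sketched as follows. The sketch assumes the common symmetric form ψ(same labels) = w, ψ(different labels) = 1 - w, and a node potential equal to the prior probability of being fraudulent; the unidirectional-edge potential is on the slide and is not reproduced. Because w/(1 - w) = e^{2J}, w > 0.5 corresponds to J > 0, and a prior above 0.5 corresponds to h > 0. All function names are hypothetical.

```python
import math


def node_potential(prior_fraud: float) -> dict:
    """phi_u(x_u): assumed to be the prior Pr(x_u = 1) and its complement."""
    return {1: prior_fraud, -1: 1.0 - prior_fraud}


def bidirectional_edge_potential(w: float) -> dict:
    """psi_uv(x_u, x_v) = w if the labels agree, 1 - w otherwise (w > 0.5)."""
    return {(a, b): (w if a == b else 1.0 - w) for a in (1, -1) for b in (1, -1)}


def coupling_from_w(w: float) -> float:
    """J with exp(J * x_u * x_v) proportional to the edge potential:
    psi(same) / psi(diff) = w / (1 - w) = exp(2J). Requires 0 < w < 1."""
    return 0.5 * math.log(w / (1.0 - w))


def field_from_prior(q: float) -> float:
    """h with exp(h * x_u) proportional to the node potential:
    q / (1 - q) = exp(2h). q > 0.5 gives h > 0 (leans fraudulent)."""
    return 0.5 * math.log(q / (1.0 - q))
```

For example, w = 0.8 gives w/(1 - w) = 4, so J = ln(4)/2 > 0, the homophily required by Intuition I.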

Design of GANG (3/3)
Then, compute the posterior probabilities via LBP's message passing: the sum-product rule updates the messages, and the product of a node's potential and its incoming messages gives its posterior probability.
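The message-passing step can be sketched on a plain binary pMRF. This minimal sketch assumes that every edge uses the symmetric homophily potential (w if labels agree, 1 - w otherwise) and a synchronous update schedule. GANG's separate potential for unidirectional edges is not reproduced, and the function and parameter names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

STATES = (1, -1)  # 1 = fraudulent, -1 = normal


def loopy_bp(nodes: List[int], edges: List[Tuple[int, int]],
             prior: Dict[int, float], w: float,
             iters: int = 30) -> Dict[int, float]:
    """Sum-product loopy BP on a binary pairwise MRF (simple undirected graph).

    prior[u] is the node potential phi_u(x_u = 1); phi_u(x_u = -1) = 1 - prior[u].
    Every edge uses psi(a, b) = w if a == b else 1 - w (homophily strength w).
    Returns the approximate posterior Pr(x_u = 1) for each node.
    """
    nbrs: Dict[int, List[int]] = {u: [] for u in nodes}
    for u, v in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)

    def phi(u: int, a: int) -> float:
        return prior[u] if a == 1 else 1.0 - prior[u]

    def psi(a: int, b: int) -> float:
        return w if a == b else 1.0 - w

    # m[(v, u)][b]: message from v to u about x_u = b, initialized uniform.
    m = {(v, u): {1: 0.5, -1: 0.5} for u in nodes for v in nbrs[u]}
    for _ in range(iters):
        new_m = {}
        for (v, u) in m:
            # Sum-product: sum over the sender's states a, multiplying in the
            # messages v received from every neighbor except the receiver u.
            msg = {b: sum(phi(v, a) * psi(a, b)
                          * math.prod(m[(k, v)][a] for k in nbrs[v] if k != u)
                          for a in STATES)
                   for b in STATES}
            z = msg[1] + msg[-1]
            new_m[(v, u)] = {b: msg[b] / z for b in STATES}  # normalize
        m = new_m  # synchronous (parallel) schedule

    # Product: belief of u = node potential times all incoming messages.
    post = {}
    for u in nodes:
        bel = {a: phi(u, a) * math.prod(m[(v, u)][a] for v in nbrs[u])
               for a in STATES}
        post[u] = bel[1] / (bel[1] + bel[-1])
    return post
```

On the chain 0-1-2 with prior 0.9 for node 0, 0.5 for the others, and w = 0.8, the posteriors are 0.9, 0.74, and 0.644: suspicion decays with distance from the fraudulent seed. On a tree like this, BP is exact; on loopy graphs it is an approximation that may oscillate.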

Detect Fraudulent Users
First, assign a prior to every user (parameterized by $0 < \theta \le 0.5$). Then, obtain the posterior probability of each user $u$ via LBP on the customized pMRF. Finally, predict an unlabeled user $u$ to be fraudulent if $p_u = \Pr(x_u = 1) > 0.5$, and normal otherwise.
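The prior-assignment and decision steps can be sketched as follows. The slides state only that $0 < \theta \le 0.5$. The specific prior values 0.5 + θ (labeled fraudulent), 0.5 - θ (labeled normal), and 0.5 (unlabeled) are an assumption chosen so that every prior stays within [0, 1], and the function names are hypothetical.

```python
from typing import Dict, Iterable, Set


def assign_priors(nodes: Iterable[int], fraud_seeds: Set[int],
                  normal_seeds: Set[int], theta: float) -> Dict[int, float]:
    """Assumed prior Pr(x_u = 1): 0.5 + theta for labeled fraudulent users,
    0.5 - theta for labeled normal users, and 0.5 (no information) otherwise."""
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must satisfy 0 < theta <= 0.5")
    priors = {}
    for u in nodes:
        if u in fraud_seeds:
            priors[u] = 0.5 + theta
        elif u in normal_seeds:
            priors[u] = 0.5 - theta
        else:
            priors[u] = 0.5
    return priors


def predict(posterior: Dict[int, float], labeled: Set[int]) -> Dict[int, int]:
    """Label each unlabeled user: 1 (fraudulent) if p_u > 0.5, else -1 (normal)."""
    return {u: (1 if p > 0.5 else -1)
            for u, p in posterior.items() if u not in labeled}
```

With θ = 0.4, labeled fraudulent users start at 0.9, labeled normal users at 0.1, and unlabeled users at 0.5. A posterior of exactly 0.5 falls on the "normal" side of the rule.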

Shortcomings of GANG
GANG is not scalable enough, because LBP maintains a message $m_{vu}$ on each edge $(v, u)$.
GANG is not guaranteed to converge, because LBP may oscillate on loopy graphs.
To address these shortcomings, we eliminate message maintenance and leverage a linear approximation.

Optimizing GANG (1/2)
Eliminate message maintenance

Optimizing GANG (2/2)
Approximate GANG via residuals and linearization: define residual variables, apply a linear approximation, and finally represent the optimized GANG in linearized form.
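A residual-and-linearization update of this kind can be sketched on an undirected graph. The sketch assumes a linearized update, similar in spirit to linearized BP, in which each node's residual $\hat{p}_u = p_u - 0.5$ equals its prior residual $\hat{q}_u$ plus $2(w - 0.5)$ times the sum of its neighbors' residuals. GANG's directed version, which treats bidirectional and unidirectional neighbors differently, is on the slide and is not reproduced. The function name is hypothetical.

```python
from typing import Dict, List


def linearized_propagation(adj: Dict[int, List[int]], prior: Dict[int, float],
                           w: float, iters: int = 100) -> Dict[int, float]:
    """Residual-form linear propagation on an undirected graph (hypothetical).

    Residuals: q_hat_u = prior_u - 0.5, p_hat_u = p_u - 0.5, w_hat = w - 0.5.
    Update:    p_hat_u <- q_hat_u + 2 * w_hat * sum_{v in Gamma(u)} p_hat_v.
    Only one value per node is stored, so no per-edge messages are kept.
    """
    q_hat = {u: prior[u] - 0.5 for u in adj}
    w_hat = w - 0.5
    p_hat = dict(q_hat)
    for _ in range(iters):
        p_hat = {u: q_hat[u] + 2.0 * w_hat * sum(p_hat[v] for v in adj[u])
                 for u in adj}
    return {u: 0.5 + p_hat[u] for u in adj}
```

Storing one residual per node instead of one message per edge is what removes the message-maintenance cost; the trade-off is that the linear scores approximate, rather than equal, the LBP posteriors.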

Convergence Condition of Optimized GANG
Sufficient convergence condition
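For a linear iteration of the form $\hat{p} \leftarrow \hat{q} + M\hat{p}$, a standard sufficient condition for convergence from any starting point is that the spectral radius of $M$ is below 1. This holds whenever the maximum absolute row sum of $M$ is below 1. The sketch checks that bound for the hypothetical choice $M = 2(w - 0.5)A$, where $A$ is an undirected adjacency matrix. GANG's own sufficient condition is on the slide and is not reproduced, and the function name is hypothetical.

```python
from typing import Dict, List


def sufficient_convergence(adj: Dict[int, List[int]], w: float) -> bool:
    """Check ||M||_inf < 1 for M = 2 * (w - 0.5) * A (A = adjacency matrix).

    ||M||_inf = 2 * |w - 0.5| * (max degree). If it is below 1, the spectral
    radius of M is below 1, so p_hat <- q_hat + M @ p_hat converges from any
    start. The condition is sufficient, not necessary.
    """
    max_deg = max((len(nbrs) for nbrs in adj.values()), default=0)
    return 2.0 * abs(w - 0.5) * max_deg < 1.0
```

The bound shows the trade-off in choosing w: stronger homophily (w far from 0.5) propagates labels farther, but it narrows the range of graphs for which convergence is guaranteed.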

Experimental Setup
Datasets: large-scale Twitter and large-scale Sina Weibo.
Compared methods: using undirected graphs, SSL, SybilRank, SybilBelief, and SybilSCAR; using directed graphs, TrustRank, DistrustRank, CIA, and CatchSync.
Training set: Twitter, 500K randomly sampled users; Sina Weibo, 1000 randomly sampled users.

Detection Accuracy
GANG consistently outperforms the compared methods.

Top-Interval Ranking
GANG achieves the best ranking performance and significantly outperforms the compared methods.
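A top-interval ranking evaluation can be sketched as follows. The slide names the metric only. This sketch assumes that users are ranked by posterior probability of being fraudulent, split into consecutive intervals of equal size, and that each interval is scored by its fraction of ground-truth fraudulent users. The function name and interval size are hypothetical.

```python
from typing import Dict, List, Set


def top_interval_fraud_rates(posterior: Dict[int, float], fraudulent: Set[int],
                             interval: int) -> List[float]:
    """Fraction of ground-truth fraudulent users in each consecutive interval
    of the ranking (highest posterior first); the last interval may be shorter."""
    ranked = sorted(posterior, key=lambda u: posterior[u], reverse=True)
    rates = []
    for start in range(0, len(ranked), interval):
        chunk = ranked[start:start + interval]
        rates.append(sum(u in fraudulent for u in chunk) / len(chunk))
    return rates
```

A good ranking shows high fractions in the top intervals that drop off further down the list.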

Convergence
GANG converges on both large-scale OSNs.

Scalability
Optimized GANG is slightly less efficient than the random walk-based TrustRank, DistrustRank, and CIA, but it is one order of magnitude more scalable than the basic GANG.

Case Study on Sina Weibo

Conclusion
We propose GANG, a guilt-by-association method on directed graphs, to detect fraudulent users in OSNs.
We design a customized pMRF to capture the unique characteristics of directed graphs and leverage LBP to perform inference on it.
We optimize GANG via message elimination and linear approximation.
Optimized GANG outperforms state-of-the-art methods, is guaranteed to converge, and is scalable.

GANG: Detecting Fraudulent Users in OSNs

Similar presentations

Presentation on theme: "GANG: Detecting Fraudulent Users in OSNs"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

GANG: Detecting Fraudulent Users in OSNs

Similar presentations

Presentation on theme: "GANG: Detecting Fraudulent Users in OSNs"— Presentation transcript:

Similar presentations

About project

Feedback