GANG: Detecting Fraudulent Users in OSNs


1 GANG: Detecting Fraudulent Users in OSNs via Guilt-by-Association on Directed Graphs
Binghui Wang, Neil Zhenqiang Gong (Iowa State University, United States)
Hao Fu (Microsoft Research Asia, China)

2 OUTLINE Background Algorithm Evaluation Conclusion

4 Online Social Networks (OSNs) are Popular
2.06 billion monthly active users
340 million monthly active users
328 million monthly active users

5 OSNs Have Many Fraudulent Users

6 Threats of Fraudulent Users
Fraudulent users can be used to perform various malicious activities:
  Distribute spam and launch phishing attacks
  Harvest private user data
  Influence financial markets
  Disrupt democratic elections
Fraudulent user detection is an urgent research problem

7 Existing Fraudulent-User-Detection Methods
Various methods from multiple research communities (networking, security, data mining, etc.)
Local feature-based methods: feature extraction + ML classifier
  Features: side information (e.g., IP, content, behavior), local structure (e.g., clustering coefficient, common neighbors), etc.
  Classifiers: support vector machine, logistic regression, etc.
  Fundamental limitation: not adversarially robust
Global structure-based methods: guilt-by-association
  Leverage graph structure to propagate label information
  A user is likely to be fraudulent (normal) if it is linked with other fraudulent (normal) users
  More adversarially robust

8 Cons of Global Structure-based Methods
Assume symmetric (i.e., undirected) social links
  Random walk: SSL (ICML’03, NIPS’04), SybilRank (NSDI’12), SybilWalk (DSN’17), etc.
  Belief propagation: SybilBelief (TIFS’14), SybilSCAR (INFOCOM’17), FraudEagle (ICWSM’13), SpEagle (KDD’15), etc.
  However, real-world OSNs are asymmetric (i.e., directed)
Leverage either labeled fraudulent users or labeled normal users, but not both
  Labeled normal: TrustRank (VLDB’04), CatchSync (KDD’14), etc.
  Labeled fraudulent: DistrustRank (MTW’06), CIA (WWW’12), etc.
  However, both types of labels exist

9 Our Contribution: GANG
A novel global structure-based guilt-by-association method on directed graphs
  Captures unique characteristics of the fraudulent-user-detection problem in directed OSNs
  Leverages both labeled fraudulent users and labeled normal users
  Convergent and scalable

10 OUTLINE Background Algorithm Evaluation Conclusion

11 Problem Definition
Input
  Directed social graph
  Training set: labeled fraudulent nodes and labeled normal nodes
Output
  Label of each remaining node

12 Notation
Associate a binary random variable $x_u$ with each node u
  $x_u = 1$ ($x_u = -1$): u is fraudulent (normal)
  $\Pr(x_u = 1)$: probability that u is fraudulent
  $\Gamma_u^b$, $\Gamma_u^i$, $\Gamma_u^o$: bidirectional, unidirectional incoming, and unidirectional outgoing neighbors of u
  $\Gamma_u = \Gamma_u^b \cup \Gamma_u^i \cup \Gamma_u^o$: all neighbors of u
  $x_u$ and $x_{\Gamma_u}$: observed labels of u and u's neighbors
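The three neighbor sets in the notation can be derived from a plain directed edge list. The following is a minimal sketch, assuming edges are given as `(src, dst)` pairs meaning src links to dst; the function name `neighbor_sets` and the dictionary keys `"bi"`, `"in"`, and `"out"` are illustrative choices, not names from the slides.

```python
from collections import defaultdict


def neighbor_sets(edges):
    """Split each node's neighbors into bidirectional, unidirectional
    incoming, and unidirectional outgoing sets.

    `edges` is an iterable of (src, dst) pairs, each meaning src -> dst.
    """
    out_nbrs = defaultdict(set)
    in_nbrs = defaultdict(set)
    for src, dst in edges:
        if src == dst:
            continue  # ignore self-loops
        out_nbrs[src].add(dst)
        in_nbrs[dst].add(src)

    result = {}
    for u in set(out_nbrs) | set(in_nbrs):
        outs = out_nbrs.get(u, set())
        ins = in_nbrs.get(u, set())
        bi = outs & ins  # linked in both directions
        result[u] = {"bi": bi, "in": ins - bi, "out": outs - bi}
    return result


# Hypothetical toy graph: a <-> b, c -> a, a -> d
toy_edges = [("a", "b"), ("b", "a"), ("c", "a"), ("a", "d")]
toy_sets = neighbor_sets(toy_edges)
```

The union of the three sets for a node gives all of its neighbors, matching the definition of $\Gamma_u$.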

13 Intuitions (1/3)
Intuition I: Bidirectional neighbors
  $J_{vu} > 0$: coupling strength
  [Figure: bidirectionally linked nodes v and u tend to share a label, both fraudulent (F) or both normal (N)]

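14 Intuitions (2/3)
Intuition II: Unidirectional incoming neighbors
  [Figure: nodes v and u; v = F with u = ?, and v = N with u = N]
Intuition III: Unidirectional outgoing neighbors
  [Figure: nodes v and u; v = N with u = ?, and v = F with u = F]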

15 Intuitions (3/3)
Intuition IV: Model prior knowledge about u’s label
  $h_u > 0$ ($h_u < 0$): u is labeled fraudulent (normal)
  $h_u = 0$: u is unlabeled
Finally, unify neighbor influences and prior knowledge

16 Design of GANG (1/3)
Capture the intuitions via a pairwise Markov random field (pMRF)
A pMRF models the joint probability distribution of the binary random variables $x_u$ for all nodes $u \in V$ via an energy function H
Our customized pMRF with the associated energy function
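The customized energy function itself is not preserved in this transcript. For orientation only, the standard Ising-style pairwise MRF over variables $x_u \in \{-1, 1\}$ is written out here; it is consistent with the sign conventions stated for $h_u$ and $J_{vu}$ (positive $h_u$ favors the fraudulent state, positive coupling favors equal labels), but it treats every edge symmetrically and should not be read as GANG's customized model for unidirectional edges.

```latex
\Pr(x_V) = \frac{1}{Z} \exp\bigl(-H(x_V)\bigr),
\qquad
H(x_V) = -\sum_{u \in V} h_u x_u \;-\; \sum_{(u,v) \in E} J_{uv}\, x_u x_v ,
\qquad
Z = \sum_{x_V} \exp\bigl(-H(x_V)\bigr)
```

Lower energy means higher probability, so a configuration in which strongly coupled neighbors agree and labeled nodes match their labels is preferred.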

17 Design of GANG (2/3)
Perform inference on the pMRF via Loopy Belief Propagation (LBP)
First, transform the joint probability distribution of the pMRF into a product of node potentials and edge potentials
  Node potential
  Edge potential: one form for bidirectional edges, one for unidirectional edges
Set the homophily strength $w_{uv} = w > 0.5$ for all edges

18 Design of GANG (3/3)
Then, compute the posterior probability via LBP’s message passing
  Sum-product to update messages
  Product to obtain the posterior probability
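The two message-passing steps can be illustrated with textbook sum-product LBP on binary states. This is a minimal sketch under a strong simplifying assumption: every edge is treated as bidirectional with a homophily edge potential (`w` if the two labels agree, `1 - w` otherwise), so the special handling of unidirectional edges in GANG is not modeled. The function name `lbp_undirected` and its parameters are hypothetical.

```python
def lbp_undirected(neighbors, priors, w=0.9, iters=10):
    """Sum-product LBP on binary states (index 0 = normal, 1 = fraudulent).

    neighbors: dict node -> set of neighbor nodes (assumed symmetric)
    priors:    dict node -> prior probability of being fraudulent
    w:         homophily strength, expected to be > 0.5
    """
    # Node potential: (P(normal), P(fraudulent)) from the prior.
    node_pot = {u: (1.0 - priors[u], priors[u]) for u in neighbors}
    # msgs[(v, u)] is the message v sends to u about u's state.
    msgs = {(v, u): (0.5, 0.5) for u in neighbors for v in neighbors[u]}

    for _ in range(iters):
        new_msgs = {}
        for (v, u) in msgs:
            # Combine v's node potential with messages from all neighbors but u.
            p0, p1 = node_pot[v]
            for k in neighbors[v]:
                if k != u:
                    p0 *= msgs[(k, v)][0]
                    p1 *= msgs[(k, v)][1]
            # Sum over v's state, weighted by the homophily edge potential.
            m0 = p0 * w + p1 * (1.0 - w)
            m1 = p0 * (1.0 - w) + p1 * w
            z = m0 + m1
            new_msgs[(v, u)] = (m0 / z, m1 / z)
        msgs = new_msgs

    # Posterior: product of node potential and all incoming messages.
    posteriors = {}
    for u in neighbors:
        b0, b1 = node_pot[u]
        for k in neighbors[u]:
            b0 *= msgs[(k, u)][0]
            b1 *= msgs[(k, u)][1]
        posteriors[u] = b1 / (b0 + b1)
    return posteriors


# Hypothetical chain a - b - c where only a carries a fraudulent prior.
chain = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
chain_post = lbp_undirected(chain, {"a": 0.9, "b": 0.5, "c": 0.5})
```

On this chain the fraudulent prior at `a` raises the posteriors of `b` and `c` above 0.5, with the effect weakening with distance. The sketch stores one message per directed pair of neighbors, which is the per-edge bookkeeping cost that scales with the number of edges.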

19 Detect fraudulent users
First, assign a prior to every user, parameterized by $0 < \theta \le 0.5$
Then, obtain the posterior probability of u via LBP on the customized pMRF
Finally, predict an unlabeled user u to be fraudulent if $p_u = \Pr(x_u = 1) > 0.5$, and normal otherwise
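The first and last steps can be sketched in a few lines. The prior formula is not preserved in the transcript, so the sketch assumes the common convention of `0.5 + theta` for labeled fraudulent users, `0.5 - theta` for labeled normal users, and `0.5` for unlabeled users; that mapping, and the names `assign_priors` and `predict`, are assumptions made for illustration.

```python
def assign_priors(nodes, fraudulent, normal, theta=0.25):
    """Map training labels to prior probabilities of being fraudulent.

    Assumed convention: 0.5 + theta (labeled fraudulent),
    0.5 - theta (labeled normal), 0.5 (unlabeled).
    """
    if not 0 < theta <= 0.5:
        raise ValueError("theta must satisfy 0 < theta <= 0.5")
    priors = {}
    for u in nodes:
        if u in fraudulent:
            priors[u] = 0.5 + theta
        elif u in normal:
            priors[u] = 0.5 - theta
        else:
            priors[u] = 0.5
    return priors


def predict(posteriors, labeled):
    """Label each unlabeled node by thresholding its posterior at 0.5."""
    return {
        u: "fraudulent" if p > 0.5 else "normal"
        for u, p in posteriors.items()
        if u not in labeled
    }


# Hypothetical usage with made-up posterior values.
example_priors = assign_priors(["a", "b", "c"], {"a"}, {"b"})
example_labels = predict({"a": 0.9, "b": 0.2, "c": 0.7, "d": 0.5}, {"a", "b"})
```

A posterior of exactly 0.5 falls on the normal side, following the strict inequality in the decision rule.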

20 Shortcomings of GANG
GANG is not scalable enough, because LBP maintains a message $m_{vu}$ on each edge (v, u)
GANG is not guaranteed to converge, because LBP might oscillate on loopy graphs
Address the shortcomings
  Eliminate message maintenance
  Leverage linear approximation

21 Optimizing GANG (1/2) Eliminate message maintenance

22 Optimizing GANG (2/2)
Approximate GANG via residual and linearization
  Residual variable:
  Linear approximation:
Finally, represent optimized GANG as

23 Convergence Condition of Optimized GANG
Sufficient convergence condition
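The formulas for the optimized update and its convergence condition are not preserved in the transcript. As a rough illustration of why a linearized update can come with a convergence guarantee, the sketch below assumes the update has the generic linear fixed-point form p = q + M p over residuals (probability minus 0.5), where `M` is a hypothetical weight matrix. It checks one standard sufficient condition, a maximum absolute row sum below 1, which may differ from the condition on the slide.

```python
def max_abs_row_sum(M):
    """Induced infinity norm of a square matrix given as a list of rows."""
    return max(sum(abs(x) for x in row) for row in M)


def iterate_linear(M, q, max_iters=100, tol=1e-10):
    """Iterate p <- q + M p from p = q.

    Returns (p, converged). If max_abs_row_sum(M) < 1 the map is a
    contraction in the max norm, so the iteration converges.
    """
    n = len(q)
    p = list(q)
    for _ in range(max_iters):
        new_p = [q[i] + sum(M[i][j] * p[j] for j in range(n)) for i in range(n)]
        if max(abs(a - b) for a, b in zip(new_p, p)) < tol:
            return new_p, True
        p = new_p
    return p, False


# Hypothetical two-node example: node 0 has a fraudulent-leaning prior residual.
M_toy = [[0.0, 0.4], [0.4, 0.0]]
q_toy = [0.4, 0.0]
is_contraction = max_abs_row_sum(M_toy) < 1
p_toy, converged = iterate_linear(M_toy, q_toy)
```

Each iteration needs only the current vector of residuals, not a stored message per edge, which is the kind of saving that message elimination aims for.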

24 OUTLINE Background Algorithm Evaluation Conclusion

25 Experimental Setups
Datasets
  Large-scale Twitter
  Large-scale Sina Weibo
Compared methods
  Using undirected graphs: SSL, SybilRank, SybilBelief, SybilSCAR
  Using directed graphs: TrustRank, DistrustRank, CIA, CatchSync
Training set
  Twitter: randomly sample 500K users
  Sina Weibo: randomly sample 1000 users

26 Detection Accuracy
GANG consistently outperforms compared methods

27 Top-Interval Ranking
GANG achieves the best ranking performance
GANG significantly outperforms compared methods

28 Convergence
GANG converges on both large-scale OSNs

29 Scalability
Optimized GANG is slightly less efficient than the random walk-based TrustRank, DistrustRank, and CIA
Optimized GANG is one order of magnitude more scalable than the basic GANG

30 Case Study on Sina Weibo

31 OUTLINE Background Algorithm Evaluation Conclusion

32 Conclusion
We propose a guilt-by-association method on directed graphs to detect fraudulent users in OSNs
We design a customized pMRF to capture unique characteristics of directed graphs and use LBP to perform inference on the pMRF
We optimize GANG via message elimination and linear approximation
Optimized GANG outperforms state-of-the-art methods, is guaranteed to converge, and is scalable

