Published by Jeremy Bollom. Modified over 4 years ago.

1
Unifying Guilt-by-Association Approaches: Theorems and Fast Algorithms
Danai Koutra, U Kang, Hsing-Kuo Kenneth Pao, Tai-You Ke, Duen Horng (Polo) Chau, Christos Faloutsos
School of Computer Science, Carnegie Mellon University; National Taiwan University of Science & Technology
ECML PKDD, 5-9 September 2011, Athens, Greece

2
Problem Definition: GBA (guilt-by-association) techniques
Given: a graph with N nodes and M edges, and a few labeled nodes.
Find: the class (red/green) of the remaining nodes.
Assuming: network effects (homophily/heterophily).
© Danai Koutra - PKDD'11

3
Homophily and Heterophily (figure: Step 1, Step 2)
All methods handle homophily. NOT all methods handle heterophily, BUT the proposed method does!

4
Why do we study these methods?

5
Motivation (1): Law Enforcement [Tong+ ’06][Lin+ ’04][Chen+ ’11]…

6
Motivation (2): Cyber Security. Victims? Botnet members? (figure label: bot) [Kephart+ ’95][Kolter+ ’06][Song+ ’08-’11][Chau+ ’11]…

7
Motivation (3): Fraud Detection. Lax controls? Fraudsters? (figure label: fraudster) [Neville+ ’05][Chau+ ’07][McGlohon+ ’09]…

8
Motivation (4): Ranking [Brin+ ’98][Tong+ ’06][Ji+ ’11]…

9
Our Contributions
Theory: correspondence BP ≈ RWR ≈ SSL; linearization for BP; convergence criteria for linearized BP.
Practice: FaBP algorithm (fast, accurate and scalable); experiments on DBLP, Web, and Kronecker graphs.

10
Roadmap: Background (Belief Propagation, Random Walk with Restarts, Semi-supervised Learning); Linearized BP; Correspondence of Methods; Proposed Algorithm; Experiments; Conclusions

11
Background: apologies for diversion…

12
Background 1: Belief Propagation (BP)
Iterative message-based method: 1st round, 2nd round, ... until the stop criterion is fulfilled. (figure: node beliefs such as (0.9, 0.1), (0.2, 0.8), (0.3, 0.7))
"Propagation matrix", indexed by class of "sender" and class of "receiver"; one for homophily, one for heterophily. Homophily example:
    0.9   0.1
    0.1   0.9
Usually the diagonal entries are the same: diagonal = homophily factor h.
"About-half" homophily factor: h_h = h - 0.5, which gives the centered matrix:
    0.4  -0.4
   -0.4   0.4
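The relation between the propagation matrix and its "about-half" form can be sketched in a few lines. This is only an illustration for the two-class case; the helper names `propagation_matrix` and `center` are hypothetical and do not come from the slides.

```python
def propagation_matrix(h):
    """2x2 propagation matrix with the homophily factor h on the diagonal.

    h > 0.5 models homophily, h < 0.5 models heterophily.
    """
    return [[h, 1.0 - h],
            [1.0 - h, h]]


def center(matrix):
    """Subtract 0.5 from every entry to get the "about-half" form."""
    return [[value - 0.5 for value in row] for row in matrix]


H = propagation_matrix(0.9)           # homophily example from the slide
H_centered = center(H)                # diagonal becomes h_h = h - 0.5
H_heterophily = propagation_matrix(0.1)
```

With h = 0.9 the centered diagonal is h_h = 0.4 and the off-diagonal entries are -0.4 (up to floating-point rounding), matching the values on the slide.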

13
Background 1: Belief Propagation Equations [Pearl ’82][Yedidia+ ’02]…[Pandit+ ’07][Gonzalez+ ’09][Chechetka+ ’10]

14
Background 2: Semi-Supervised Learning
Graph-based SSL: use a few labeled data points and exploit neighborhood information. (figure: Step 1, Step 2; example node scores 0.8, -0.3, ?, ?, -0.1, 0.6, 0.8) [Zhou ’06][Ji, Han ’10]…

15
Background 3: Personalized Random Walk with Restarts (RWR) [Brin+ ’98][Haveliwala ’03][Tong+ ’06][Minkov, Cohen ’07]…

16
Background

17
Qualitative Comparison of GBA Methods
GBA Method   Heterophily   Scalability   Convergence
RWR          ✗             ✓             ✓
SSL          ✗             ✓             ✓
BP           ✓             ✓             ?
FaBP         ✓             ✓             ✓

19
Roadmap: Background; Linearized BP; Correspondence of Methods; Proposed Algorithm; Experiments; Conclusions (figure labels: previous work, new work)

20
Linearized BP
Theorem [Koutra+]: BP is approximated by the linear system [I + a D - c' A] b_h = φ_h, where b_h are the final beliefs, φ_h the prior beliefs, A the adjacency matrix, D the diagonal degree matrix (d1, d2, d3), and a, c' scalar constants.
Sketch of proof: odds ratio; Maclaurin expansions. (figure: beliefs p_i in [0, 1] are re-centered so that 0.5 maps to 0) DETAILS!

21
Linearized BP vs BP
Original [Yedidia+]: Belief Propagation, non-linear. Our proposal: Linearized BP, linear. BP is approximated by Linearized BP.

22
Our Contributions (recap): theory (correspondence BP ≈ RWR ≈ SSL; linearization for BP; convergence criteria for linearized BP) and practice (FaBP algorithm: fast, accurate and scalable; experiments on DBLP, Web, and Kronecker graphs). Checked off so far: ✓

23
Linearized BP: convergence
Theorem: Linearized BP converges if 1-norm < 1 OR Frobenius norm < 1 (the conditions involve the degree of each node n). Sketch of proof. DETAILS!
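The two norm conditions can be checked numerically on a small graph. The sketch below rests on two assumptions that are not spelled out on this slide: that the linearized system is iterated as b ← φ + W b with W = c'A - aD (from the FaBP matrix I + aD - c'A in the correspondence table), and that a = 4h_h²/(1 - 4h_h²) and c' = 2h_h/(1 - 4h_h²), which is how the constants are recalled from the paper. The function names are hypothetical.

```python
import numpy as np


def iteration_matrix(A, h_h):
    """W = c'A - aD, so that [I + aD - c'A] b = phi reads b = phi + W b."""
    denom = 1.0 - 4.0 * h_h ** 2
    a = 4.0 * h_h ** 2 / denom        # assumed formula for a
    c_prime = 2.0 * h_h / denom       # assumed formula for c'
    D = np.diag(A.sum(axis=1))        # diagonal degree matrix
    return c_prime * A - a * D


def satisfies_norm_criteria(A, h_h):
    """True if the 1-norm OR the Frobenius norm of W is below 1."""
    W = iteration_matrix(A, h_h)
    one_norm = np.linalg.norm(W, 1)       # maximum absolute column sum
    fro_norm = np.linalg.norm(W, "fro")
    return bool(one_norm < 1.0 or fro_norm < 1.0)


# Toy path graph 0 - 1 - 2 (maximum degree 2).
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
ok_small_h = satisfies_norm_criteria(A, 0.1)
ok_large_h = satisfies_norm_criteria(A, 0.4)
```

Under these assumptions, and for 0 < h_h < 0.5, each column of W has absolute sum 2·h_h·d/(1 - 2·h_h) for a node of degree d, so the 1-norm condition reduces to h_h < 1/(2(1 + d_max)). On the toy graph that threshold is 1/6, which is why h_h = 0.1 passes and h_h = 0.4 does not.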

24
Our Contributions (recap): theory (correspondence BP ≈ RWR ≈ SSL; linearization for BP; convergence criteria for linearized BP) and practice (FaBP algorithm: fast, accurate and scalable; experiments on DBLP, Web, and Kronecker graphs). Checked off so far: ✓ ✓

25
Roadmap: Background; Linearized BP; Correspondence of Methods; Proposed Algorithm; Experiments; Conclusions

26
Correspondence of Methods
Method   Matrix                 Unknown   Known
RWR      [I - c A D^-1]       × x       = (1 - c) y
SSL      [I + a (D - A)]      × x       = y
FaBP     [I + a D - c' A]     × b_h     = φ_h
Legend: x, b_h = final labels/beliefs; y, φ_h = prior labels/beliefs; A = adjacency matrix; D = diagonal degree matrix (d1, d2, d3).
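To make the correspondence concrete, the three linear systems can be solved side by side on a toy graph. This is a minimal sketch, not the authors' implementation: the graph, the values of c, a and h_h, and the formulas used for the FaBP constants (a = 4h_h²/(1 - 4h_h²), c' = 2h_h/(1 - 4h_h²), as recalled from the paper) are all assumptions made for illustration.

```python
import numpy as np

# Toy undirected path graph 0 - 1 - 2 - 3 (illustrative only).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
D = np.diag(A.sum(axis=1))            # diagonal degree matrix
I = np.eye(len(A))
y = np.array([1.0, 0.0, 0.0, 0.0])    # known prior labels: only node 0

# RWR: [I - c A D^-1] x = (1 - c) y
c = 0.85                              # illustrative value
x_rwr = np.linalg.solve(I - c * A @ np.linalg.inv(D), (1 - c) * y)

# SSL: [I + a (D - A)] x = y
a_ssl = 0.5                           # illustrative value
x_ssl = np.linalg.solve(I + a_ssl * (D - A), y)

# FaBP: [I + a D - c' A] b_h = phi_h
h_h = 0.05                            # small "about-half" homophily factor
a = 4 * h_h ** 2 / (1 - 4 * h_h ** 2)     # assumed formula
c_prime = 2 * h_h / (1 - 4 * h_h ** 2)    # assumed formula
phi_h = 0.001 * y                     # small centered prior beliefs
b_h = np.linalg.solve(I + a * D - c_prime * A, phi_h)
```

All three methods reduce to one sparse linear solve with a matrix built from I, A and D; only the coefficients and the interpretation of the right-hand side differ.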

27
RWR ≈ SSL
THEOREM: RWR and SSL are identical if a condition holds that ties the fly-out probability (RWR) to the individual homophily strength of node i (SSL). Simplification: use a global homophily strength of nodes (SSL). DETAILS!

28
RWR ≈ SSL: example
Similar scores and identical rankings. (figure: scatter plot of RWR scores and SSL scores against the line y = x, for individual and global homophily strength)

29
Our Contributions (recap): theory (correspondence BP ≈ RWR ≈ SSL; linearization for BP; convergence criteria for linearized BP) and practice (FaBP algorithm: fast, accurate and scalable; experiments on DBLP, Web, and Kronecker graphs). Checked off so far: ✓ ✓ ✓

30
Roadmap: Background; Linearized BP; Correspondence of Methods; Proposed Algorithm; Experiments; Conclusions

31
Proposed algorithm: FaBP
① Pick the homophily factor h_h.
② Solve the linear system [I + a D - c' A] b_h = φ_h.
③ (optional) If accuracy is low, run BP with prior beliefs.
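The first two steps can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the function name `fabp`, the toy graph and the prior magnitude are hypothetical, the constants a and c' use the formulas recalled from the paper (a = 4h_h²/(1 - 4h_h²), c' = 2h_h/(1 - 4h_h²)), and a dense solver stands in for whatever solver is used at scale. The optional BP step is not shown.

```python
import numpy as np


def fabp(A, phi_h, h_h):
    """Step 2: solve [I + a D - c' A] b_h = phi_h for the final beliefs."""
    a = 4 * h_h ** 2 / (1 - 4 * h_h ** 2)      # assumed formula
    c_prime = 2 * h_h / (1 - 4 * h_h ** 2)     # assumed formula
    D = np.diag(A.sum(axis=1))                 # diagonal degree matrix
    return np.linalg.solve(np.eye(len(A)) + a * D - c_prime * A, phi_h)


# Toy path graph 0 - 1 - 2; only node 0 carries a (centered) prior belief.
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
phi_h = np.array([0.001, 0.0, 0.0])

# Step 1: pick the homophily factor (positive = homophily, negative = heterophily).
b_homophily = fabp(A, phi_h, h_h=0.1)
b_heterophily = fabp(A, phi_h, h_h=-0.1)

# The sign of each final belief gives the predicted class.
labels_homophily = np.sign(b_homophily)
labels_heterophily = np.sign(b_heterophily)
```

With homophily every node inherits the class of node 0; with heterophily the predicted classes alternate along the path (+, -, +), which is the behavior RWR and SSL cannot express.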

32
Roadmap: Background; Linearized BP; Correspondence of Methods; Proposed Algorithm; Experiments; Conclusions

33
Datasets
p% labeled nodes initially. Labels: YahooWeb: .edu/others | DBLP: AI/not AI. Accuracy computed on hold-out set.
Dataset       # nodes          # edges
YahooWeb      1,413,511,390    6,636,600,779   (6 billion!)
Kronecker 1   177,147          1,977,149,596
Kronecker 2   120,552          1,145,744,786
Kronecker 3   59,049           282,416,924
Kronecker 4   19,683           40,333,924
DBLP          37,791           170,794

34
Specs: Hadoop version 0.20.2; M45 Hadoop cluster (Yahoo!) with 500 machines, 4000 cores, 1.5PB total storage, 3.5TB of memory; 100 machines used for the experiments.

35
Roadmap: Background; Linearized BP; Correspondence of Methods; Proposed Algorithm; Experiments (1. Accuracy, 2. Convergence, 3. Sensitivity, 4. Scalability, 5. Parallelism); Conclusions

36
Results (1): Accuracy
All points lie on the diagonal: scores are near-identical. (figure: scatter plot of beliefs in FaBP vs beliefs in BP for (h, priors) = (0.5±0.002, 0.5±0.001); 0.3% labels; classes AI / non-AI)

37
Results (2): Convergence
FaBP achieves maximum accuracy within the convergence bounds. (figure: % accuracy wrt h_h; priors = ±0.001; 0.3% labels; convergence bounds marked: Frobenius norm, |e_val| = 1, 1-norm)

38
Results (3): Sensitivity to the homophily factor
FaBP is robust to the homophily factor h_h within the convergence bounds. (figure: % accuracy wrt h_h; priors = ±0.001; 0.3% labels; convergence bounds marked: Frobenius norm, |e_val| = 1, 1-norm)

39
Note (for all plots): average over 10 runs; error bars tiny.

40
Results (3): Sensitivity to the prior beliefs
FaBP is robust to the prior beliefs φ_h. (figure: % accuracy vs prior beliefs’ magnitude; h_h = ±0.002; p = 0.1%, 0.3%, 0.5%, 5%)

41
Results (4): Scalability
FaBP is linear in the number of edges. (figure: runtime (min) vs # of edges, Kronecker graphs)

42
Results (5): Parallelism
FaBP is ~2x faster and wins/ties on accuracy. (figures: runtime (min) vs # of steps; % accuracy vs runtime (min))

43
Roadmap: Background; Linearized BP; Correspondence of Methods; Proposed Algorithm; Experiments; Conclusions

44
Our Contributions (all checked off: ✓ ✓ ✓ ✓ ✓)
Theory: correspondence BP ≈ RWR ≈ SSL; linearization for BP; convergence criteria for linearized BP.
Practice: FaBP algorithm, fast (~2x faster), accurate (same/better) and scalable (6 billion edges!); experiments on DBLP, Web, and Kronecker graphs.

45
Thanks Data Funding NSC ILLINOIS Ming Ji, Jiawei Han

46
Thank you! (figure: % accuracy vs runtime (min))

47
Q: Can we have multiple classes?
A: yes! Propagation matrix for classes AI, ML, DB:
         AI    ML    DB
   AI    0.7   0.2   0.1
   ML    0.2   0.6   0.2
   DB    0.1   0.2   0.7
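As a rough illustration of how a multi-class propagation matrix is used, the sketch below pushes one sender's belief vector through the 3x3 matrix from the slide. It is a deliberately simplified, single-message view: full BP also multiplies in the sender's prior and the messages from its other neighbors, which is omitted here. The function name `message` is hypothetical.

```python
# Propagation matrix from the slide; it is symmetric, so the choice of
# rows = sender class and columns = receiver class does not matter here.
CLASSES = ["AI", "ML", "DB"]
H = [[0.7, 0.2, 0.1],
     [0.2, 0.6, 0.2],
     [0.1, 0.2, 0.7]]


def message(sender_belief, H):
    """What a sender with the given class beliefs tells a neighbor.

    Entry j is sum_i sender_belief[i] * H[i][j], normalized to sum to 1.
    """
    k = len(H)
    raw = [sum(sender_belief[i] * H[i][j] for i in range(k)) for j in range(k)]
    total = sum(raw)
    return [value / total for value in raw]


# A sender that is certainly "AI" suggests its neighbor is most likely "AI" too.
m = message([1.0, 0.0, 0.0], H)
most_likely = CLASSES[m.index(max(m))]
```

For a sender that is certainly AI, the message is just the AI row of the matrix, so the neighbor is nudged toward AI with weight 0.7.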

48
Q: Which of the methods do you recommend?
A: (Fast) Belief Propagation. Reasons: solid Bayesian foundation; handles heterophily and multiple classes. (figure: propagation matrix [0.7 0.2 0.1; 0.2 0.6 0.2; 0.1 0.2 0.7])

49
Q: Why is FaBP faster than BP?
A: BP sends 2|E| messages per iteration; FaBP keeps |V| records per "power method" iteration, and |V| < 2|E|.
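The "power method" view can be sketched in plain Python: the only state carried from one iteration to the next is a single belief value per node, rather than one message per edge direction. The sketch assumes the update b ← φ + (c'A - aD) b, which is a fixed-point form of [I + aD - c'A] b = φ, together with the constants a = 4h_h²/(1 - 4h_h²) and c' = 2h_h/(1 - 4h_h²) as recalled from the paper. The function name, the toy graph and the iteration count are hypothetical.

```python
def fabp_power_method(neighbors, phi, h_h, iterations=100):
    """Iterate b <- phi + (c'A - aD) b on an adjacency-list graph."""
    denom = 1.0 - 4.0 * h_h ** 2
    a = 4.0 * h_h ** 2 / denom        # assumed formula
    c_prime = 2.0 * h_h / denom       # assumed formula
    b = list(phi)                     # one record per node: |V| values
    for _ in range(iterations):
        b = [
            phi[i]
            + c_prime * sum(b[j] for j in neighbors[i])
            - a * len(neighbors[i]) * b[i]
            for i in range(len(phi))
        ]
    return b


# Toy path graph 0 - 1 - 2 as adjacency lists; only node 0 has a prior.
neighbors = [[1], [0, 2], [1]]
phi = [0.001, 0.0, 0.0]
b = fabp_power_method(neighbors, phi, h_h=0.1)
```

The iteration only converges when h_h is small enough for the graph at hand (see the convergence criteria); on this toy graph h_h = 0.1 is within the 1-norm bound.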
