© 2011 IBM Corporation IBM Research SIAM-DM 2011, Mesa AZ, USA, Non-Negative Residual Matrix Factorization w/ Application to Graph Anomaly Detection Hanghang.

1 Non-Negative Residual Matrix Factorization w/ Application to Graph Anomaly Detection
Hanghang Tong and Ching-Yung Lin
IBM Research, © 2011 IBM Corporation
SIAM-DM 2011, Mesa AZ, USA, April 28-30, 2011

2 Large Graphs are Everywhere!
• Examples: Internet Map [Koren 2009], Food Web [2007], Protein Network [Salthe 2004], Social Network [Newman 2005], Web Graph, Terrorist Network [Krebs 2002]
• Q: How to find patterns (e.g., community, anomaly, etc.)?

3 Matrix Tool for Finding Graph Patterns
• A Typical Procedure: Graph → Adj. Matrix A → A = F x G + R
– F, G: low-rank matrices
– R: residual matrix

4 Matrix Tool for Finding Graph Patterns
• A Typical Procedure: Graph → Adj. Matrix A → A = F x G + R
– F, G: low-rank matrices (community)
– R: residual matrix (anomalies)
• An Illustrative Example

5 Improve Interpretation by Non-negativity
• A Typical Procedure: Graph → Adjacency Matrix A → A = F x G + R
– F, G: community
– R: anomalies
• An Example: Interpretation by Non-negativity
– Non-negative Matrix Factorization: F >= 0; G >= 0 (for community detection)
– Non-negative Residual Matrix Factorization: R(i,j) >= 0 for A(i,j) > 0 (for anomaly detection) [This Paper]
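The residual constraint on slide 5 can be illustrated with a small sketch: given A, F and G, compute R = A - F x G and list the existing edges (A(i,j) > 0) whose residual is negative. The helper names (`matmul`, `residual`, `negative_residual_edges`), the tolerance, and the toy matrices are hypothetical and not taken from the slides.

```python
def matmul(F, G):
    """Multiply an n x r matrix F by an r x m matrix G (lists of lists)."""
    r, m = len(G), len(G[0])
    return [[sum(row[k] * G[k][j] for k in range(r)) for j in range(m)]
            for row in F]


def residual(A, F, G):
    """Return R = A - F x G, entry by entry."""
    FG = matmul(F, G)
    return [[a - p for a, p in zip(a_row, p_row)]
            for a_row, p_row in zip(A, FG)]


def negative_residual_edges(A, R, tol=1e-12):
    """List positions (i, j) with A(i,j) > 0 but R(i,j) < 0.

    An empty list means R(i,j) >= 0 holds on every existing edge.
    The tolerance is an assumption to absorb floating-point round-off.
    """
    return [(i, j)
            for i, row in enumerate(A)
            for j, a in enumerate(row)
            if a > 0 and R[i][j] < -tol]


# Toy graph (hypothetical): a 2-node block plus one isolated self-link.
A = [[1, 1, 0],
     [1, 1, 0],
     [0, 0, 1]]
G = [[1, 1, 0]]

R_ok = residual(A, [[1], [1], [0]], G)   # rank-1 fit of the block
R_bad = residual(A, [[2], [1], [0]], G)  # overshoots row 0

print(negative_residual_edges(A, R_ok))   # no violations
print(negative_residual_edges(A, R_bad))  # violations in row 0
```

In the first fit, the only non-zero residual is the edge the rank-1 block does not explain, which is the kind of entry the deck treats as a candidate anomaly.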

6 Anomaly Detection on Graphs
• Social Networks
– 'Popularity contest'
• Computer Networks
– Spammer, port scanner, vulnerable machines, etc.
• Financial Transaction Networks
– Fraudulent transactions (e.g., money-laundering ring), scammer
• Criminal Networks
– New criminal trend
• Tele-communication Networks
– Tele-marketer
Key Observation: Abnormal Behavior → Actual Activities

7 Challenges and Core Ideas
• Challenge 1: Lack of 'ground truth'
• Core Idea 1: Use the residual graph to improve the usability of anomaly detection results
– (which in turn is achieved by non-negative residual matrix factorization methods)
• Challenge 2: Large data
• Core Idea 2: A carefully designed method that scales linearly with respect to the size of the graph

8 Optimization Formulation
• General Case [formula]
– Weighted Frobenius form with a weight (common in any matrix factorization)

9 Optimization Formulation
• General Case [formula]
– Weighted Frobenius form with a weight (common in any matrix factorization)
– Non-negative residual (unique in this paper)

10 Optimization Formulation
• 0/1 Weight Matrix (Major Focus of the Paper) [formula]
– Objective with 0/1 weight (common in any matrix factorization)
– Non-negative residual (unique in this paper)
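One way to write the formulation described on slides 8 to 10, consistent with A = F x G + R and with the constraint R(i,j) >= 0 for A(i,j) > 0 from slide 5, is sketched below. The symbol W for the weight matrix and the rule that sets W(i,j) = 1 exactly on existing edges are assumptions, not taken from the slides.

```latex
% General case: weighted Frobenius objective with a non-negative residual
% on existing edges (W is an assumed name for the weight matrix).
\min_{F,\,G} \; \sum_{i,j} W(i,j)\,\bigl(A(i,j) - F(i,:)\,G(:,j)\bigr)^{2}
\quad \text{s.t.} \quad
R(i,j) = A(i,j) - F(i,:)\,G(:,j) \;\ge\; 0 \;\; \text{for all } A(i,j) > 0

% 0/1 weight (assumed: W(i,j) = 1 if A(i,j) > 0, and 0 otherwise):
\min_{F,\,G} \; \sum_{(i,j):\,A(i,j) > 0} R(i,j)^{2}
\quad \text{s.t.} \quad
R(i,j) \;\ge\; 0 \;\; \text{for all } A(i,j) > 0
```

Under this assumed weighting, both the objective and the constraints involve only the existing edges, which would fit the deck's claim of a cost that is linear in the number of edges.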

11 Optimization Formulation with 0/1 Weight Matrix
• NrMF with 0/1 Weight Matrix [formula]
• Q: How to find 'optimal' F and G?
– D1: Quality → C1: non-convexity of the optimization objective
– D2: Scalability → C2: large size of the graph

12 Optimization Method: Batch Mode
• Basic Idea 1: Alternating
– Not convex wrt F and G jointly, but convex if either F or G is fixed
– With F fixed: argmin over G, s.t. the residual constraints [formula]
• Basic Idea 2: Separation
– For each j: argmin over column j of G, s.t. the residual constraints [formula]
– Each is a standard quadratic programming problem
• Overall Complexity: Polynomial → Can we do better?
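The separation idea on slide 12 can be sketched as one subproblem per column j of G while F is held fixed. The form below assumes a 0/1 weighting in which only entries with A(i,j) > 0 enter the objective, and it assumes F has r columns; neither detail is spelled out in the transcript.

```latex
% Subproblem for column j with F fixed (a sketch under the stated assumptions).
\min_{G(:,j)} \; \sum_{i:\,A(i,j) > 0} \bigl(A(i,j) - F(i,:)\,G(:,j)\bigr)^{2}
\quad \text{s.t.} \quad
F(i,:)\,G(:,j) \;\le\; A(i,j) \;\; \text{for all } i \text{ with } A(i,j) > 0
```

The objective is quadratic and the constraints are linear in the r unknowns of G(:,j), so each subproblem is a convex quadratic program, and the columns can be solved independently of one another.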

13 Optimization Method: Incremental Mode
• Basic Idea 1: Recursive
• Basic Idea 2: Alternating
• Basic Idea 3: Separation
– QP for a single variable w/ boundary constraints; can be solved in constant time
• Procedure: Adjacency Matrix A → Initialize: R = A → [Rank-1 Approximation → Update Residual Matrix R] (do r times) → Output Final Residual Matrix
• Overall Complexity: Linear wrt # of edges
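The incremental procedure on slide 13 can be sketched in plain Python as follows. This is an illustration under assumptions, not the authors' implementation: the residual is stored only on existing edges, each rank-1 step alternates between the two vectors, and every coordinate is a one-variable QP whose solution is the unconstrained optimum q clipped to its bounds. All function names, the all-ones starting vector, and the fixed number of sweeps are hypothetical.

```python
import math


def clipped_minimizer(num, den, low, up):
    """Minimize sum_j (r_j - x * c_j)^2 over x in [low, up].

    num = sum_j r_j * c_j and den = sum_j c_j^2, so the unconstrained
    optimum is q = num / den. The objective is convex in x, so clipping
    q to the interval gives the constrained optimum.
    """
    q = num / den if den > 0.0 else 0.0
    return min(max(q, low), up)


def update_factor(residual, fixed, size, axis):
    """Update one vector of a rank-1 fit f * g with the other held fixed.

    residual: dict {(i, j): value >= 0} over existing edges only.
    axis=0 updates f (indexed by i) given g; axis=1 updates g given f.
    The bounds encode f[i] * g[j] <= residual[(i, j)] on every edge.
    """
    num = [0.0] * size
    den = [0.0] * size
    low = [-math.inf] * size
    up = [math.inf] * size
    for (i, j), r in residual.items():
        k, c = (i, fixed[j]) if axis == 0 else (j, fixed[i])
        num[k] += r * c
        den[k] += c * c
        if c > 0.0:
            up[k] = min(up[k], r / c)
        elif c < 0.0:
            low[k] = max(low[k], r / c)
    return [clipped_minimizer(num[k], den[k], low[k], up[k])
            for k in range(size)]


def nrmf_incremental(edges, n_rows, n_cols, rank, sweeps=20):
    """Peel off `rank` rank-1 terms while keeping edge residuals >= 0."""
    residual = dict(edges)              # Initialize: R = A (edges only)
    factors = []
    for _ in range(rank):               # "Do r times"
        g = [1.0] * n_cols              # arbitrary start (assumption)
        f = [0.0] * n_rows
        for _ in range(sweeps):         # alternate between f and g
            f = update_factor(residual, g, n_rows, axis=0)
            g = update_factor(residual, f, n_cols, axis=1)
        # Update the residual; max() guards against round-off below zero.
        residual = {(i, j): max(0.0, r - f[i] * g[j])
                    for (i, j), r in residual.items()}
        factors.append((f, g))
    return factors, residual


# Toy example (hypothetical): a dense 2 x 2 block plus two extra edges.
edges = {(0, 0): 1.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 1.0,
         (2, 2): 1.0, (0, 2): 1.0}
factors, final_residual = nrmf_incremental(edges, 3, 3, rank=1)
for edge, value in sorted(final_residual.items()):
    print(edge, round(value, 3))
```

Each sweep touches every edge a constant number of times, which fits the slide's claim of a cost that is linear in the number of edges for a fixed rank and a fixed number of sweeps.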

14 Experimental Evaluation
• Effectiveness [figure: accuracy by anomaly type]
• Efficiency [figure: wall-clock time vs. # of edges]

15 Experimental Evaluation
• Effectiveness [figure: accuracy by anomaly type]
• Efficiency [figures: time vs. # of edges, # of type-1 nodes, and # of type-2 nodes]

16 Batch Method vs. Incremental Method
• [figure: log wall-clock time (sec.) by data set, Incremental Method vs. Batch Method]

17 Conclusion
• Problem Formulation: Non-negative Residual Matrix Factorization
– a new matrix factorization for interpretable graph anomaly detection
• Optimization Methods
– Batch: straightforward, polynomial time complexity
– Incremental: linear time complexity
• Future Work
– Other interpretable properties (sparseness) for anomaly detection
– Matrix Factorization w/ Total Non-negativity

18 Thank you!
htong@us.ibm.com
(We are hiring at IBM Research!)

19 Visual Comparison

20 [figure labels, in two orderings: low, q, up and q, low, up]

