Graph Mining, Christos Faloutsos (CMU SCS), iCAST, Jan. 09

1 CMU SCS. Graph Mining. Christos Faloutsos, CMU

2 CMU SCS, iCAST, Jan. 09, C. Faloutsos. Thank you! Prof. Hsing-Kuo Kenneth Pao; Eric, Morgan, Ian, Teenet

3 Outline: Problem definition / Motivation; Static & dynamic laws, generators; Tools: CenterPiece graphs, Tensors; Other projects (virus propagation, e-bay fraud detection); Conclusions

4 Motivation. Data mining: ~ find patterns (rules, outliers). Problem#1: What do real graphs look like? Problem#2: How do they evolve? Problem#3: How to generate realistic graphs? TOOLS: Problem#4: Who is the ‘master-mind’? Problem#5: Track communities over time

5 Problem#1: Joint work with Dr. Deepayan Chakrabarti (CMU/Yahoo R.L.)

6 Graphs: why should we care? Intrusion detection: who-contacts-whom. [Figure: source-vs-destination plots for normal traffic and abnormal traffic]

7 Graphs: why should we care? Internet Map [lumeta.com]; Food Web [Martinez ’91]; Protein Interactions [genomebiology.com]; Friendship Network [Moody ’01]

8 Graphs: why should we care? IR: bi-partite graphs (doc-terms), linking documents D1 ... DN to terms T1 ... TM; web: hyper-text graph; ... and more

9 Graphs: why should we care? Network of companies & board-of-directors members; ‘viral’ marketing; web-log (‘blog’) news propagation; computer network security: email/IP traffic and anomaly detection; ...

10 Problem #1: network and graph mining. What does the Internet look like? What does the web look like? What is ‘normal’/‘abnormal’? Which patterns/laws hold?

11 Graph mining: Are real graphs random?

12 Laws and patterns. Are real graphs random? A: NO!! Diameter; in- and out-degree distributions; other (surprising) patterns

13 Solution#1: Power law in the degree distribution [SIGCOMM99]. [Figure: log(degree) vs. log(rank) for internet domains, slope -0.82; labeled points: att.com, ibm.com]
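
The slope on a rank plot like the one on slide 13 can be estimated with a least-squares line in log-log space. The sketch below is only an illustration: the function name `rank_exponent` and the synthetic degree sequence are assumptions, not data or code from the talk.

```python
import numpy as np

def rank_exponent(degrees):
    """Least-squares slope of log(degree) versus log(rank)."""
    d = np.sort(np.asarray(degrees, dtype=float))[::-1]  # decreasing degree
    d = d[d > 0]                                         # log needs positive values
    ranks = np.arange(1, len(d) + 1, dtype=float)
    slope, _intercept = np.polyfit(np.log(ranks), np.log(d), 1)
    return slope

# Synthetic (made-up) degrees that follow degree = 1000 * rank^(-0.82) exactly.
synthetic = 1000.0 * np.arange(1, 501, dtype=float) ** -0.82
slope = rank_exponent(synthetic)
print(round(slope, 2))
```

On real data the points scatter around the line, so the fitted slope is an estimate rather than an exact exponent.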

14 Solution#1’: Eigen Exponent E. A2: power law in the eigenvalues of the adjacency matrix. [Figure: eigenvalue vs. rank of decreasing eigenvalue, May 2001; exponent = slope, E = -0.48]

15 Solution#1’: Eigen Exponent E. [Mihail, Papadimitriou ’02]: slope is ½ of rank exponent. [Figure: eigenvalue vs. rank of decreasing eigenvalue, May 2001; exponent = slope, E = -0.48]

16 But: How about graphs from other domains?

17 The Peer-to-Peer Topology [Jovanovic+]. Count versus degree: the number of adjacent peers follows a power law

18 More power laws: citation counts (citeseer.nj.nec.com, 6/2001). [Figure: log(count) vs. log(#citations); labeled point: Ullman]

19 More power laws: web hit counts [w/ A. Montgomery]. Web site traffic. [Figure: log(count) vs. log(in-degree), users-to-sites graph; Zipf; labeled point: ``ebay’’]

20 epinions.com: who-trusts-whom [Richardson + Domingos, KDD 2001]. [Figure: count vs. (out) degree; labeled point: a user who trusts 2000 people]

22 Problem#2: Time evolution. With Jure Leskovec (CMU/MLD) and Jon Kleinberg (Cornell; on sabbatical at CMU)

23-24 Evolution of the Diameter. Prior work on power-law graphs hints at slowly growing diameter: diameter ~ O(log N); diameter ~ O(log log N). What is happening in real data? Diameter shrinks over time

25 Diameter: ArXiv citation graph. Citations among physics papers, 1992-2003; one graph per year. [Figure: diameter vs. time in years]

26 Diameter: “Autonomous Systems”. Graph of the Internet; one graph per day, 1997-2000. [Figure: diameter vs. number of nodes]

27 Diameter: “Affiliation Network”. Graph of collaborations in physics (authors linked to papers); 10 years of data. [Figure: diameter vs. time in years]

28 Diameter: “Patents”. Patent citation network; 25 years of data. [Figure: diameter vs. time in years]

29-30 Temporal Evolution of the Graphs. N(t): nodes at time t; E(t): edges at time t. Suppose that N(t+1) = 2 * N(t). Q: what is your guess for E(t+1)? Is it 2 * E(t)? A: over-doubled! But obeying the ``Densification Power Law’’

31-34 Densification: Physics Citations. Citations among physics papers; 2003: 29,555 papers, 352,807 citations. [Figure: E(t) vs. N(t), fitted slope 1.69; reference slopes: 1 for a tree, 2 for a clique]
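
To make "over-doubled" concrete: if E(t) is proportional to N(t)^a, doubling the nodes multiplies the edges by 2^a. The sketch below works through the slope 1.69 quoted for the physics citations, next to the tree (a = 1) and clique (a = 2) reference slopes. The helper names and the synthetic node counts are illustrative assumptions.

```python
import math

def edge_growth_factor(node_growth, exponent):
    """Edge multiplier implied by E(t) proportional to N(t)**exponent."""
    return node_growth ** exponent

def fit_exponent(nodes, edges):
    """Least-squares slope of log E versus log N."""
    xs = [math.log(n) for n in nodes]
    ys = [math.log(e) for e in edges]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

tree_like = edge_growth_factor(2, 1.0)     # edges exactly double
physics = edge_growth_factor(2, 1.69)      # edges grow by roughly 3.2x
clique_like = edge_growth_factor(2, 2.0)   # edges quadruple

# Synthetic snapshots (made-up node counts) that obey the law exactly.
nodes = [1000, 2000, 4000, 8000]
edges = [n ** 1.69 for n in nodes]
fitted = fit_exponent(nodes, edges)
```

So with slope 1.69, a graph that doubles its node count ends up with somewhat more than three times as many edges.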

35 Densification: Patent Citations. Citations among patents granted; 1999: 2.9 million nodes, 16.5 million edges. Each year is a datapoint. [Figure: E(t) vs. N(t), slope 1.66]

36 Densification: Autonomous Systems. Graph of the Internet; 2000: 6,000 nodes, 26,000 edges. One graph per day. [Figure: E(t) vs. N(t), slope 1.18]

37 Densification: Affiliation Network. Authors linked to their publications; 2002: 60,000 nodes (20,000 authors, 38,000 papers), 133,000 edges. [Figure: E(t) vs. N(t), slope 1.15]

40 Problem#4: MasterMind: ‘CePS’. W/ Hanghang Tong, KDD 2006. htong cs.cmu.edu

41 Center-Piece Subgraph (CePS). Given: Q query nodes. Find: the center-piece. Applications: social networks; law enforcement; ... Idea: proximity -> random walk with restarts
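
The proximity idea can be sketched as follows: run one random walk with restart (RWR) per query node, then score every other node by how close it is to all queries at once. This is a minimal illustration, not the CePS algorithm itself. The restart probability of 0.15, the product used to combine scores for an AND query, the function name, and the toy graph are all assumptions made for the example.

```python
import numpy as np

def random_walk_with_restart(adj, seed, restart=0.15, iters=200):
    """Steady-state visiting probabilities of a walker that restarts at `seed`."""
    adj = np.asarray(adj, dtype=float)
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0          # avoid division by zero for isolated nodes
    W = adj / col_sums                     # column-stochastic transition matrix
    e = np.zeros(adj.shape[0])
    e[seed] = 1.0
    r = e.copy()
    for _ in range(iters):                 # power iteration
        r = (1 - restart) * (W @ r) + restart * e
    return r

# Toy undirected graph: node 2 links both queries; nodes 3 and 4 hang off one query each.
adj = np.array([
    [0, 0, 1, 1, 0],
    [0, 0, 1, 0, 1],
    [1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
])
queries = [0, 1]

per_query = [random_walk_with_restart(adj, q) for q in queries]
scores = np.prod(per_query, axis=0)        # high only if close to every query
scores[queries] = -1.0                     # exclude the query nodes themselves
center = int(np.argmax(scores))
print(center)
```

In this toy graph the node adjacent to both queries gets the top combined score, which matches the intuition of a center-piece connecting the query nodes.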

42-44 Case Study: AND query. Query nodes: R. Agrawal, Jiawei Han, V. Vapnik, M. Jordan. [Figures: result subgraphs]

45 Conclusions. Q1: How to measure the importance? A1: RWR + K_SoftAnd. Q2: How to do it efficiently? A2: Graph partition (Fast CePS): ~90% quality, 150x speedup (ICDM’06, b.p. award)

48 Tensors for time evolving graphs. [Jimeng Sun+ KDD’06] [Jimeng Sun+ SDM’07] [CF, Kolda, Sun, SDM’07 tutorial]

49 Social network analysis. Static: find community structures. [Figure: authors × keywords matrix for 1990, with a DB community]

50 Social network analysis. Static: find community structures. [Figure: authors × keywords matrices for 1990, 1991, 1992]

51 Social network analysis. Static: find community structures. Dynamic: monitor community structure evolution; spot abnormal individuals; abnormal time-stamps

52 Application 1: Multiway latent semantic indexing (LSI). Projection matrices specify the clusters; core tensors give cluster activation level. [Figure: authors × keywords data over 1990-2004 with DB and DM clusters, projection matrices U_authors and U_keyword, query and pattern; labeled authors: Michael Stonebraker, Philip Yu]

53 Bibliographic data (DBLP). Papers from VLDB and KDD conferences. Construct 2nd order tensors with yearly windows. Each tensor: 4584 × 3741; 11 timestamps (years)

54 Multiway LSI. Two groups are correctly identified: Databases (DB) and Data mining (DM). People and concepts are drifting over time.
Authors | Keywords | Year
michael carey, michael stonebraker, h. jagadish, hector garcia-molina | queri, parallel, optimization, concurr, objectorient | 1995
surajit chaudhuri, mitch cherniack, michael stonebraker, ugur cetintemel | distribut, systems, view, storage, servic, process, cache | 2004
jiawei han, jian pei, philip s. yu, jianyong wang, charu c. aggarwal | streams, pattern, support, cluster, index, gener, queri | 2004

55 Network forensics. Directional network flows. A large ISP with 100 POPs, each POP with 10 Gbps link capacity [Hotnets2004]: 450 GB/hour with compression. Task: identify abnormal traffic patterns and find out the cause. (With Prof. Hui Zhang and Dr. Yinglian Xie.) [Figure: source-vs-destination plots for normal traffic and abnormal traffic]

56 MDL mining on time-evolving graph (Enron emails). GraphScope [w. Jimeng Sun, Spiros Papadimitriou and Philip Yu, KDD’07]

57 Conclusions. Tensor-based methods (WTA/DTA/STA): spot patterns and anomalies on time evolving graphs, and on streams (monitoring)

59 Outline: Problem definition / Motivation; Static & dynamic laws, generators; Tools: CenterPiece graphs, Tensors; Other projects (e-bay fraud detection, blogs, weighted graphs); Conclusions

60 E-bay Fraud detection. W/ Polo Chau & Shashank Pandit, CMU

61-62 E-bay Fraud detection. Lines: positive feedbacks. Would you buy from him/her? Or him/her?

63 E-bay Fraud detection: NetProbe

65 Blog analysis. With Mary McGlohon (CMU), Jure Leskovec (CMU), Natalie Glance (now at Google), Mat Hurst (now at MSR) [SDM’07]

66 Cascades on the Blogosphere. Blogosphere: blogs + posts. Blog network: links among blogs. Post network: links among posts. Q1: popularity-decay of a post? Q2: degree distributions? [Figure: blogs B1-B4 with posts a-e, and the derived blog network with link counts]

67-69 Q1: popularity over time. Post popularity drops off: exponentially? POWER LAW! Exponent? -1.6 (close to -1.5: Barabasi’s stack model). [Figure: # in-links vs. days after post, on linear and then log-log axes; log-log slope -1.6]

70-72 Q2: degree distribution. 44,356 nodes, 122,153 edges. Half of blogs belong to largest connected component. In-degree slope: -1.7; out-degree: -3; ‘rich get richer’. [Figure: count vs. blog in-degree]

73 Outline: Problem definition / Motivation; Static & dynamic laws, generators; Tools: CenterPiece graphs, Tensors; Other projects (e-bay fraud detection, blogs, weighted graphs); [work in progress: iCAST data analysis]; Conclusions

74 Joint work with Leman Akoglu (www.andrew.cmu.edu/~lakoglu) and Mary McGlohon (www.cs.cmu.edu/~mmcgloho). Thanks to Eric, Morgan, Ian, Teenet, for providing a copy of the dataset

75 Summary of findings. Who-contacts-whom graph follows old (and new, surprising) patterns. Web servers stand out, though. We are packaging all these tools for open-source release: ADAGE (www.cs.cmu.edu/~mmcgloho/pubs/ADAGE.tar.gz); OddBall (under development)

76 Shrinking diameter. No surprise: shrinking diameter, as `expected’

77 Size of GCC (giant connected component) and NLCC (next-largest connected component). GCC size grows over time, of course. What is your guess about the size of the rest, say the 2nd CC? Shrinks? Grows? Stays the same?

78 Size of GCC, NLCC. A: OSCILLATES! No ‘surprise’ either: typical behavior of graphs [McGlohon+, KDD’08]
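
One way to track these two quantities over time is to compute connected-component sizes for each snapshot and read off the largest and second-largest. The union-find sketch below is an illustration only; the function name and the toy edge lists are assumptions. It also shows one plausible mechanism for the oscillation: the second-largest component grows until a single new edge merges it into the giant one, at which point a much smaller component takes its place.

```python
from collections import Counter

def component_sizes(num_nodes, edges):
    """Sizes of connected components, largest first, via union-find."""
    parent = list(range(num_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v

    counts = Counter(find(x) for x in range(num_nodes))
    return sorted(counts.values(), reverse=True)

# Snapshot at time t (toy data): components {0,1,2,3}, {6,7,8}, {4,5}, {9}.
edges_t = [(0, 1), (1, 2), (2, 3), (4, 5), (6, 7), (7, 8)]
sizes_t = component_sizes(10, edges_t)

# Snapshot at time t+1: one extra edge merges the second component into the largest.
edges_t1 = edges_t + [(3, 6)]
sizes_t1 = component_sizes(10, edges_t1)

gcc_before, nlcc_before = sizes_t[0], sizes_t[1]
gcc_after, nlcc_after = sizes_t1[0], sizes_t1[1]
```

Here the largest component grows from 4 to 7 nodes while the second-largest drops from 3 to 2.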

79 Packets over time. # Packets over time: bursty behavior, with daily periodicity

80 Densification law. 1. Densification law: obeyed (with slope ~1: tree-like (!)). 2. ‘Hours’ plot has a sudden gap @ 20-80 nodes

81 ‘Weight’ power law. Weight: super-linear on number of edges. Also ‘expected’: the more contacts you have, the even more packets you send! Slope 1.3, 1.15; plateau @ 100 edges

82 Anomaly detection. 192.168.21.67: multiple times listed as "SMTP expn root", Src: "192.168.21.67", Dest: "0.0.0.206" ("Mail Server"): "[Case2]An attempt to gather accounts of mail server. Besides some alarms regarding to gathering information were happened at the same time. It is probably a pre-step of attacks from Case 1". Or "WEB-MISC xp_cmdshell attempt", Src: "192.168.21.67", Dest: "0.0.0.251" ("Web Server"): "[Refer to Case 1 in report] A compromised end user located at local network with Trojan Horse are used by hacker to invoke SQL injection and Command Shell executing attacks via web server for stealing sensitive data. (ACER issued Case ID: HD00000003?????)". Mostly servers are detected. 192.168.21.67 is listed in the top 10 outliers but is only slightly over the line, and the others are not listed in the attack cases.

83 Anomaly detection II. Mostly servers are discovered.

85 OVERALL CONCLUSIONS. Graphs pose a wealth of fascinating problems. Self-similarity and power laws work, when textbook methods fail! New patterns (shrinking diameter!). SVD / tensors / RWR: valuable tools. Intrusion detection: closely related (ADAGE, OddBall)

86-92 References
L. Akoglu, M. McGlohon, C. Faloutsos. RTM: Laws and a Recursive Generator for Weighted Time-Evolving Graphs. IEEE ICDM, Pisa, Italy, Dec. 2008.
Jure Leskovec, Jon Kleinberg, Christos Faloutsos. Graphs over Time: Densification Laws, Shrinking Diameters and Possible Explanations. KDD 2005, Chicago, IL ("Best Research Paper" award).
Jure Leskovec, Deepayan Chakrabarti, Jon Kleinberg, Christos Faloutsos. Realistic, Mathematically Tractable Graph Generation and Evolution, Using Kronecker Multiplication. ECML/PKDD 2005, Porto, Portugal, 2005.
Jure Leskovec, Christos Faloutsos. Scalable Modeling of Real Graphs using Kronecker Multiplication. ICML 2007, Corvallis, OR, USA.
M. McGlohon, L. Akoglu, C. Faloutsos. Weighted Graphs and Disconnected Components: Patterns and a Generator. ACM SIGKDD, Las Vegas, NV, USA, Aug. 2008.
Shashank Pandit, Duen Horng (Polo) Chau, Samuel Wang, Christos Faloutsos. NetProbe: A Fast and Scalable System for Fraud Detection in Online Auction Networks. WWW 2007, Banff, Alberta, Canada, May 8-12, 2007.
Jimeng Sun, Dacheng Tao, Christos Faloutsos. Beyond Streams and Graphs: Dynamic Tensor Analysis. KDD 2006, Philadelphia, PA.
Jimeng Sun, Yinglian Xie, Hui Zhang, Christos Faloutsos. Less is More: Compact Matrix Decomposition for Large Sparse Graphs. SDM, Minneapolis, Minnesota, Apr. 2007.
Jimeng Sun, Spiros Papadimitriou, Philip S. Yu, Christos Faloutsos. GraphScope: Parameter-free Mining of Large Time-evolving Graphs. ACM SIGKDD Conference, San Jose, CA, August 2007.
Hanghang Tong, Christos Faloutsos, Jia-Yu Pan. Fast Random Walk with Restart and Its Applications. ICDM 2006, Hong Kong.
Hanghang Tong, Christos Faloutsos. Center-Piece Subgraphs: Problem Definition and Fast Solutions. KDD 2006, Philadelphia, PA.
Hanghang Tong, Brian Gallagher, Christos Faloutsos, Tina Eliassi-Rad. Fast Best-Effort Pattern Matching in Large Attributed Graphs. KDD 2007, San Jose, CA.

93 Contact info: www.cs.cmu.edu/~christos (w/ papers, datasets, code, etc.)

94 Extra: Graph Generators

95-96 Problem#3: Generation. Problem definition: given a growing graph with count of nodes N1, N2, ..., generate a realistic sequence of graphs that will obey all the patterns. Static patterns: power-law degree distribution; power-law eigenvalue and eigenvector distribution; small diameter. Dynamic patterns: growth power law; shrinking/stabilizing diameters

97 Problem Definition. Given a growing graph with count of nodes N1, N2, ..., generate a realistic sequence of graphs that will obey all the patterns. Idea: self-similarity. Leads to power laws; communities within communities; ...

98 Kronecker Product: a Graph. [Figure: adjacency matrix, intermediate stage, adjacency matrix]

99-101 Kronecker Product: a Graph. Continuing multiplying with G1 we obtain G4 and so on ... [Figure: G4 adjacency matrix]
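
Repeated Kronecker multiplication of an adjacency matrix with itself can be sketched with `numpy.kron`. The 3-node initiator below (a short chain with self-loops) is a hypothetical choice for illustration, since the transcript does not capture the G1 shown on the slides; `kronecker_power` is likewise an illustrative helper name.

```python
import math
import numpy as np

def kronecker_power(g1, k):
    """Adjacency matrix of the k-th Kronecker power of the initiator g1."""
    g1 = np.asarray(g1)
    result = g1.copy()
    for _ in range(k - 1):
        result = np.kron(result, g1)
    return result

# Hypothetical initiator: 3 nodes in a chain, with self-loops (7 nonzero entries).
G1 = np.array([
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
])

G2 = kronecker_power(G1, 2)   # 9 x 9
G4 = kronecker_power(G1, 4)   # 81 x 81

# Nodes grow as 3**k and nonzero entries as 7**k, so entries = nodes ** (log 7 / log 3).
implied_exponent = math.log(7) / math.log(3)
print(G2.shape, int(G2.sum()), G4.shape, int(G4.sum()), round(implied_exponent, 2))
```

For this initiator the number of nonzero entries grows faster than the number of nodes, with an implied exponent of about 1.77, which is the kind of densification behavior described earlier in the talk.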

102 Properties. We can PROVE that: degree distribution is multinomial ~ power law; diameter: constant; eigenvalue distribution: multinomial; first eigenvector: multinomial. See [Leskovec+, PKDD’05] for proofs

103 Problem Definition. Given a growing graph with nodes N1, N2, ..., generate a realistic sequence of graphs that will obey all the patterns. Static patterns: power-law degree distribution; power-law eigenvalue and eigenvector distribution; small diameter. Dynamic patterns: growth power law; shrinking/stabilizing diameters. First and only generator for which we can prove all these properties

104 Stochastic Kronecker Graphs. Create an N1 × N1 probability matrix P1. Compute the k-th Kronecker power Pk. For each entry p_uv of Pk, include an edge (u,v) with probability p_uv (flip biased coins) to obtain an instance matrix.
P1 =
0.4 0.2
0.1 0.3
P2 = P1 ⊗ P1 (Kronecker multiplication) =
0.16 0.08 0.08 0.04
0.04 0.12 0.02 0.06
0.04 0.02 0.12 0.06
0.01 0.03 0.03 0.09
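
The three steps on slide 104 can be sketched as follows, using the slide's own P1 entries (0.4, 0.2, 0.1, 0.3). The function name, the random seed, and the use of NumPy's `default_rng` are assumptions made for the example. Materializing Pk in full is only practical for small k.

```python
import numpy as np

def stochastic_kronecker(p1, k, rng):
    """Return (Pk, instance): the k-th Kronecker power of p1 and one sampled graph."""
    p1 = np.asarray(p1, dtype=float)
    pk = p1.copy()
    for _ in range(k - 1):
        pk = np.kron(pk, p1)
    # Flip one biased coin per entry: edge (u, v) appears with probability pk[u, v].
    instance = (rng.random(pk.shape) < pk).astype(int)
    return pk, instance

P1 = np.array([
    [0.4, 0.2],
    [0.1, 0.3],
])

rng = np.random.default_rng(0)   # fixed seed only to make the sketch repeatable
P2, G2 = stochastic_kronecker(P1, 2, rng)
print(P2)
print(G2)
```

The first row of `P2` is 0.4 times and 0.2 times the first row of `P1`, side by side, which is where the 0.16, 0.08, 0.08, 0.04 on the slide come from.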

105 Experiments. How well can we match real graphs? Arxiv (physics citations): 30,000 papers, 350,000 citations, 10 years of data. U.S. patent citation network: 4 million patents, 16 million citations, 37 years of data. Autonomous systems (graph of the internet): single snapshot from January 2002; 6,400 nodes, 26,000 edges. We show both static and temporal patterns

106 (Q: how to fit the parameters?) A: Stochastic version of Kronecker graphs + max likelihood + Metropolis sampling [Leskovec+, ICML’07]

107 Experiments on real AS graph. [Figure panels: degree distribution; hop plot; network value; adjacency matrix eigenvalues]

108 Conclusions. Kronecker graphs have: all the static properties (heavy-tailed degree distributions; small diameter; multinomial eigenvalues and eigenvectors); all the temporal properties (Densification Power Law; shrinking/stabilizing diameters). We can formally prove these results

