
Jure Leskovec Computer Science Department Cornell University / Stanford University Joint work with: Jon Kleinberg (Cornell), Christos.


1 Jure Leskovec (jure@cs.stanford.edu), Computer Science Department, Cornell University / Stanford University. Joint work with: Jon Kleinberg (Cornell), Christos Faloutsos (CMU), Michael Mahoney (Stanford), Kevin Lang (Yahoo), Anirban Dasgupta (Yahoo)

2 Large on-line computing applications have detailed records of human activity. On-line communities: Facebook (120 million). Communication: Instant Messenger (~1 billion). News and social media: Blogging (250 million). We model the data as a network (an interaction graph), so we can observe and study phenomena at scales not possible before. [Figure: communication network]

3 Community (cluster) structure of networks. What is the structure of the network? How can we model that? [Figure: collaborations in NetSci (N=380), a tiny part of a large social network]

4 Conductance (normalized cut): How community-like is a set of nodes? Idea: use approximation algorithms for NP-hard graph partitioning problems as experimental probes of network structure. Smaller Φ(S) means a more community-like set of nodes S. [Figure: a node set S and its complement S’] [w/ Mahoney, Lang, Dasgupta, WWW ’08]
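The slide's own formula did not survive in this transcript. As a minimal sketch, assuming the common definition Φ(S) = cut(S, S’) / min(vol(S), vol(S’)), where cut counts the edges crossing between S and its complement S’ and vol sums node degrees, the score could be computed as follows. The helper name `conductance` and the toy graph are illustrative assumptions, not taken from the talk.

```python
def conductance(edges, S):
    """Conductance of node set S in an undirected graph given as an edge list.

    Assumed definition: cut(S, S') / min(vol(S), vol(S')).
    """
    S = set(S)
    cut = 0       # edges with exactly one endpoint in S
    vol_S = 0     # sum of degrees of nodes inside S
    vol_rest = 0  # sum of degrees of nodes outside S
    for u, v in edges:
        for node in (u, v):
            if node in S:
                vol_S += 1
            else:
                vol_rest += 1
        if (u in S) != (v in S):
            cut += 1
    denom = min(vol_S, vol_rest)
    return cut / denom if denom > 0 else float("inf")


# Toy graph: two triangles joined by the single edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
print(conductance(edges, {0, 1, 2}))  # one cut edge over a volume of 7
print(conductance(edges, {0}))        # a single node: every edge it has is cut
```

In this toy graph a triangle scores far lower than a single node, which matches the slide's reading that smaller Φ(S) indicates a more community-like set.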

5 We define the network community profile (NCP) plot: plot the score of the best community of size k, i.e., log Φ(k) against community size, log k. Example from the figure: Φ(5)=0.25 at k=5 and Φ(7)=0.18 at k=7. [w/ Mahoney, Lang, Dasgupta, WWW ’08]
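To make the definition concrete, here is a small sketch that computes Φ(k) as the minimum conductance over all node sets of size k by exhaustive search. This is only feasible for tiny graphs; the talk instead relies on approximation algorithms for graph partitioning. The conductance definition (cut over the smaller volume), the function names, and the toy graph are assumptions made for illustration.

```python
from itertools import combinations


def conductance(edges, S):
    """Assumed definition: cut(S, S') / min(vol(S), vol(S'))."""
    S = set(S)
    cut = sum((u in S) != (v in S) for u, v in edges)
    vol_S = sum((u in S) + (v in S) for u, v in edges)
    vol_rest = 2 * len(edges) - vol_S
    denom = min(vol_S, vol_rest)
    return cut / denom if denom > 0 else float("inf")


def ncp(edges, max_k):
    """Brute-force network community profile: best (lowest) score per size k."""
    nodes = sorted({n for e in edges for n in e})
    return {
        k: min(conductance(edges, S) for S in combinations(nodes, k))
        for k in range(1, max_k + 1)
    }


# Toy graph: two triangles joined by the single edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
profile = ncp(edges, 3)
for k, phi in profile.items():
    print(k, phi)
```

On this toy graph the profile decreases from k=1 to k=3, because a whole triangle is the best-separated set.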

6 Collaborations between scientists in Networks [Newman, 2005]. [Figure: NCP plot; axes: community size, log k, and conductance, log Φ(k)] [w/ Mahoney, Lang, Dasgupta, WWW ’08]

7 Typical example: General relativity collaboration network (4,158 nodes, 13,422 edges) [w/ Mahoney, Lang, Dasgupta, WWW ’08]


9 [Figure: NCP plot of Φ(k) (conductance) against k (community size)] Better and better communities at first, then communities get worse and worse. Best community has ~100 nodes. [w/ Mahoney, Lang, Dasgupta, WWW ’08]

10 Each dot is a different network. Practically constant! [w/ Mahoney, Lang, Dasgupta, WWW ’08]

11 Core-periphery (jellyfish, octopus): small good communities and a denser and denser core of the network. The core contains ~60% of the nodes and ~80% of the edges. So, what’s a good model?

12 The Kronecker product of an N × M matrix A and a K × L matrix B is the N·K × M·L block matrix $A \otimes B = [a_{ij} B]$, whose (i, j) block is $a_{ij} B$. We define the Kronecker product of two graphs as the Kronecker product of their adjacency matrices. [w/ Chakrabarti-Kleinberg-Faloutsos, PKDD ’05]
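A short sketch of the definition using NumPy's `np.kron`. The two small adjacency matrices are made up for illustration and are not from the talk.

```python
import numpy as np

# Illustrative 2-node graph: a self-loop on node 0 plus the edge 0-1.
A = np.array([[1, 1],
              [1, 0]])

# Illustrative 3-node path graph 0-1-2.
B = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])

# Kronecker product of the two graphs: block (i, j) of C equals A[i, j] * B.
C = np.kron(A, B)
print(C.shape)  # (2*3, 2*3)
print(C)

# The dimension rule also holds for rectangular matrices: (N x M) and (K x L).
print(np.kron(np.ones((2, 3)), np.ones((4, 5))).shape)
```

Each block of `C` is either a copy of `B` (where `A` has a 1) or all zeros (where `A` has a 0), which is the source of the self-similar structure.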

13 Kronecker graph: a growing sequence of graphs obtained by iterating the Kronecker product. The size of the graph grows exponentially with the number of Kronecker multiplications. One can easily use multiple initiator matrices (G_1’, G_1’’, G_1’’’) that can be of different sizes. [w/ Chakrabarti-Kleinberg-Faloutsos, PKDD ’05]
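As a sketch of the iteration, the k-th Kronecker power can be built by repeatedly applying `np.kron` to the initiator. The function name `kronecker_power` and the 3x3 initiator below are assumptions chosen for illustration.

```python
from functools import reduce

import numpy as np


def kronecker_power(G1, k):
    """k-th Kronecker power of the initiator adjacency matrix G1 (k >= 1)."""
    return reduce(np.kron, [G1] * k)


# Illustrative initiator: a 3-node path with self-loops.
G1 = np.array([[1, 1, 0],
               [1, 1, 1],
               [0, 1, 1]])

for k in (1, 2, 3):
    Gk = kronecker_power(G1, k)
    # Node count is 3**k; the number of nonzero entries is 7**k.
    print(k, Gk.shape, int(Gk.sum()))

# Initiators of different sizes can be mixed in the same way.
G1_alt = np.array([[1, 1],
                   [1, 0]])
mixed = reduce(np.kron, [G1, G1_alt, G1])
print(mixed.shape)
```

Because the nonzero count of a Kronecker product is the product of the factors' nonzero counts, both the number of nodes and the number of adjacency entries grow as powers of k.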

14 Kronecker graphs mimic real networks. Theorem: power-law degree distribution, densification, shrinking/stabilizing diameter, spectral properties. Starting intuition: recursion and self-similarity. [Figure: initiator (3x3) and its Kronecker powers (9x9) and (27x27); each entry p_ij is an edge probability] [w/ Chakrabarti, Kleinberg, Faloutsos, PKDD ’05]
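A minimal sketch of the probabilistic version, assuming each entry p_ij of the Kronecker power of a probability initiator is used as an independent edge probability. The function name, the initiator values, and the dense sampling strategy are illustrative assumptions; a dense N x N matrix is only practical for small graphs.

```python
from functools import reduce

import numpy as np


def sample_stochastic_kronecker(P1, k, seed=0):
    """Sample a directed graph whose edge (i, j) appears with probability Pk[i, j]."""
    rng = np.random.default_rng(seed)
    Pk = reduce(np.kron, [P1] * k)  # matrix of edge probabilities p_ij
    adjacency = (rng.random(Pk.shape) < Pk).astype(int)
    return Pk, adjacency


# Illustrative 2x2 probability initiator (values are assumptions).
P1 = np.array([[0.9, 0.5],
               [0.5, 0.1]])

Pk, adjacency = sample_stochastic_kronecker(P1, k=3)
print(Pk.shape)          # 2**3 nodes
print(Pk.sum())          # expected number of edges, (sum of P1) ** 3
print(adjacency.sum())   # number of edges in this particular sample
```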


16 Initiator matrix G_1 is a similarity matrix. Node u is described with k binary attributes: u_1, u_2, …, u_k. Probability of a link between nodes u, v: $P(u,v) = \prod_{i=1}^{k} G_1[u_i, v_i]$. Example with $G_1 = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ (rows and columns indexed 0, 1): for u = (0,1,1,0) and v = (1,1,0,1), P(u,v) = b∙d∙c∙b. Given a real graph, how to estimate the initiator G_1?
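The product formula can be checked directly on the slide's example. In this sketch the numeric values of a, b, c, d and the function name `link_probability` are assumptions for illustration; only the attribute vectors u and v come from the slide.

```python
def link_probability(G1, u, v):
    """P(u, v) as the product of G1[u_i][v_i] over all attribute positions i."""
    p = 1.0
    for ui, vi in zip(u, v):
        p *= G1[ui][vi]
    return p


# Illustrative values for the initiator entries (assumptions).
a, b, c, d = 0.9, 0.5, 0.4, 0.1
G1 = [[a, b],
      [c, d]]

u = (0, 1, 1, 0)
v = (1, 1, 0, 1)

p = link_probability(G1, u, v)
print(p, b * d * c * b)  # both expressions give the same value
```

Reading u and v as binary numbers with the first attribute as the most significant bit (0110 = 6 and 1101 = 13) gives the row and column of the same entry in the 4th Kronecker power of G_1, under the usual block convention for the Kronecker product.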

17 Want to generate realistic networks: given a real network, estimate the initiator matrix (a b; c d), generate a synthetic network, and compare graph properties, e.g., degree distribution. How to estimate the initiator matrix: Method of moments [Owen ’09]: compare counts of subgraphs and solve. Maximum likelihood [Leskovec&Faloutsos, ’07]: $\arg\max_{G_1} P(G \mid G_1)$, where G is the given graph. SVD [VanLoan&Pitsianis ’93]: can solve using SVD.
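The slide does not spell out the SVD formulation. As a hedged sketch, here is the generic two-factor nearest Kronecker product idea associated with Van Loan and Pitsianis: rearrange the matrix so that a Kronecker product becomes a rank-1 matrix, then take the leading singular pair. The function name and the test matrices are assumptions, and this is not necessarily how the talk applies the method to graphs.

```python
import numpy as np


def nearest_kronecker_product(A, shape_B, shape_C):
    """Find B, C minimizing the Frobenius norm of A - kron(B, C)."""
    m1, n1 = shape_B
    m2, n2 = shape_C
    assert A.shape == (m1 * m2, n1 * n2)
    # Rearrangement: row (i1, j1) of R holds the flattened (m2 x n2) block
    # of A at block position (i1, j1). For A = kron(B, C) this gives
    # R = outer(B.ravel(), C.ravel()), a rank-1 matrix.
    R = A.reshape(m1, m2, n1, n2).transpose(0, 2, 1, 3).reshape(m1 * n1, m2 * n2)
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    scale = np.sqrt(s[0])
    B = scale * U[:, 0].reshape(m1, n1)
    C = scale * Vt[0].reshape(m2, n2)
    return B, C


# Illustrative check: build an exact Kronecker product and recover its factors.
B0 = np.array([[0.9, 0.5],
               [0.5, 0.1]])
C0 = np.array([[0.8, 0.3],
               [0.2, 0.6]])
A = np.kron(B0, C0)

B, C = nearest_kronecker_product(A, B0.shape, C0.shape)
print(np.allclose(np.kron(B, C), A))
```

The individual factors are only determined up to a scalar (B can be multiplied by t and C by 1/t), so the check compares the reconstructed product rather than B and C themselves.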

18 What do estimated parameters tell us about the network structure? [Figure: initiator (a b; c d), with the corresponding parts of the network labeled a edges, b edges, c edges, d edges] [w/ Dasgupta-Lang-Mahoney, WWW ’08]

19 What do estimated parameters tell us about the network structure? Core-periphery (jellyfish, octopus): core 0.9 edges, periphery 0.1 edges, and 0.5 edges between them. [Figure: initiator with entries 0.9, 0.5, 0.1] [w/ Dasgupta-Lang-Mahoney, WWW ’08]

20 Small and large networks are very different: Collaboration network (N=4,158, E=13,422) and Scientific collaborations (N=397, E=914). [Figure: estimated initiators G_1, one with entries 0.99, 0.54, 0.49, 0.13 and one with entries 0.99, 0.17, 0.82]

21 Computational tools as probes into the structure of large networks. Community structure of large networks: core-periphery structure; scale to natural community size (Dunbar number). Model: Kronecker graphs. Analytically tractable: provable properties. Can efficiently estimate parameters from data. Implications: no large clusters (no/little hierarchical structure); can’t be well embedded (no underlying geometry).

22 Why are networks the way they are? Only recently have basic properties been observed on a large scale. This confirms some social science intuitions and calls others into question. What are good tractable network models? They build intuition and understanding. Benefits of working with large data: observe structures not visible at smaller scales.


24
Graphs over Time: Densification Laws, Shrinking Diameters and Possible Explanations, by J. Leskovec, J. Kleinberg, C. Faloutsos, KDD 2005
Realistic, Mathematically Tractable Graph Generation and Evolution, Using Kronecker Multiplication, by J. Leskovec, D. Chakrabarti, J. Kleinberg, C. Faloutsos, PKDD 2005
Scalable Modeling of Real Graphs using Kronecker Multiplication, by J. Leskovec, C. Faloutsos, ICML 2007
Statistical Properties of Community Structure in Large Social and Information Networks, by J. Leskovec, K. Lang, A. Dasgupta, M. Mahoney, WWW 2008
Community Structure in Large Networks: Natural Cluster Sizes and the Absence of Large Well-Defined Clusters, by J. Leskovec, K. Lang, A. Dasgupta, M. Mahoney, arXiv 2008

