Published by Silvester Ross. Modified over 4 years ago.

1
Jure Leskovec (CMU); Kevin Lang, Anirban Dasgupta, Michael Mahoney (Yahoo! Research)

2
Big data: study emerging behaviors. How are small networks different from large ones?

3
Communities (groups, clusters, modules): sets of nodes with many connections inside and few to the outside (the rest of the network).

4
Nodes represent proteins; edges represent interactions/associations. Proteins with the same function interact more, so the network can be used to discover functional groups. (Figure: yeast transcriptional regulatory modules [Bar-Joseph et al., 2003].)

5
Clusters correspond to social communities and organizational units (e.g., departments). Zachary’s karate club network: during the study the club split into two, and the split corresponds to the min-cut (● vs. ■).

6
Democrat vs. Republican blogs [Adamic-Glance 2005]

7
Panels: citations; collaborations [Newman 2003]

8
Nested communities: the modular structure of networks is hierarchically organized. (Figure: University, with Science (CS, Math) and Arts (Drama, Music).)

9
Recursive hierarchical network. Panels: (a) N=5, E=8; (b) N=25, E=56; (c) N=125, E=344.

10
Intuition: find nodes that can be easily separated from the rest of the network.
Various objective functions: min-cut, normalized cut, centrality, modularity.
Various algorithms: spectral clustering (random walks), Girvan-Newman (centrality), Metis (contraction-based).
Girvan-Newman: 1) compute edge betweenness centrality, the number of shortest paths passing through an edge; 2) remove edges in order of decreasing centrality, recomputing betweenness after each removal.
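The Girvan-Newman steps above can be sketched in stdlib Python. This is a minimal illustration, not the implementation behind the slides: the function names are hypothetical, edge betweenness is computed with Brandes-style shortest-path counting, and it is recomputed after every edge removal.

```python
from collections import deque


def graph_from_edges(edges):
    """Build an undirected adjacency dict {node: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def edge_betweenness(adj):
    """Brandes-style edge betweenness for an unweighted, undirected graph."""
    eb = {}
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record predecessors.
        dist, sigma, preds = {s: 0}, {s: 1}, {s: []}
        order, queue = [], deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v], sigma[v], preds[v] = dist[u] + 1, 0, []
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        # Walk back from the farthest nodes, splitting each node's dependency
        # over its shortest-path predecessors.
        delta = {u: 0.0 for u in order}
        for w in reversed(order):
            for u in preds[w]:
                c = sigma[u] / sigma[w] * (1.0 + delta[w])
                e = frozenset((u, w))
                eb[e] = eb.get(e, 0.0) + c
                delta[u] += c
    # Each unordered pair {s, t} was counted once from s and once from t.
    return {e: b / 2 for e, b in eb.items()}


def components(adj):
    """Connected components as a list of node sets."""
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = {s}, [s]
        seen.add(s)
        while stack:
            for v in adj[stack.pop()]:
                if v not in seen:
                    seen.add(v)
                    comp.add(v)
                    stack.append(v)
        comps.append(comp)
    return comps


def girvan_newman_split(adj):
    """Remove the highest-betweenness edge, recomputing after every removal,
    until the number of connected components increases."""
    adj = {u: set(vs) for u, vs in adj.items()}  # work on a copy
    start = len(components(adj))
    while len(components(adj)) == start:
        u, v = max(edge_betweenness(adj).items(), key=lambda kv: kv[1])[0]
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)


# Two triangles joined by the single edge 2-3.
g = graph_from_edges([(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)])
print(edge_betweenness(g)[frozenset((2, 3))])  # 9.0
print(girvan_newman_split(g))                  # [{0, 1, 2}, {3, 4, 5}]
```

On this toy graph the joining edge carries the shortest paths of all 3 × 3 cross pairs, so it is the first edge removed and the two triangles separate.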

11

12
Statistical properties of community structure: instead of searching for communities, we measure how well expressed communities are. Questions: What is the community structure of real-world networks? How can we measure and quantify it? What does this tell us about network structure? What is a good model (intuition)? What are the consequences for clustering/partitioning algorithms?

13
How community-like is a set of nodes? We need a natural, intuitive measure: conductance (normalized cut), Φ(S) = # edges cut / # edges inside. A small Φ(S) corresponds to a more community-like set of nodes. (Figure: a set S and the rest of the network S'.)
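A minimal Python sketch of the slide's score, assuming an undirected graph stored as a dict of neighbor sets (the helper names are hypothetical). It implements the simplified ratio as written on the slide, # edges cut / # edges inside, rather than a volume-normalized conductance, and returns infinity for sets with no internal edges.

```python
from __future__ import annotations


def graph_from_edges(edges: list[tuple[int, int]]) -> dict[int, set[int]]:
    """Build an undirected adjacency dict {node: set(neighbors)}."""
    adj: dict[int, set[int]] = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def community_score(adj: dict[int, set[int]], s: set[int]) -> float:
    """Slide score: Phi(S) = (# edges cut) / (# edges inside S).
    Returns inf when S has no internal edges."""
    cut = 0
    inside_twice = 0  # every internal edge is seen once from each endpoint
    for u in s:
        for v in adj[u]:
            if v in s:
                inside_twice += 1
            else:
                cut += 1
    inside = inside_twice // 2
    return cut / inside if inside else float("inf")


# Two triangles joined by the single edge 2-3.
g = graph_from_edges([(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)])
print(community_score(g, {0, 1, 2}))     # 1 cut edge / 3 inside edges
print(community_score(g, {0, 1, 2, 3}))  # 2 cut edges / 4 inside edges
```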

14
Score: Φ(S) = # edges cut / # edges inside. What is the “best” community of 5 nodes?

15
Score: Φ(S) = # edges cut / # edges inside. Bad community: Φ = 5/6 ≈ 0.83. What is the “best” community of 5 nodes?

16
Score: Φ(S) = # edges cut / # edges inside. Bad community: Φ = 5/6 ≈ 0.83. Better community: Φ = 5/7 ≈ 0.71. Another candidate: Φ = 2/5 = 0.4. What is the “best” community of 5 nodes?

17
Score: Φ(S) = # edges cut / # edges inside. Bad community: Φ = 5/6 ≈ 0.83. Better community: Φ = 5/7 ≈ 0.71. Another candidate: Φ = 2/5 = 0.4. Best community: Φ = 2/8 = 0.25. What is the “best” community of 5 nodes?

18
We define the network community profile (NCP) plot: plot the score of the best community of size k. Search over all subsets of size k and find the best one: Φ(k=5) = 0.25. The NCP plot is intractable to compute exactly.
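The exhaustive search described here can be written directly, which also shows why it is intractable: it scores all C(n, k) subsets of each size. A sketch on a 6-node toy graph (two triangles joined by one edge) using the slide's score; all names are hypothetical.

```python
from itertools import combinations


def graph_from_edges(edges):
    """Build an undirected adjacency dict {node: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def community_score(adj, s):
    """Slide score: (# edges cut) / (# edges inside S); inf if nothing inside."""
    cut = sum(1 for u in s for v in adj[u] if v not in s)
    inside = sum(1 for u in s for v in adj[u] if v in s) // 2
    return cut / inside if inside else float("inf")


def ncp_brute_force(adj, k_max):
    """Phi(k) = min over all k-node subsets S of Phi(S), for k = 2..k_max.
    Enumerates C(n, k) subsets per k, so it only works for tiny graphs."""
    nodes = sorted(adj)
    return {
        k: min(community_score(adj, set(c)) for c in combinations(nodes, k))
        for k in range(2, k_max + 1)
    }


g = graph_from_edges([(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)])
for k, phi in ncp_brute_force(g, 5).items():
    print(k, round(phi, 3))  # 2 2.0 / 3 0.333 / 4 0.5 / 5 0.4
```

Even at n = 100 and k = 5 there are C(100, 5) = 75,287,520 subsets to score, so exhaustive search is only usable on toy graphs.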

19
We define the network community profile (NCP) plot: plot the score of the best community of size k. (Figure: log Φ(k) vs. community size, log k; marked points k=5, Φ(k)=0.25 and k=7, Φ(k)=0.18.)
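Written out, the score used on these slides and the NCP value plotted for each size k are as follows (the notation G = (V, E) for an undirected graph, with each edge counted once, is assumed here rather than taken from the slides):

```latex
\Phi(S) = \frac{\left|\{(u,v) \in E : u \in S,\ v \notin S\}\right|}
               {\left|\{(u,v) \in E : u \in S,\ v \in S\}\right|},
\qquad
\Phi(k) = \min_{S \subseteq V,\ |S| = k} \Phi(S)
```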

20
(Figure: community score, log Φ(k), vs. community size, log k.)

21
Local spectral clustering algorithm: pick a seed node; slowly diffuse mass around it (via a PageRank-like random walk); find the bottleneck; repeat many times. Many seed nodes for very local walks; fewer seed nodes for more global (longer) walks.
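One way to sketch these steps: approximate a personalized PageRank vector from the seed with a push procedure (in the style of Andersen, Chung and Lang), then sweep over nodes sorted by PageRank divided by degree and keep the prefix with the lowest score. The details are assumptions, not the exact method behind the slides: the parameters `alpha` and `eps`, the half-volume limit on the sweep, and the use of the slide's cut/inside score are choices made for this illustration, and the graph is assumed to have no isolated nodes.

```python
from collections import deque


def graph_from_edges(edges):
    """Build an undirected adjacency dict {node: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def approx_ppr(adj, seed, alpha=0.15, eps=1e-4):
    """Push-style approximate personalized PageRank for a lazy random walk.
    alpha: restart probability; eps: residual tolerance per unit of degree."""
    p, r = {}, {seed: 1.0}
    queue = deque([seed])
    while queue:
        u = queue.popleft()
        du, ru = len(adj[u]), r.get(u, 0.0)
        if ru < eps * du:
            continue  # stale queue entry
        p[u] = p.get(u, 0.0) + alpha * ru     # settle part of the residual mass
        r[u] = (1 - alpha) * ru / 2           # lazy walk: half of the rest stays
        share = (1 - alpha) * ru / (2 * du)   # the other half spreads to neighbors
        for v in adj[u]:
            old = r.get(v, 0.0)
            r[v] = old + share
            if old < eps * len(adj[v]) <= r[v]:
                queue.append(v)  # v just crossed the push threshold
        if r[u] >= eps * du:
            queue.append(u)
    return p


def sweep_cut(adj, p):
    """Sort nodes by p[u] / deg(u) and return the prefix set with the lowest
    slide score (# cut / # inside), keeping prefix volume <= half the total."""
    half_volume = sum(len(nbrs) for nbrs in adj.values()) / 2
    order = sorted(p, key=lambda u: p[u] / len(adj[u]), reverse=True)
    s, cut, inside, volume = set(), 0, 0, 0
    best_set, best_score = None, float("inf")
    for u in order:
        volume += len(adj[u])
        if volume > half_volume:
            break
        links_in = sum(1 for v in adj[u] if v in s)
        s.add(u)
        inside += links_in
        cut += len(adj[u]) - 2 * links_in  # edges to S stop being cut
        if inside and cut / inside < best_score:
            best_set, best_score = set(s), cut / inside
    return best_set, best_score


g = graph_from_edges([(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)])
print(sweep_cut(g, approx_ppr(g, seed=0)))  # the triangle {0, 1, 2}, score 1/3
```

Repeating this from many seeds, with smaller `eps` for longer (more global) walks, and recording the best score found at each set size gives an approximate NCP plot.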

22

23
Dolphin social network: two communities of dolphins. (Panels: NCP plot; network.)

24
Zachary’s university karate club social network: during the study the club split into two. The split (squares vs. circles) corresponds to cut B. (Panels: NCP plot; network.)

25
Collaborations between scientists in Networks. (Panels: NCP plot; network.)

26
(Panels: NCP plot; network.)

27
(Panels: NCP plot; network.)

28
Manifold learning dataset (Hands). (Panels: NCP plot; network.)

29
Eastern US power grid.

30
(Panels: NCP plot; network.) Small social networks, geometric networks, and hierarchical networks have a downward-sloping NCP plot. What about large networks?

31

32
Previously, researchers examined the community structure of small networks (~100 nodes). We examined more than 70 different large networks. Large real-world networks look very different!

33
Typical example: General relativity collaboration network (4,158 nodes, 13,422 edges).

34
(Figure: community score vs. community size.) Communities get better and better up to a point; beyond it, the best communities get worse and worse. The best community has 100 nodes.

35
Whiskers are responsible for the downward slope of the NCP plot. A whisker is a set of nodes connected to the rest of the network by a single edge. (Figure: NCP plot with the largest whisker marked.)
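Whiskers can be found mechanically from bridges (edges whose removal disconnects the graph). The sketch below is one plausible reading of the definition, assuming a connected, simple graph: delete all bridges, treat the largest remaining 2-edge-connected piece as the core, and report each connected component outside the core as a whisker. Each such component attaches to the core by exactly one edge, since two attaching edges would lie on a common cycle and could not both be bridges. The function names and toy graph are hypothetical.

```python
def graph_from_edges(edges):
    """Build an undirected adjacency dict {node: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bridges(adj):
    """Tarjan's bridge-finding DFS (recursive; fine for small graphs).
    Assumes a simple graph (no parallel edges)."""
    disc, low, found = {}, {}, set()

    def dfs(u, parent):
        disc[u] = low[u] = len(disc)  # discovery time
        for v in adj[u]:
            if v not in disc:
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if low[v] > disc[u]:
                    found.add(frozenset((u, v)))  # no back edge jumps over u-v
            elif v != parent:
                low[u] = min(low[u], disc[v])

    for u in adj:
        if u not in disc:
            dfs(u, None)
    return found


def components(nodes, adj, removed=frozenset()):
    """Connected components of the subgraph induced by `nodes`,
    ignoring the edges listed in `removed`."""
    seen, comps = set(), []
    for s in nodes:
        if s in seen:
            continue
        comp, stack = {s}, [s]
        seen.add(s)
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v in nodes and v not in seen and frozenset((u, v)) not in removed:
                    seen.add(v)
                    comp.add(v)
                    stack.append(v)
        comps.append(comp)
    return comps


def whiskers(adj):
    """Components left after removing the largest 2-edge-connected piece."""
    pieces = components(set(adj), adj, bridges(adj))
    core = max(pieces, key=len)
    return components(set(adj) - core, adj)


g = graph_from_edges([
    (0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),  # dense core: K4 on 0..3
    (3, 4), (4, 5), (5, 6), (4, 6),                  # triangle whisker via 3-4
    (0, 7),                                          # single-node whisker via 0-7
])
print(sorted(sorted(w) for w in whiskers(g)))  # [[4, 5, 6], [7]]
```

The triangle whisker {4, 5, 6} is cut off by one edge even though it is not a tree, which matches the later observation that real whiskers are richer than trees.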

36
Each new edge inside the community costs more. Example: a tree in which each node has twice as many children. The NCP plot shows Φ = 1/3 ≈ 0.33, Φ = 2/4 = 0.5, Φ = 8/6 ≈ 1.3, and Φ = 64/14 ≈ 4.6 for successively larger communities.

37
Take a real network G and rewire its edges for a long time. We obtain a random graph with the same degree distribution as the real network G.
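Rewiring is commonly done with degree-preserving double-edge swaps; this sketch assumes that procedure (the slides do not specify one). Each swap replaces edges (a, b) and (c, d) with (a, d) and (c, b), so every endpoint keeps its degree; swaps that would create a self-loop or a duplicate edge are skipped. The swap count, seed, and function names are illustrative.

```python
import random
from collections import Counter


def rewire(edges, n_swaps, seed=None):
    """Degree-preserving double-edge swaps: (a, b), (c, d) -> (a, d), (c, b).
    Swaps that would create a self-loop or a duplicate edge are skipped."""
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if rng.random() < 0.5:
            c, d = d, c  # choose one of the two possible reconnections
        if len({a, b, c, d}) < 4:
            continue  # shared endpoint: would give a self-loop or no change
        if frozenset((a, d)) in present or frozenset((c, b)) in present:
            continue  # would duplicate an existing edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {frozenset((a, d)), frozenset((c, b))}
        edges[i], edges[j] = (a, d), (c, b)
    return edges


def degrees(edges):
    """Degree of every node that appears in the edge list."""
    return Counter(x for e in edges for x in e)


ring = [(i, (i + 1) % 20) for i in range(20)] + [(i, (i + 5) % 20) for i in range(0, 20, 2)]
rewired = rewire(ring, 1000, seed=1)
print(degrees(rewired) == degrees(ring))  # True: every node keeps its degree
```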

38
Rewired network: a random network with the same degree distribution.

39
Whiskers in real networks are larger than expected.

40
Whiskers in real networks are non-trivial (richer than trees). (Figure label: edge to cut.)

41
What if we allow cuts that give disconnected communities? Cut all whiskers and compose communities out of them. How good are the “communities” we get?

42
(Figure: community score vs. community size; curves: connected communities, bag of whiskers.) We get better community scores when composing disconnected sets of whiskers.
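Because each whisker hangs off the core by one edge, a union of r whiskers cuts exactly r edges, and its score is r divided by the whiskers' total internal edges. The hypothetical sketch below brute-forces the best bag for a target size from a made-up list of (size, internal edges) pairs, using exact fractions; it is exponential in the number of whiskers and only illustrates the arithmetic.

```python
from fractions import Fraction
from itertools import combinations


def best_bag(whiskers, k):
    """whiskers: list of (num_nodes, num_internal_edges), each attached to the
    rest of the graph by exactly one edge. Returns the lowest score
    (# whiskers) / (total internal edges) over bags whose sizes add up to k,
    or None if no bag has exactly k nodes."""
    best = None
    for r in range(1, len(whiskers) + 1):
        for bag in combinations(whiskers, r):
            if sum(n for n, _ in bag) != k:
                continue
            inside = sum(e for _, e in bag)
            if inside and (best is None or Fraction(r, inside) < best):
                best = Fraction(r, inside)
    return best


ws = [(3, 3), (3, 3), (4, 6), (2, 1)]  # hypothetical whiskers: triangles, a K4, an edge
for k in (4, 6):
    print(k, best_bag(ws, k))  # 4 1/6, then 6 2/7
```

At size 6, the disconnected bag of the K4 whisker and the 2-node whisker (2/7) beats two triangle whiskers (2/6 = 1/3).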

43
Nothing happens! Now we have 2-edge-connected whiskers to deal with.

44
(Figure curves: connected communities; bag of whiskers; rewired network.)

45
Network structure: core-periphery (jellyfish, octopus). Whiskers are responsible for the good communities. The core of the network gets denser and denser. The core contains 60% of the nodes and 80% of the edges.

46

47
(Sparse) random graph: start with N nodes, pick pairs of nodes uniformly at random, and connect them. NCP plot: flat (long random connections). Theorem (works for any degree distribution): sparsity does not explain our observation.
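A sketch of the sparse random graph construction as the slide states it: pick node pairs uniformly at random and connect them, skipping repeats, until m edges exist (the G(n, m) flavor of the Erdős-Rényi model). The function name and the n, m, and seed values are illustrative.

```python
import random


def sparse_random_graph(n, m, seed=None):
    """Sparse random graph G(n, m): repeatedly pick a pair of distinct nodes
    uniformly at random and connect it, until m distinct edges exist."""
    if m > n * (n - 1) // 2:
        raise ValueError("m exceeds the number of possible edges")
    rng = random.Random(seed)
    edges = set()
    while len(edges) < m:
        u, v = rng.sample(range(n), 2)
        edges.add((min(u, v), max(u, v)))  # store each pair once, smaller id first
    return sorted(edges)


edges = sparse_random_graph(1000, 2500, seed=0)  # average degree 5
print(len(edges))  # 2500
```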

48
Preferential attachment [Price 1965, Albert & Barabási 1999]: add a new node and create m out-links; the probability of linking to node i is proportional to its degree k_i. Based on Herbert Simon’s result: power laws arise from “rich get richer” (cumulative advantage). NCP plot: flat (connections to hubs, no locality).
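A minimal sketch of preferential attachment as described on the slide: each new node creates m links, choosing targets with probability proportional to degree. Sampling uniformly from the list of all edge endpoints gives exactly that probability. The starting clique on m + 1 nodes is an assumption, links are stored as undirected pairs, and the function name is hypothetical.

```python
import random
from collections import Counter


def preferential_attachment(n, m, seed=None):
    """Each new node links to m distinct existing nodes, picked with probability
    proportional to degree. `targets` lists every edge endpoint, so node i
    appears in it exactly deg(i) times and a uniform pick is degree-proportional."""
    rng = random.Random(seed)
    # Assumption: start from a clique on m + 1 nodes so every node has degree > 0.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    targets = [x for e in edges for x in e]
    for new in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(targets))
        for t in chosen:
            edges.append((new, t))
            targets.extend((new, t))  # both endpoints gain one degree
    return edges


edges = preferential_attachment(2000, 2, seed=0)
deg = Counter(x for e in edges for x in e)
print(deg.most_common(3))  # a few hubs, far above the mean degree of about 4
```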

49
Let’s exploit local connections. NCP plot: down (locally, the network looks like a mesh) and flat (at large scale, the network looks random).

50
Geometric preferential attachment: place nodes at random in 2D; pick a node, pick nodes within a radius, and connect preferentially. NCP plot: flat (locally, the network is random) and down (globally, the network is a mesh, a union of local expanders).
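The slide gives only the outline of geometric preferential attachment, so the sketch below fills in hypothetical details: points are uniform in the unit square, `radius` and `m` are free parameters, and the attachment weight is degree + 1 so that isolated nodes can still be chosen. Treat it as an illustration of the idea (local, degree-biased linking), not as the model used in the study.

```python
import math
import random


def geometric_pa(n, m, radius, seed=None):
    """Hypothetical geometric preferential attachment: node `new` lands at a
    uniform random point in the unit square and links to up to m earlier nodes
    within `radius`, picked with probability proportional to degree + 1."""
    rng = random.Random(seed)
    pos, deg, edges = [], [], []
    for new in range(n):
        x, y = rng.random(), rng.random()
        nearby = [i for i, (a, b) in enumerate(pos) if math.hypot(a - x, b - y) <= radius]
        chosen = []
        while nearby and len(chosen) < m:
            t = rng.choices(nearby, weights=[deg[i] + 1 for i in nearby])[0]
            chosen.append(t)
            nearby.remove(t)  # sample without replacement
        pos.append((x, y))
        deg.append(len(chosen))
        for t in chosen:
            edges.append((new, t))
            deg[t] += 1
    return pos, edges


pos, edges = geometric_pa(500, 3, 0.1, seed=0)
print(len(edges))  # at most 3 links per node, all between nearby points
```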

51
Forest Fire: connections spread like a fire. A new node joins the network, selects a seed node, connects to some of its neighbors, and continues recursively. As a community grows, it blends into the core of the network.
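A simplified sketch of the Forest Fire process. The published model is defined on directed graphs with separate forward and backward burning probabilities; this undirected variant with a single burning probability `p` is an assumption made to keep the example short, and the function name is hypothetical.

```python
import random
from collections import deque


def forest_fire(n, p, seed=None):
    """Simplified undirected Forest Fire: each new node picks a uniform random
    seed ("ambassador"), links to it, then burns a geometric number of each
    burned node's unburned neighbors, linking to every node it burns."""
    if not 0 <= p < 1:
        raise ValueError("p must be in [0, 1)")
    rng = random.Random(seed)
    adj = {0: set()}
    for new in range(1, n):
        ambassador = rng.randrange(new)
        burned = {ambassador}
        queue = deque([ambassador])
        while queue:
            u = queue.popleft()
            # x counts successes before the first failure: P(x = k) = p**k * (1 - p)
            x = 0
            while rng.random() < p:
                x += 1
            candidates = sorted(v for v in adj[u] if v not in burned)
            rng.shuffle(candidates)
            for v in candidates[:x]:
                burned.add(v)
                queue.append(v)  # the fire spreads recursively from v
        adj[new] = set(burned)
        for v in burned:
            adj[v].add(new)
    return adj


g = forest_fire(1000, 0.35, seed=0)
print(sum(len(nbrs) for nbrs in g.values()) // 2)  # number of edges
```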

52
(Figure curves: rewired network; bag of whiskers.)

53
Whiskers: the largest whisker has ~100 nodes, independent of network size. Dunbar number: a person can maintain social relationships with at most 150 people. Core: the core has little structure (it is hard to cut), but still more structure than the random network.

54
Other researchers examined small networks, so they did not hit Dunbar’s limit. Some evidence: the 400k-node Amazon co-purchasing network [Clauset et al. 2004] ▪ The largest community has 50% of all nodes. ▪ It was labeled “Miscellaneous”. The karate club has no significant community structure [Newman et al. 2007].

55
Bond vs. identity communities. Multiple hierarchies blur the community boundaries.

56
Ground truth? Yes: use attributes and better link semantics.

57
The NCP plot is a way to analyze network community structure. Our results agree with previous work on small networks (which are commonly used for testing community-finding algorithms), but large networks are different. Large networks have a whiskers + core structure: small, well-isolated communities blend into the core of the network as they grow.
