# Jure Leskovec, CMU Kevin Lang, Anirban Dasgupta and Michael Mahoney Yahoo! Research.

## Presentation on theme: "Jure Leskovec, CMU Kevin Lang, Anirban Dasgupta and Michael Mahoney Yahoo! Research."— Presentation transcript:

Jure Leskovec, CMU Kevin Lang, Anirban Dasgupta and Michael Mahoney Yahoo! Research

 Communities:  Sets of nodes with lots of connections inside and few to outside (the rest of the network)  Assumption:  Networks are (hierarchically) composed of communities (modules) Communities, clusters, groups, modules

 How community like is a set of nodes?  Want a measure that corresponds to intuition  Conductance (normalized cut): Φ(S) = # edges cut / # edges inside  Small Φ(S) corresponds to more community-like sets of nodes

Score: Φ(S) = # edges cut / # edges inside

Score: Φ(S) = # edges cut / # edges inside Better community Φ=5/7 = 0.7 Bad community Φ=2/5 = 0.4

Score: Φ(S) = # edges cut / # edges inside Better community Φ=5/7 = 0.7 Bad community Φ=2/5 = 0.4 Best community Φ=2/8 = 0.25

 We define: Network community profile (NCP) plot Plot the score of best community of size k  Search over all subsets of size k and find best: Φ(k=5) = 0.25  NCP plot is intractable to compute  Use approximation algorithm

 Dolphin social network  Two communities of dolphins NCP plot Network

 Zachary’s university karate club social network  During the study club split into 2  The split (squares vs. circles) corresponds to cut B NCP plotNetwork

 Collaborations between scientists in Networks NCP plotNetwork

Hierarchical network Geometric (grid-like) network – Small social networks – Geometric and – Hierarchical network have downward NCP plot

 Previously researchers examined community structure of small networks (~100 nodes)  We examined more than 70 different large social and information networks Large real-world networks look completely different!

 Typical example: General relativity collaboration network (4,158 nodes, 13,422 edges)

Better and better communities Worse and worse communities Best community has 100 nodes Community score Community size

 Whiskers are responsible for downward slope of NCP plot Whisker is a set of nodes connected to the network by a single edge NCP plot Largest whisker

 Each new edge inside the community costs more NCP plot Φ=2/1 = 2 Φ=8/3 = 2.6 Φ=64/11 = 5.8 Each node has twice as many children

Network structure: Core-periphery, jellyfish, octopus Whiskers are responsible for good communities Denser and denser core of the network Core contains 60% node and 80% edges

What if we allow cuts that give disconnected communities? Cut all whiskers and compose communities out of them

Community score Community size We get better community scores when composing disconnected sets of whiskers Connected communities Bag of whiskers

Rewired network: random network with same degree distribution

 What is a good model that explains such network structure?  None of the existing models work Pref. attachment Small World Geometric Pref. Attachment Flat Down and Flat Flat and Down

 Forest Fire: connections spread like a fire  New node joins the network  Selects a seed node  Connects to some of its neighbors  Continue recursively As community grows it blends into the core of the network

rewired network Bag of whiskers

 Whiskers:  Largest whisker has ~100 nodes  Independent of network size  Dunbar number: a person can maintain social relationship to 150 people  Bond vs. identity communites  Core:  Newman et al. analyzed 400k node product network ▪ Largest community has 50% nodes ▪ Community was labeled “miscelaneous”

 NCP plot is a way to analyze network community structure  Our results agree with previous work on small networks  But large networks are fundamentally different  Large networks have core-periphery structure  Small well isolated communities blend into the core of the networks as they grow

Download ppt "Jure Leskovec, CMU Kevin Lang, Anirban Dasgupta and Michael Mahoney Yahoo! Research."

Similar presentations