Clustering Social Networks


1 Clustering Social Networks
Nina Mishra et al. Presented by Nam Nguyen

2 (α,β)-Cluster Definition
Given a graph G = (V,E) where every vertex has a self-loop, C ⊂ V is an (α,β)-cluster if
1. Internally dense: ∀v ∈ C, |E(v,C)| ≥ β|C|
2. Externally sparse: ∀u ∈ V\C, |E(u,C)| ≤ α|C|
[Figure: a vertex v inside C with ≥ β|C| edges into C, and a vertex u outside C with ≤ α|C| edges into C]
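The two conditions translate directly into a membership test. The following is a minimal sketch, not code from the paper: the helper names `make_graph`, `clique_edges` and `is_alpha_beta_cluster` are hypothetical, and the sample graph (two 4-cliques sharing the vertex d) is an assumed stand-in for the figure on the next slide, which is not recoverable from the transcript.

```python
from fractions import Fraction
from itertools import combinations

def make_graph(edges):
    """Build an undirected adjacency map; every vertex gets a self-loop."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, {u}).add(v)
        adj.setdefault(v, {v}).add(u)
    return adj

def clique_edges(vertices):
    """All unordered pairs of the given vertices."""
    return list(combinations(vertices, 2))

def is_alpha_beta_cluster(adj, C, alpha, beta):
    """Check both conditions of the definition for the vertex set C."""
    C = set(C)
    if not C:
        return False
    size = len(C)
    # Internally dense: every member sees at least beta*|C| members (self-loop counts).
    internally_dense = all(len(adj[v] & C) >= beta * size for v in C)
    # Externally sparse: every outsider sees at most alpha*|C| members.
    externally_sparse = all(
        len(adj[u] & C) <= alpha * size for u in adj if u not in C
    )
    return internally_dense and externally_sparse

# Assumed example graph: two 4-cliques that overlap in vertex d.
adj = make_graph(clique_edges("abcd") + clique_edges("defg"))
print(is_alpha_beta_cluster(adj, "abcd", Fraction(1, 4), 1))  # True
print(is_alpha_beta_cluster(adj, "abc", Fraction(1, 4), 1))   # False: d sees 3 of 3
```

Exact fractions are used for α so that comparisons such as |E(u,C)| ≤ α|C| are not affected by floating-point rounding.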

3 Example
{a,b,c,d} and {d,e,f,g} are (1/4, 1)-clusters.
h and i do not fall into any (α,β)-cluster for 0 ≤ α < ½ < β ≤ 1; thus, they would not be clustered.
(α,β)-clusters are able to detect overlapping clusters.

4 Problem definition
Objective: Identify clusters that are internally dense, i.e., each vertex in the cluster is adjacent to at least a β-fraction of the cluster, and externally sparse, i.e., any vertex outside of the cluster is adjacent to at most an α-fraction of the vertices in the cluster.
Given 0 ≤ α < β ≤ 1, find all (α,β)-clusters in the network.
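For very small graphs the problem statement can be checked by brute force. This sketch simply tests every non-empty vertex subset, so it takes exponential time and is only meant to make the objective concrete; it is not the algorithm from the paper. The function names and the sample graph (two 4-cliques sharing d) are assumptions for illustration.

```python
from fractions import Fraction
from itertools import combinations

def is_cluster(adj, C, alpha, beta):
    """(alpha, beta)-cluster test; adj[v] is assumed to contain v itself (self-loop)."""
    size = len(C)
    dense = all(len(adj[v] & C) >= beta * size for v in C)
    sparse = all(len(adj[u] & C) <= alpha * size for u in adj if u not in C)
    return dense and sparse

def all_clusters_bruteforce(adj, alpha, beta):
    """Enumerate every non-empty vertex subset and keep the (alpha, beta)-clusters."""
    vertices = sorted(adj)
    found = set()
    for size in range(1, len(vertices) + 1):
        for subset in combinations(vertices, size):
            C = frozenset(subset)
            if is_cluster(adj, C, alpha, beta):
                found.add(C)
    return found

# Assumed example graph: two 4-cliques sharing vertex d, with self-loops.
adj = {v: {v} for v in "abcdefg"}
for group in ("abcd", "defg"):
    for u, v in combinations(group, 2):
        adj[u].add(v)
        adj[v].add(u)

clusters = all_clusters_bruteforce(adj, Fraction(1, 4), 1)
print(sorted("".join(sorted(C)) for C in clusters))  # ['abcd', 'defg']
```

The two clusters found overlap in d, which matches the remark that (α,β)-clusters can overlap.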

5 Contributions of the paper
Give a bound for the overlap of two (α,β)-clusters A and B: they overlap in at most |C|*min{1-(β-α), α/(2β-1)} vertices.
If the ratio of |A| and |B| is at most (1-α)/(1-β), then one cluster cannot be contained in the other.
Give a loose upper bound for the number of (α,1)-clusters of size s: $O((n/s)^{\alpha s+1})$.
Introduce the ρ-champion of a cluster: if β > ½(1 + ρ + α), there is a simple deterministic algorithm for finding all such clusters in time $O(m^{0.7} n^{1.2} + n^{2+o(1)})$.

6 Some minor remarks
β → 1: the cluster C → a clique.
α → 0: C tends to a disconnected component.
β < ½: C might contain two disconnected components.
We want α < β and β > ½.
(0, β)-clusters → finding connected components and outputting the β-connected ones.
(1-1/n, 1)-clusters → finding the maximal cliques in a graph.
((1-ε)β, β)-clusters → finding quasi-cliques.

7 Result 1
Question: What about the intersection of three (or more) (α,β)-clusters of the same size? Of different sizes?
What about the intersection of an (α,β)-cluster and an (α’,β’)-cluster of the same size? Of different sizes?

8 Result 2: Bounding the number of (α,1)-clusters
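Proof
Two clusters of the same size s can share at most αs vertices.
Every subset of size (αs+1) must therefore appear in at most one set in C.
There are $\binom{n}{s}$ subsets of s elements from n elements, and each of these contains $\binom{s}{\alpha s+1}$ subsets of size (αs+1), out of $\binom{n}{\alpha s+1}$ such subsets overall.
Therefore, we can have at most $\binom{n}{\alpha s+1} / \binom{s}{\alpha s+1}$ clusters in C:
$$|C| \le \binom{n}{\alpha s+1} \Big/ \binom{s}{\alpha s+1} = O\left((n/s)^{\alpha s+1}\right)$$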
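The counting argument can be evaluated numerically. The sketch below assumes the bound has the form C(n, αs+1) / C(s, αs+1), which is what the subset-counting argument yields; the binomial terms themselves are missing from the transcript, so this form is a reconstruction. The function name is hypothetical, and αs is required to be an integer.

```python
from fractions import Fraction
from math import comb

def alpha_one_cluster_bound(n, s, alpha):
    """Upper bound on the number of (alpha, 1)-clusters of size s among n vertices.

    ASSUMPTION: bound = C(n, alpha*s + 1) / C(s, alpha*s + 1), reconstructed from
    the counting argument; alpha*s must be an integer.
    """
    k = Fraction(alpha) * s + 1
    if k.denominator != 1:
        raise ValueError("alpha * s must be an integer")
    k = int(k)
    return Fraction(comb(n, k), comb(s, k))

print(alpha_one_cluster_bound(12, 4, 0))               # 3, i.e. n/s when alpha = 0
print(alpha_one_cluster_bound(7, 4, Fraction(1, 4)))   # 7/2
```

With α = 0 the bound reduces to n/s, the number of disjoint clusters of size s that fit into n vertices.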

9 This bound is tight …
When α = 0: no overlapping → the number of clusters of size s is n/s.
When α → 1 (α = (n-1)/n): consider the complement of the following graph.
[Figure not recovered]
Let s = n = N/2; then the bound is $2^n$. In fact, we do have $2^n$ (α,1)-clusters of size n, obtained by choosing from the set B = {b1 b2 … bn | bi is either xi or yi}.
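The α → 1 case can be verified by brute force for small n. The figure is missing from the transcript, so this sketch assumes the pictured graph is the perfect matching {xi, yi}; its complement then joins every pair of vertices except each xi with its own yi. All names here are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def matching_complement(n):
    """Complement of the perfect matching x_i - y_i on 2n vertices, with self-loops.

    ASSUMPTION: the missing figure shows this perfect matching.
    """
    verts = [(side, i) for side in "xy" for i in range(n)]
    adj = {v: {v} for v in verts}
    for u, v in combinations(verts, 2):
        if u[1] != v[1]:  # skip the matched pair (x_i, y_i)
            adj[u].add(v)
            adj[v].add(u)
    return adj

def is_cluster(adj, C, alpha, beta):
    size = len(C)
    dense = all(len(adj[v] & C) >= beta * size for v in C)
    sparse = all(len(adj[u] & C) <= alpha * size for u in adj if u not in C)
    return dense and sparse

def count_clusters_of_size_n(n):
    """Count ((n-1)/n, 1)-clusters of size n in the complement graph."""
    adj = matching_complement(n)
    alpha = Fraction(n - 1, n)
    return sum(
        is_cluster(adj, frozenset(subset), alpha, 1)
        for subset in combinations(adj, n)
    )

print(count_clusters_of_size_n(3))  # 8
print(count_clusters_of_size_n(4))  # 16
```

Each cluster picks exactly one of xi or yi for every i, which gives the 2^n choices described on the slide.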

10 An algorithm for finding clusters with champions
Why? In the last example, each vertex has as many neighbors outside as within the cluster. There is no vertex that “champions” the cluster (having more friends inside than outside). Why not find one that champions a cluster and start with it?
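The idea of a champion can be made concrete with a small test. The transcript does not give the formal definition, so this sketch assumes one: a member c of C is a ρ-champion if it has at most ρ|C| neighbors outside C. The function names and both sample graphs are illustrative assumptions.

```python
from fractions import Fraction
from itertools import combinations

def rho_champions(adj, C, rho):
    """Members of C with at most rho*|C| neighbors outside C.

    ASSUMPTION: this is the definition of a rho-champion; the slides only say
    a champion has more friends inside the cluster than outside.
    """
    C = set(C)
    return {c for c in C if len(adj[c] - C) <= rho * len(C)}

def graph_from_edges(vertices, edges):
    """Undirected adjacency map with a self-loop on every vertex."""
    adj = {v: {v} for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

# Two 4-cliques sharing d (assumed example): a, b, c have no outside neighbors.
two_cliques = graph_from_edges(
    "abcdefg",
    list(combinations("abcd", 2)) + list(combinations("defg", 2)),
)
print(sorted(rho_champions(two_cliques, "abcd", Fraction(1, 4))))  # ['a', 'b', 'c']

# Complement of the matching x_i - y_i for n = 3 (assumed form of the last example).
verts = [(s, i) for s in "xy" for i in range(3)]
dense = graph_from_edges(
    verts, [(u, v) for u, v in combinations(verts, 2) if u[1] != v[1]]
)
cluster = {("x", 0), ("x", 1), ("x", 2)}
print(rho_champions(dense, cluster, Fraction(1, 3)))  # set()
```

In the second graph every member has two neighbors outside the cluster, so no vertex qualifies for ρ = 1/3.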

11 Algorithm (cont’d)
Assumption: a big gap between β and α/2: β > ½ + (α+ρ)/2.
Why? Recall the last example: we have $2^n$ possible clusters of size n, which is too many. Any algorithm that outputs more clusters than nodes is undesirable. Thus, we need some restriction to reduce the number of returned clusters.

12 Algorithm (cont’d)
How many clusters with a ρ-champion should we have?
A big gap between β and α/2: β > ½ + (α+ρ)/2.
How to find them?

13 Algorithm (cont’d)
If v and c have sufficiently many neighbors in common, then v is part of the cluster C that c champions; that is what line #5 is for.
Running time of the algorithm
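The algorithm listing itself is not in the transcript, so the following is only a sketch of the idea stated on this slide: for each candidate champion c, collect the vertices that share enough neighbors with c, then keep the result if it really is an (α,β)-cluster. The common-neighbor threshold (2β − 1)·s, the restriction to clusters of size s, and the scan over all vertices v are assumptions, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def is_cluster(adj, C, alpha, beta):
    size = len(C)
    dense = all(len(adj[v] & C) >= beta * size for v in C)
    sparse = all(len(adj[u] & C) <= alpha * size for u in adj if u not in C)
    return dense and sparse

def clusters_with_champions(adj, alpha, beta, s):
    """Sketch: grow a candidate cluster of size s around each potential champion c.

    adj[v] is assumed to contain v itself (self-loop).
    """
    threshold = (2 * beta - 1) * s  # ASSUMPTION: not stated in the transcript
    found = set()
    for c in adj:
        # v joins the candidate if it shares enough neighbors with c.
        C = frozenset(v for v in adj if len(adj[v] & adj[c]) >= threshold)
        # Keep only candidates that pass the full definition.
        if len(C) == s and is_cluster(adj, C, alpha, beta):
            found.add(C)
    return found

# Assumed example graph: two 4-cliques sharing vertex d, with self-loops.
adj = {v: {v} for v in "abcdefg"}
for group in ("abcd", "defg"):
    for u, v in combinations(group, 2):
        adj[u].add(v)
        adj[v].add(u)

result = clusters_with_champions(adj, Fraction(1, 4), 1, 4)
print(sorted("".join(sorted(C)) for C in result))  # ['abcd', 'defg']
```

Starting from the shared vertex d yields all seven vertices, which fails the cluster test; the clusters are recovered from a, b, c and from e, f, g, which have no neighbors outside their cluster.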

14 Experimental Results
For real networks:
Do (α,β)-clusters with a ρ-champion exist? → Use Tsukiyama's algorithm.
If they do exist, do most (α,β)-clusters have a ρ-champion?
Results:
Able to find ~90% of the maximal cliques in graphs where α ≤ ½.
No strong ρ-champions in missed clusters.
Running time: much faster than Tsukiyama’s algorithm.
Datasets:
High Energy Physics Theory Co-Author graph (HEP)
Theory Co-Author graph (TA)
A subset of Live Journal graph (LP)

15 Results

16 Results

17 Results

18 Results

19 References
[1] Clustering Social Networks, Nina Mishra, Robert Schreiber, Isabelle Stanton and Robert E. Tarjan (2007)

