5Pervious Work Graph Partitioning - Minimum Cuts - Spectral PartitioningApplications:- Parallel computing- VLSI design and other CAD applications
6Pervious WorkBlock Modeling or Hierarchical Clustering or Community Structure Detection- Best fits to stochastic models- Hierarchical clustering based on single or average linkage clustering- Betweenness-based Methods
7Graph PartitioningGraph partitioning algorithms are typically based on minimum cut approaches or spectral partitioning :Graph partitioning is iterative bisection: dividing the graph into 2 groups and then subdivide each group until we have the required # of groups.
8is always eigenvector with eigenvalue 0. Spectral bisectionEigen-vectors of the graph Laplacian.L = D-AA is the adjacency matrixD is a diagonal Matrix of vertex degrees12345Diagonal = degree of each vertexDiagonal blocksTotal of rows always zerois always eigenvector with eigenvalue 0.
9Bisect !12345The eigenvector corresponding to the lowest eigenvalue must have both positive and negative elements.Will not have block diagonal so bisect
10Spectral Bisection (Cont.) It only bisects graphs into only 2 communities.Division into a larger number of communities is usually achieved by repeated bisection, but this does not always give satisfactory results.We do not in general know ahead of time how many communities we want to divide the graph into.
11Graph PartitioningMinimum cut partitioning breaks down when we don’t know the sizes of the groups- Optimizing the cut size with the groups sizes free puts all vertices in the same groupCut size is the wrong thing to optimize- A good division into communities is not just one where there are a small number of edges between groupsThere must be a smaller than expected number edges between communities
12Modularity Other Approaches:- Greedy Algorithm: Start with all the vertices in separate communities.- Find the two communities whose amalgamation gives the greatest increase in the modularitySimulated annealing ( Guimera & Amaral 2005)External Optimization(Dutch & Arenas 2005)
13Modularity (Newman and Girvan 2004) Define modularity to beQ = (number of edges within groups) – (expected number within groups).Actual Number of Edges between i and j isExpected Number of Edges between i and j isKi and Kj are degrees of verticesAnd m is the total number of edges in the networkAij = adjacency matrixQ can be +ve or –ve+ve values means presence of community structure
14Modularity Matrix So Q is a sum of over pairs (i, j) that are in the same groupOr we can write in matrix form asWhere B is a new characteristic matrix, the modularity marix,(si, sj)Where s is a the vector whose elements are siSi =1 if vertex i belongs to group 1Si=-1 if vertex i belongs to group 2Where s is a the vector whose elements are the siElements of matrix B of each of its rows and columns sum to zero, so it always has an eigenvector with eigenvalue 0
15Modularity Matrixs is the linear combination of the normalized eigenvectors ui of Bβi is the eigenvalue of B corresponding to eigenvector uiWe maximize the coefficient on the largest eigenvalue by choosing
16Modularity Matrix Algorithm Calculate the leading eigenvector of the modularity matrixDivide the vertices according to the signs of the elementsNote that there is no need to forbid the solution with all the vertices in a single group.
18Spectral properties of modularity matrix Vector(1,1,1,…) is always an eigenvector of B with eigenvalue zeroEigenvalues can either be positive or negative- So long as there is any positive eigenvalue we will never put all vertices in the same groupBut there may be no positive eigenvalues- All vertices in same group gives highest modularity- Such networks are indivisible
19Dividing into more than two groups Repeated division into two groups- Divide into two, then divide those parts into two,etcStop when there is no division that will increase the modularity- This is precisely when the subgraph is indivisible- Stop when there are no positive eigenvalues of the modularity matrix
20Modularity Matrix Time Complexity O(n2logn) Better than Betweenness Algorithm O(n3)External Optimization O(n2log2n)Not as good asGreedy Algorithm O(nlog2n) but better quality results
21Modularity Matrix Actual Running Time Collaboration network of about vertices, the algorithm takes around 20 minutes to run on a standard personal computer.
25Comparison to other methods CN = BetweennessCNM = GreedyDA = External Optimization
26SummaryModularity maximization appears to be a highly competitive approach to community detection in networksIt can be formulated as a spectral optimization problem, which leads to fast and accurate algorithmsThere are close connections between the spectrum of the modularity matrix and the community structure
27References Modularity and Community Structure in Networks – MEJ Newman Detecting community structure in network, M. E. J. Newman.Finding community structure in very large networks, Aaron Clauset, M. E. J. Newman, and Cristopher Moore.