# An Efficient Algorithm Based on Self-adapted Fuzzy C-Means Clustering for Detecting Communities in Complex Networks

Jianzhi Jin¹, Yuhua Liu¹, Kaihua Xu², Fang Hu¹

¹ Department of Computer Science, HuaZhong Normal University, Wuhan 430079, China

² College of Physical Science and Technology, HuaZhong Normal University, Wuhan 430079, China

Email: yhliu@mail.ccnu.edu.cn

2011.12.16

Outline

- Introduction
- Self-adapted Fuzzy C-Means Clustering in Complex Networks
- Simulations and Analysis
- Conclusion and Future Works

Introduction (1/6)

Many complex networked systems divide naturally into modules or communities: groups of vertices with relatively dense connections within groups but sparser connections between them. Detecting communities can provide invaluable help in understanding and visualizing the structure of networks.

Introduction (2/6): Detecting Communities

Requirements:

- High efficiency and high accuracy
- A basis in sound theoretical principles
- No cut-node or cut-link is allowed

Introduction (3/6): Detecting Communities

Validation metrics:

- Modularity
- Accuracy
- Density
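As a rough illustration of two of these metrics, the sketch below computes Newman's modularity from an adjacency matrix and a label vector. The transcript does not define "density"; the function `intra_edge_fraction` is an assumption that reads it as the fraction of edges falling inside communities. The function names and the toy graph are hypothetical.

```python
import numpy as np

def modularity(A, labels):
    """Newman modularity Q = (1/2m) * sum_ij (A_ij - k_i k_j / 2m) [c_i == c_j]."""
    A = np.asarray(A, dtype=float)
    labels = np.asarray(labels)
    k = A.sum(axis=1)                      # node degrees
    two_m = k.sum()                        # twice the number of edges
    same = labels[:, None] == labels[None, :]
    B = A - np.outer(k, k) / two_m         # modularity matrix
    return float((B * same).sum() / two_m)

def intra_edge_fraction(A, labels):
    """ASSUMED reading of 'density': share of edges with both ends in one community."""
    A = np.asarray(A, dtype=float)
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    return float((A * same).sum() / A.sum())

# Toy graph: two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
labels = np.array([0, 0, 0, 1, 1, 1])

q = modularity(A, labels)
dens = intra_edge_fraction(A, labels)
```

For this toy graph, six of the seven edges lie inside a community, so the assumed density is 6/7 and the modularity works out to 5/14.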

Introduction (4/6): FCM in Complex Networks

- FCM has been applied to detecting communities in recent years.
- The mainstream algorithms are AFCM, CFCM, NFCM, etc.
- All of them use different variants of the Laplacian matrix of the graph.

Introduction (5/6): FCM in Complex Networks

- The Laplacian matrix $N = D - A$ is used in AFCM.
- $N = D^{-1}A$ is used in CFCM.
- $N = D^{-1/2}(D - A)D^{-1/2}$ is used in NFCM.

Here $D$ is the diagonal matrix formed from the degrees of all nodes in the network, and $A$ is the adjacency matrix of the network.
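A minimal sketch of how the three matrices could be built with NumPy from an adjacency matrix. The function name `fcm_input_matrices` is hypothetical, and the slides do not say how each algorithm goes on to use its matrix, so only the construction is shown. The sketch assumes an undirected graph with no isolated nodes.

```python
import numpy as np

def fcm_input_matrices(A):
    """Return (D - A, D^-1 A, D^-1/2 (D - A) D^-1/2) for adjacency matrix A."""
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1)
    if np.any(deg == 0):
        # D^-1 and D^-1/2 are undefined for isolated nodes.
        raise ValueError("every node needs at least one edge")
    D = np.diag(deg)
    laplacian = D - A                                  # variant attributed to AFCM
    transition = A / deg[:, None]                      # D^-1 A, attributed to CFCM
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    normalized = d_inv_sqrt[:, None] * laplacian * d_inv_sqrt[None, :]  # NFCM
    return laplacian, transition, normalized

# Path graph on three nodes: 0 - 1 - 2.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
lap, trans, norm = fcm_input_matrices(A)
```

Each row of `lap` sums to zero, each row of `trans` sums to one, and `norm` is symmetric with a unit diagonal.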

Introduction (6/6): FCM in Complex Networks

- These algorithms offer comparatively good clustering accuracy and running efficiency.
- Their overall performance is good.
- Two deficiencies:
  - They cannot determine the number of clusters automatically.
  - They easily get stuck in a local extremum.

Self-adapted Fuzzy C-Means Clustering in Complex Networks (1/5): SFCM in Complex Networks

- A new FCM-based algorithm for detecting communities: Self-adapted FCM (SFCM).
- It constructs a new validity function to find an optimal number of clusters automatically.

Self-adapted Fuzzy C-Means Clustering in Complex Networks (2/5): A New Validity Function

- The inter-cluster distances should be as large as possible.
- The intra-cluster distances should be as small as possible.
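The formula of the new validity function is not preserved in this transcript. To illustrate the two criteria only, the sketch below uses a stand-in: the reciprocal of the Xie-Beni index, which divides the smallest squared distance between prototypes (separation) by the membership-weighted mean squared distance of points to prototypes (compactness). This is an assumption swapped in for illustration, not the authors' function, and all names are hypothetical.

```python
import numpy as np

def separation_over_compactness(X, U, V, m=2.0):
    """Stand-in validity score (reciprocal Xie-Beni); higher is better.

    X: (n, d) data, U: (c, n) memberships, V: (c, d) prototypes, with c >= 2.
    """
    X = np.asarray(X, dtype=float)
    U = np.asarray(U, dtype=float)
    V = np.asarray(V, dtype=float)
    # Squared distance from every prototype to every point, shape (c, n).
    d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)
    compactness = ((U ** m) * d2).sum() / X.shape[0]      # intra-cluster term
    gaps = ((V[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(gaps, np.inf)                        # ignore self-distances
    separation = gaps.min()                               # inter-cluster term
    return float(separation / compactness)

X = np.array([[0.0], [1.0], [10.0], [11.0]])
# A partition that matches the two visible groups...
good = separation_over_compactness(
    X, [[1, 1, 0, 0], [0, 0, 1, 1]], [[0.5], [10.5]])
# ...and one that mixes them.
bad = separation_over_compactness(
    X, [[1, 0, 1, 0], [0, 1, 0, 1]], [[5.0], [6.0]])
```

The well-separated partition scores far higher than the mixed one, which is the behaviour the two criteria ask for.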

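Self-adapted Fuzzy C-Means Clustering in Complex Networks (3/5): Steps of the Algorithm

- Step 1: Initialization. Set the termination condition, the cluster number, and the remaining initial parameters.
- Step 2: Construct the partition matrix. If there exist j and r such that node j lies at zero distance from prototype r, then node j receives full membership in cluster r and zero membership in every other cluster.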

Self-adapted Fuzzy C-Means Clustering in Complex Networks (4/5): Steps of the Algorithm

- Step 3: Calculate the prototypes.
- Step 4: If the termination condition is met, stop the iteration; otherwise advance the iteration counter and go to Step 2.

Self-adapted Fuzzy C-Means Clustering in Complex Networks (5/5): Steps of the Algorithm

- Step 5: Calculate the validity function under the current cluster number. If it is the highest value, stop the algorithm; otherwise go to Step 2 with an updated cluster number.

Deficiency:

- The computational complexity is $O(n^3)$.
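Steps 1 to 5 can be sketched as a standard fuzzy c-means loop wrapped in a search over the cluster number. Several details are assumptions, because the slide formulas are missing from the transcript: the textbook FCM update rules, the fuzzifier `m = 2`, the deterministic prototype initialization, the exhaustive scan over `c_min..c_max` (instead of the slide's stop-at-the-highest-value rule), and the partition coefficient used as a placeholder validity function. The input `X` stands for one feature row per node; how those features are derived from the network is not given here.

```python
import numpy as np

def fcm(X, c, m=2.0, eps=1e-6, max_iter=300):
    """Textbook fuzzy c-means. Returns memberships U (c, n) and prototypes V (c, d)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Step 1: initialise prototypes from evenly spaced data rows (assumed choice).
    V = X[np.linspace(0, n - 1, c).astype(int)].copy()
    for _ in range(max_iter):
        # Step 2: partition matrix from squared distances, shape (c, n).
        d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)
        U = np.zeros((c, n))
        zero = d2 == 0.0
        hit = zero.any(axis=0)              # points sitting exactly on a prototype
        U[zero.argmax(axis=0)[hit], np.flatnonzero(hit)] = 1.0
        inv = d2[:, ~hit] ** (-1.0 / (m - 1.0))
        U[:, ~hit] = inv / inv.sum(axis=0)
        # Step 3: prototypes as membership-weighted means (no empty-cluster guard).
        W = U ** m
        V_new = (W @ X) / W.sum(axis=1, keepdims=True)
        # Step 4: stop once the prototypes barely move.
        done = np.linalg.norm(V_new - V) < eps
        V = V_new
        if done:
            break
    return U, V

def select_cluster_number(X, validity, c_min=2, c_max=6):
    """Step 5 (simplified): keep the cluster number with the highest validity."""
    best = None
    for c in range(c_min, c_max + 1):
        U, V = fcm(X, c)
        score = validity(X, U, V)
        if best is None or score > best[0]:
            best = (score, c, U, V)
    return best[1], best[2], best[3]

def partition_coefficient(X, U, V):
    """Placeholder validity function (Bezdek's partition coefficient)."""
    return float((U ** 2).sum() / U.shape[1])

X = np.array([[0.0], [0.1], [0.2], [10.0], [10.1], [10.2]])
c_best, U, V = select_cluster_number(X, partition_coefficient, c_min=2, c_max=4)
labels = U.argmax(axis=0)
```

On this toy data the two tight groups are recovered and two clusters are selected. Any other validity function with the same `(X, U, V)` signature can be passed in place of `partition_coefficient`.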

Simulations and Analysis (1/7)

- Zachary’s Karate Club
- Network of American Football Games
- Tests on Computer-generated Networks

Simulations and Analysis (2/7): Zachary’s Karate Club

Square nodes and circle nodes represent the instructor’s faction and the administrator’s faction, respectively. The squares further split into two communities, shown in blue and green, and the circles likewise split into two communities, shown in red and yellow.

Simulations and Analysis (3/7): Zachary’s Karate Club

- The modularity of all four algorithms is not high.
- The modularity of AFCM is substantially lower.
- The modularity of CFCM is lower than that of NFCM and SFCM.

| Algorithm | Communities | Modularity | Density |
|-----------|-------------|------------|----------|
| AFCM | 4 | 0.052433 | 0.628205 |
| CFCM | 4 | 0.226003 | 0.730769 |
| NFCM | 4 | 0.227318 | 0.730769 |
| SFCM | 4 | 0.227318 | 0.730769 |

Simulations and Analysis (4/7): Network of American Football Games

The algorithm automatically finds ten communities, which correspond almost exactly to ten conferences. A total of 11 nodes, marked with red circles, are unclassified or misclassified, giving an accuracy of 90.43%.
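The reported accuracy is consistent with the commonly used 115-team version of this network. The slide does not state the node count, so 115 is an assumption here:

$$\text{Accuracy} = \frac{115 - 11}{115} = \frac{104}{115} \approx 90.43\%$$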

Simulations and Analysis (5/7): Network of American Football Games

- The modularity obtained by SFCM is higher than that of the other algorithms, and so is the density.
- Likewise, the number of communities for the first three algorithms is pre-specified.

| Algorithm | Communities | Modularity | Density |
|-----------|-------------|------------|----------|
| AFCM | 10 | 0.495357 | 0.674029 |
| CFCM | 10 | 0.495442 | 0.673915 |
| NFCM | 10 | 0.494795 | 0.673475 |
| SFCM | 10 | 0.498077 | 0.675367 |

Simulations and Analysis (6/7): Tests on Computer-generated Networks

The test networks are denoted RN(c, m, k, p), where c is the number of communities in the network, m is the number of nodes in each community, k is the degree of each node, and p is the density metric introduced earlier.
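The transcript does not say how RN(c, m, k, p) networks are generated. The sketch below is a planted-partition stand-in, not the authors' generator. It assumes p is the expected fraction of a node's links that stay inside its own community, and it matches k only in expectation rather than giving every node exactly degree k. The function name and the seed parameter are hypothetical.

```python
import numpy as np

def planted_partition(c, m, k, p, seed=0):
    """Random graph with c communities of m nodes, expected degree k,
    and expected intra-community link fraction p (ASSUMED meaning of p)."""
    n = c * m
    p_in = p * k / (m - 1)                    # edge probability inside a community
    p_out = (1.0 - p) * k / (m * (c - 1))     # edge probability across communities
    if not (0.0 <= p_in <= 1.0 and 0.0 <= p_out <= 1.0):
        raise ValueError("k and p are not feasible for these community sizes")
    rng = np.random.default_rng(seed)
    labels = np.repeat(np.arange(c), m)
    same = labels[:, None] == labels[None, :]
    prob = np.where(same, p_in, p_out)
    # Sample each unordered node pair once, then mirror to keep A symmetric.
    upper = np.triu(rng.random((n, n)) < prob, k=1)
    A = (upper | upper.T).astype(int)
    return A, labels

A, labels = planted_partition(c=4, m=32, k=16, p=0.8)
A_pure, labels_pure = planted_partition(c=4, m=32, k=16, p=1.0)
```

With p = 1 the cross-community probability is zero, so `A_pure` contains no links between communities; lowering p mixes the communities together.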

Simulations and Analysis (7/7): Tests on Computer-generated Networks

- As p increases from 0 to 1, the community structure in the network becomes more cohesive.
- All algorithms correctly cluster all the nodes when p is no less than 0.5.
- In the range of p [range missing in source], the accuracy of SFCM is better than that of the others.

Conclusion and Future Works

- A new validity function is defined in this algorithm to find an optimal cluster number automatically.
- The simulation results verify that the algorithm is more complete and accurate.
- Its higher computational complexity will ultimately limit its performance.
- In further research, we will focus on reducing the computational complexity with little loss of precision, and on obtaining the globally optimal solution.