Community Discovery in Social Network
Yunming Ye, Department of Computer Science, Shenzhen Graduate School, Harbin Institute of Technology

Agenda: Introduction to Social Network and Community Discovery; Classical Community Discovery Algorithms; Hot Research Issues

Introduction to Social Network and Community Discovery

Studies on Networks: Lots of "networked" data! Technological networks (power grid, road networks); biological networks (food web, protein networks); social networks (collaboration networks, friendships); language networks (semantic networks).

Studies on Networks: Social networks: QQ, Kaixin, Renren, Facebook, email, Twitter, co-citation, blogs.

Community: A property that seems to be common to many networks is community structure: the division of network nodes into groups within which the network connections are dense, but between which they are sparser.

Subjectivity of Community Definition: The definition of a community can be subjective (unsupervised learning). Two illustrated views: each component is a community; a densely knit group is a community.

Community Detection: Find the community structure from the social network. Community detection is important: identifying modules and their boundaries allows for a classification of vertices according to their structural position in the modules.

Community Detection applications: public opinion monitoring, commodity recommendation, network optimization, network security, epidemic monitoring.

Classical Community Discovery Algorithms

Clustering based on Vertex Similarity: Apply k-means or similarity-based clustering to nodes. Vertex similarity is defined in terms of the similarity of the nodes' neighborhoods. Structural equivalence: two nodes are structurally equivalent iff they are connected to the same set of actors. Structural equivalence is too restrictive for practical use. In the example graph, nodes 1 and 3 are structurally equivalent; so are nodes 5 and 6.

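Vertex Similarity: Jaccard similarity, $|N_i \cap N_j| / |N_i \cup N_j|$; cosine similarity, $|N_i \cap N_j| / \sqrt{|N_i| \, |N_j|}$, where $N_i$ denotes the set of neighbors of node $i$.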
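The neighborhood-based similarities can be sketched in Python. This is a minimal sketch, assuming the usual definitions (Jaccard as the size of the shared neighborhood over the size of the combined neighborhood, cosine as the shared size over the square root of the product of the two neighborhood sizes). The edge list is an assumption: the slide's figure is not part of this transcript, so the graph was chosen to agree with the statements quoted from the slides. The function names are illustrative, and checking structural equivalence while ignoring the two nodes themselves is also an assumption.

```python
from math import sqrt

# Assumed edge list (the original figure is unavailable); chosen so that
# nodes 1 and 3, and nodes 5 and 6, come out structurally equivalent.
EDGES = [(1, 2), (1, 3), (1, 4), (2, 3), (3, 4), (4, 5), (4, 6),
         (5, 6), (5, 7), (5, 8), (6, 7), (6, 8), (7, 8), (7, 9)]

def build_neighbors(edges):
    """Map each node to the set of its neighbors (undirected graph)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def structurally_equivalent(adj, i, j):
    """True if i and j link to the same actors, ignoring each other."""
    return adj[i] - {j} == adj[j] - {i}

def jaccard(adj, i, j):
    """Shared neighbors divided by combined neighbors."""
    union = adj[i] | adj[j]
    return len(adj[i] & adj[j]) / len(union) if union else 0.0

def cosine(adj, i, j):
    """Shared neighbors divided by sqrt(|N_i| * |N_j|)."""
    denom = sqrt(len(adj[i]) * len(adj[j]))
    return len(adj[i] & adj[j]) / denom if denom else 0.0

adj = build_neighbors(EDGES)
print(structurally_equivalent(adj, 1, 3))  # True
print(structurally_equivalent(adj, 5, 6))  # True
print(jaccard(adj, 4, 6), cosine(adj, 4, 6))
```

Unlike structural equivalence, the two similarity scores are graded, so nodes that share only some neighbors still receive a nonzero similarity that a clustering step can use.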

Linkage Clustering: The illustration of three cluster-to-cluster dissimilarity criteria. R and S are two clusters, and $N_R$ and $N_S$ are the sizes of these two clusters. $r_i \in R$ and $s_j \in S$ are the $i$th and $j$th objects in clusters R and S, respectively.
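The transcript does not name the three criteria, so the sketch below assumes they are the common single, complete, and average linkage. The function names, the distance function `d`, and the one-dimensional sample points are all hypothetical and serve only as illustration.

```python
def single_linkage(R, S, d):
    """Smallest distance between any r in R and any s in S."""
    return min(d(r, s) for r in R for s in S)

def complete_linkage(R, S, d):
    """Largest distance between any r in R and any s in S."""
    return max(d(r, s) for r in R for s in S)

def average_linkage(R, S, d):
    """Mean of all pairwise distances, i.e. the sum divided by N_R * N_S."""
    return sum(d(r, s) for r in R for s in S) / (len(R) * len(S))

def abs_diff(a, b):
    """Hypothetical dissimilarity for one-dimensional points."""
    return abs(a - b)

R = [0.0, 1.0]
S = [3.0, 5.0]
print(single_linkage(R, S, abs_diff))    # 2.0
print(complete_linkage(R, S, abs_diff))  # 5.0
print(average_linkage(R, S, abs_diff))   # 3.5
```

For graph data, `d` could be derived from a vertex similarity, for example one minus the Jaccard similarity of the two neighborhoods.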

Greedy on Similarity: Merge the pair of clusters whose distance is minimum (i.e., the most similar pair). The procedure produces n partitions, each with a different number of clusters, from n down to 1. At each iteration step, one needs to compute the variation $\Delta Q$ of modularity given by the merger of any two communities of the current partition, so that one can choose the best merger.

CNM algorithm: Clauset, Newman, and Moore (CNM algorithm). A. Clauset, M. E. J. Newman, C. Moore, "Finding community structure in very large networks", Physical Review E, 2004 (cited 351 times). The idea of CNM is based on the greedy optimization of the quantity known as modularity. CNM is an agglomerative hierarchical method.

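Modularity Maximization: Modularity measures the strength of a community partition by taking into account the degree distribution. Given a network with m edges, the expected number of edges between two nodes with degrees $d_i$ and $d_j$ is $d_i d_j / 2m$. For example, given the degree distribution of the example network, the expected number of edges between nodes 1 and 2 is 3*2/(2*14) = 3/14. Strength of a community $C$: $\sum_{i \in C, j \in C} (A_{ij} - d_i d_j / 2m)$, where $A$ is the adjacency matrix. Modularity: $Q = \frac{1}{2m} \sum_{C} \sum_{i \in C, j \in C} (A_{ij} - d_i d_j / 2m)$. A larger value indicates a better community structure.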
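The modularity computation can be sketched as follows, assuming the standard form in which each community contributes the sum over its node pairs of the adjacency entry minus the expected number of edges d_i d_j / 2m, and the total is divided by 2m. The edge list is an assumption (the slide's figure is not in the transcript); it was chosen so that nodes 1 and 2 have degrees 3 and 2 and m = 14, matching the worked example. The function names and the sample partition are illustrative.

```python
# Assumed edge list, consistent with d_1 = 3, d_2 = 2, m = 14.
EDGES = [(1, 2), (1, 3), (1, 4), (2, 3), (3, 4), (4, 5), (4, 6),
         (5, 6), (5, 7), (5, 8), (6, 7), (6, 8), (7, 8), (7, 9)]

def degrees(edges):
    """Count how many edges touch each node."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg

def expected_edges(d_i, d_j, m):
    """Expected number of edges between two nodes, given their degrees."""
    return d_i * d_j / (2 * m)

def modularity(edges, communities):
    """Modularity Q of a partition of a simple undirected graph."""
    m = len(edges)
    deg = degrees(edges)
    total = 0.0
    for community in communities:
        internal = sum(1 for u, v in edges
                       if u in community and v in community)
        degree_sum = sum(deg[n] for n in community)
        # Over ordered pairs inside the community, the adjacency entries
        # sum to 2 * internal and the d_i * d_j terms sum to degree_sum ** 2.
        total += 2 * internal - degree_sum ** 2 / (2 * m)
    return total / (2 * m)

deg = degrees(EDGES)
print(expected_edges(deg[1], deg[2], len(EDGES)))  # 3/14, about 0.214
print(modularity(EDGES, [{1, 2, 3, 4}, {5, 6, 7, 8, 9}]))
print(modularity(EDGES, [set(range(1, 10))]))      # one big community
```

Placing every node in one community gives a modularity of zero, which is why a positive value is read as evidence of community structure beyond what the degrees alone would produce.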

CNM: We view every single node as a community initially. We repeatedly join together the two communities whose amalgamation produces the largest increase in Q. For a network of n vertices, after n − 1 such joins we are left with a single community and the algorithm stops. The entire process can be represented as a tree whose leaves are the vertices of the original network and whose internal nodes correspond to the joins.
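The joining procedure can be sketched as a naive greedy loop. This is not the published CNM implementation and makes no attempt at its efficiency: it rescans every pair of communities at each step. It assumes a simple undirected graph with sortable node labels, and uses the fact that merging communities a and b changes Q by (edges between a and b) / m minus d_a d_b / (2 m^2), where d_a is the total degree of a. The function name and the two-triangle test graph are hypothetical.

```python
from itertools import combinations

def greedy_modularity(edges):
    """Greedy agglomeration; returns (best partition, best Q, join order)."""
    m = len(edges)
    nodes = sorted({n for e in edges for n in e})
    comms = {n: {n} for n in nodes}   # community id -> member nodes
    deg = {n: 0 for n in nodes}       # community id -> total degree
    between = {}                      # pair of ids -> edges between them
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
        key = frozenset((u, v))
        between[key] = between.get(key, 0) + 1

    def delta_q(a, b):
        links = between.get(frozenset((a, b)), 0)
        return links / m - deg[a] * deg[b] / (2 * m * m)

    # With every node alone there are no internal edges.
    q = -sum((d / (2 * m)) ** 2 for d in deg.values())
    best_q, best = q, [set(c) for c in comms.values()]
    merges = []                       # the joins, i.e. the dendrogram
    while len(comms) > 1:
        a, b = max(combinations(sorted(comms), 2),
                   key=lambda pair: delta_q(*pair))
        q += delta_q(a, b)
        merges.append((a, b))
        comms[a] |= comms.pop(b)      # merge b into a
        deg[a] += deg.pop(b)
        for c in comms:
            if c != a:
                moved = between.pop(frozenset((b, c)), 0)
                if moved:
                    key = frozenset((a, c))
                    between[key] = between.get(key, 0) + moved
        between.pop(frozenset((a, b)), None)
        if q > best_q:
            best_q, best = q, [set(c) for c in comms.values()]
    return best, best_q, merges

# Hypothetical test graph: two triangles joined by the edge (3, 4).
TRIANGLES = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
best, best_q, merges = greedy_modularity(TRIANGLES)
print(best, round(best_q, 4), merges)
```

The `merges` list records the joins in order, which is the information a dendrogram displays; the partition with the highest Q along the way is returned as the suggested cut.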

CNM: The dendrogram represents a hierarchical decomposition of the network into communities at all levels.

CNM algorithm: It is observed that merging communities of unbalanced sizes has a great impact on the computational efficiency of CNM.

Results

Girvan and Newman Method: Among the hierarchical methods, the algorithm of Girvan and Newman (Girvan & Newman 2002) presents an important improvement. M. Girvan, M. E. J. Newman, "Community structure in social and biological networks", Proceedings of the National Academy of Sciences, 2002 (cited 1302 times). The GN method is a divisive hierarchical method.

Edge Betweenness: The strength of a tie can be measured by edge betweenness. Edge betweenness: the number of shortest paths that pass along the edge. The edge betweenness of e(1, 2) is 4 (= 6/2 + 1): all the shortest paths from node 2 to {4, 5, 6, 7, 8, 9} have to pass along either e(1, 2) or e(2, 3), and e(1, 2) is the shortest path between nodes 1 and 2.

Edge Betweenness: Girvan and Newman use a metric called edge betweenness, where betweenness is some measure that favors edges that lie between communities and disfavors those that lie inside communities.

Edge Betweenness: Define the edge betweenness of an edge as the number of shortest paths between pairs of vertices that run along it. If there is more than one shortest path between a pair of vertices, each path is given equal weight such that the total weight of all the paths is unity. If a network contains communities or groups that are only loosely connected by a few inter-group edges, then all shortest paths between different communities must go along one of these few edges.
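The definition can be turned into a direct, if slow, computation: enumerate every shortest path between every pair of vertices and give each of the k shortest paths of a pair a weight of 1/k. The sketch below does exactly that for an unweighted, undirected graph. The edge list is an assumption (the slide's figure is not in the transcript), chosen so that it reproduces the value 4 quoted for e(1, 2); the function names are illustrative.

```python
from collections import deque
from itertools import combinations

# Assumed edge list, consistent with the betweenness values on the slides.
EDGES = [(1, 2), (1, 3), (1, 4), (2, 3), (3, 4), (4, 5), (4, 6),
         (5, 6), (5, 7), (5, 8), (6, 7), (6, 8), (7, 8), (7, 9)]

def build_neighbors(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def all_shortest_paths(adj, s, t):
    """Every shortest path from s to t, each as a list of nodes."""
    dist = {s: 0}
    queue = deque([s])
    while queue:                      # breadth-first distances from s
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    if t not in dist:
        return []
    paths = []

    def walk(v, suffix):
        # Step back from t along edges that get one hop closer to s.
        if v == s:
            paths.append([s] + suffix)
            return
        for u in adj[v]:
            if dist.get(u) == dist[v] - 1:
                walk(u, [v] + suffix)

    walk(t, [])
    return paths

def edge_betweenness(adj):
    bet = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s, t in combinations(sorted(adj), 2):
        paths = all_shortest_paths(adj, s, t)
        for path in paths:
            for u, v in zip(path, path[1:]):
                # Equal weight per path; weights for a pair sum to one.
                bet[frozenset((u, v))] += 1.0 / len(paths)
    return bet

bet = edge_betweenness(build_neighbors(EDGES))
print(bet[frozenset((1, 2))])  # 4.0
```

Enumerating paths explicitly keeps the equal-weight rule visible but scales poorly; it is meant to make the definition concrete, not to be used on large networks.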

Edge Betweenness: Thus, the edges connecting communities will have high edge betweenness. By removing these edges, we separate groups from one another and so reveal the underlying community structure of the graph.

Procedure: The algorithm is stated as follows:
1. Calculate the betweenness for all edges in the network.
2. Remove the edge with the highest betweenness.
3. Recalculate betweennesses for all edges affected by the removal.
4. Repeat from step 2 until no edges remain.
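The four steps can be sketched as a short loop. This is a minimal sketch under stated assumptions: the graph is unweighted and undirected, betweenness is recomputed for all edges after each removal (simpler than restricting to the affected edges), and the helper uses a breadth-first accumulation of path counts rather than any particular published implementation. The function names and the two-triangle test graph are hypothetical.

```python
from collections import deque

def edge_betweenness(adj):
    """Edge betweenness with equal weight split among shortest paths."""
    bet = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        dist = {s: 0}
        sigma = {v: 0 for v in adj}   # number of shortest paths from s
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        credit = {v: 0.0 for v in adj}
        for w in reversed(order):     # farthest nodes first
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + credit[w])
                bet[frozenset((v, w))] += share
                credit[v] += share
    # Each pair was counted from both of its endpoints.
    return {edge: value / 2.0 for edge, value in bet.items()}

def components(adj):
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, queue = {s}, deque([s])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in comp:
                    comp.add(w)
                    queue.append(w)
        seen |= comp
        comps.append(comp)
    return comps

def girvan_newman(edges):
    """Return (edges in removal order, partitions after each split)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    levels, removed = [components(adj)], []
    while any(adj.values()):              # step 4: until no edges remain
        bet = edge_betweenness(adj)       # steps 1 and 3: (re)calculate
        u, v = max(bet, key=bet.get)      # step 2: highest betweenness
        adj[u].discard(v)
        adj[v].discard(u)
        removed.append(frozenset((u, v)))
        comps = components(adj)
        if len(comps) > len(levels[-1]):
            levels.append(comps)
    return removed, levels

# Hypothetical test graph: two triangles joined by the edge (3, 4).
TRIANGLES = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
removed, levels = girvan_newman(TRIANGLES)
print(sorted(removed[0]), [sorted(c) for c in levels[1]])
```

The `levels` list holds one partition per split, from the whole graph down to single nodes, so it can be read as the divisive counterpart of a dendrogram; ties in betweenness are broken arbitrarily here.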

Divisive clustering based on edge betweenness: Idea: progressively remove the edges with the highest betweenness. Initial betweenness value 27. After removing e(4,5), the betweenness of e(4,6) becomes 20, which is the highest; after removing e(4,6), the edge e(7,9) has the highest betweenness value, 4, and should be removed.

Procedure

Hot Directions

Discovery of overlapping communities; incremental algorithms; topic-sensitive community discovery; local community discovery; community discovery in multi-relational networks.

Q&A Thanks!
