Community Discovery in Social Network Yunming Ye Department of Computer Science Shenzhen Graduate School Harbin Institute of Technology.



Agenda: Introduction to Social Network and Community Discovery; Classical Community Discovery Algorithms; Hot Research Issues

Introduction to Social Network and Community Discovery

Studies on Networks. Lots of "networked" data: technological networks (power grid, road networks); biological networks (food web, protein networks); social networks (collaboration networks, friendships); language networks (semantic networks).

Studies on Networks. Social networks: QQ, Kaixin, Renren, Facebook, Twitter, co-citation, blogs.

Community A property that seems to be common to many networks is community structure. Community: The division of network nodes into groups within which the network connections are dense, but between which they are sparser.

Subjectivity of Community Definition. Two possible views: each connected component is a community, or only a densely-knit group is a community. The definition of a community can be subjective (as in unsupervised learning).

Community Detection Community Detection: Find the community structure from the social network. Community detection is important: Identifying modules and their boundaries allows for a classification of vertices, according to their structural position in the modules.

Community Detection: applications. Public opinion monitoring, commodity recommendation, network optimization, network security, epidemic monitoring.

Classical Community Discovery Algorithms

Clustering based on Vertex Similarity. Apply k-means or similarity-based clustering to nodes. Vertex similarity is defined in terms of the similarity of the nodes' neighborhoods. Structural equivalence: two nodes are structurally equivalent iff they are connected to the same set of actors. Structural equivalence is too restrictive for practical use. In the example, nodes 1 and 3 are structurally equivalent; so are nodes 5 and 6.

Vertex Similarity Jaccard Similarity Cosine similarity
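
The formulas on this slide did not survive in the transcript. As a minimal sketch, assuming the usual neighborhood-based definitions (Jaccard: size of the shared neighborhood divided by the size of the combined neighborhood; cosine: size of the shared neighborhood divided by the geometric mean of the two neighborhood sizes), the two measures can be computed as follows. The toy graph `adj` and the function names are hypothetical, not taken from the slides.

```python
import math


def jaccard(adj, u, v):
    """Jaccard similarity of the neighbor sets of u and v."""
    nu, nv = adj[u], adj[v]
    union = nu | nv
    return len(nu & nv) / len(union) if union else 0.0


def cosine(adj, u, v):
    """Cosine similarity of the neighbor sets of u and v."""
    nu, nv = adj[u], adj[v]
    if not nu or not nv:
        return 0.0
    return len(nu & nv) / math.sqrt(len(nu) * len(nv))


# Hypothetical undirected toy graph: node -> set of neighbors.
adj = {
    "a": {"c", "d"},
    "b": {"c", "d", "e"},
    "c": {"a", "b"},
    "d": {"a", "b"},
    "e": {"b"},
}

# "c" and "d" have identical neighbor sets, so they are structurally
# equivalent and both measures give 1.0 for that pair.
print(jaccard(adj, "a", "b"), cosine(adj, "a", "b"))
print(jaccard(adj, "c", "d"), cosine(adj, "c", "d"))
```

Both measures relax structural equivalence: instead of requiring identical neighbor sets, they score how much two neighborhoods overlap.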

Linkage Clustering. The illustration of three cluster-to-cluster dissimilarity criteria. R and S are two clusters, and N_R and N_S are the sizes of these two clusters. r_i ∈ R and s_j ∈ S are the i-th and j-th objects in clusters R and S respectively.
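
The illustration itself is missing from the transcript. A minimal sketch, assuming the three criteria are the common single, complete, and average linkage (an assumption, since the slide text does not name them), with a hypothetical one-dimensional example:

```python
from itertools import product


def single_linkage(R, S, dist):
    """Smallest distance between any r_i in R and s_j in S."""
    return min(dist(r, s) for r, s in product(R, S))


def complete_linkage(R, S, dist):
    """Largest distance between any r_i in R and s_j in S."""
    return max(dist(r, s) for r, s in product(R, S))


def average_linkage(R, S, dist):
    """Mean distance over all N_R * N_S cross-cluster pairs."""
    return sum(dist(r, s) for r, s in product(R, S)) / (len(R) * len(S))


def abs_diff(x, y):
    """Hypothetical dissimilarity between two 1-D points."""
    return abs(x - y)


# Hypothetical clusters of 1-D points.
R = [0.0, 1.0]
S = [3.0, 5.0]

print(single_linkage(R, S, abs_diff))    # 2.0
print(complete_linkage(R, S, abs_diff))  # 5.0
print(average_linkage(R, S, abs_diff))   # 3.5
```

Any dissimilarity function can be passed as `dist`, for example one minus a vertex similarity score.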

Greedy on Similarity. Merge the pair of clusters whose distance is minimum (i.e. the most similar pair). The procedure finds n partitions, each with a different number of clusters, from n down to 1. At each iteration step, one needs to compute the variation ΔQ of modularity given by the merger of any two communities of the running partition, so that one can choose the best merger.

CNM algorithm. Clauset, Newman, and Moore (CNM algorithm): "Finding community structure in very large networks", A Clauset, MEJ Newman, C Moore, Physical Review E, 2004 (times cited: 351). The idea of CNM is based on the greedy optimization of the quantity known as modularity. CNM is an agglomerative hierarchical method.

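Modularity Maximization. Modularity measures the strength of a community partition by taking into account the degree distribution. Given a network with $m$ edges, the expected number of edges between two nodes with degrees $d_i$ and $d_j$ is $d_i d_j / 2m$. For example, given the degree distribution of the example network ($m = 14$), the expected number of edges between nodes 1 and 2 is 3*2/(2*14) = 3/14. Strength of a community $C$: $\sum_{i \in C, j \in C} (A_{ij} - d_i d_j / 2m)$, where $A_{ij}$ is 1 if nodes $i$ and $j$ are connected and 0 otherwise. Modularity: $Q = \frac{1}{2m} \sum_{C} \sum_{i \in C, j \in C} (A_{ij} - d_i d_j / 2m)$. A larger value indicates a better community structure.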
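
As a rough illustration of the definition, the sketch below computes modularity for an unweighted, undirected graph without self-loops by summing, over all ordered node pairs inside each community, the observed edge indicator minus the expected number of edges d_i * d_j / (2m), and then dividing by 2m. The graph (two triangles joined by one edge) and the name `modularity` are hypothetical, not from the slides.

```python
from itertools import product


def modularity(edges, communities):
    """Modularity of a partition of an undirected, unweighted simple graph."""
    m = len(edges)
    adjacent = set()
    degree = {}
    for u, v in edges:
        adjacent.add((u, v))
        adjacent.add((v, u))
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1

    q = 0.0
    for community in communities:
        # Ordered pairs (i, j), including i == j, as in the double sum.
        for i, j in product(community, repeat=2):
            a_ij = 1.0 if (i, j) in adjacent else 0.0
            expected = degree.get(i, 0) * degree.get(j, 0) / (2.0 * m)
            q += a_ij - expected
    return q / (2.0 * m)


# Hypothetical graph: two triangles joined by the edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]

q_split = modularity(edges, [[0, 1, 2], [3, 4, 5]])
q_whole = modularity(edges, [[0, 1, 2, 3, 4, 5]])
print(q_split, q_whole)
```

Splitting the graph at the connecting edge scores 5/14, while putting every node in one community scores 0, which matches the intuition that the larger value marks the better partition.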

CNM We view every single node as a community initially. We repeatedly join together the two communities whose amalgamation produces the largest increase in Q. For a network of n vertices, after n − 1 such joins we are left with a single community and the algorithm stops. The entire process can be represented as a tree whose leaves are the vertices of the original network and whose internal nodes correspond to the joins.
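
The merging loop described above can be sketched as follows. This is a deliberately naive version that recomputes Q from scratch for every candidate pair; it does not reproduce the efficient data structures of the actual CNM paper, and all names and the toy graph are hypothetical.

```python
def modularity(edges, communities):
    """Q as the sum over communities of L_c/m - (D_c/2m)^2."""
    m = len(edges)
    label = {node: k for k, c in enumerate(communities) for node in c}
    internal = [0] * len(communities)      # L_c: edges inside community c
    total_degree = [0] * len(communities)  # D_c: sum of degrees in c
    for u, v in edges:
        total_degree[label[u]] += 1
        total_degree[label[v]] += 1
        if label[u] == label[v]:
            internal[label[u]] += 1
    return sum(l / m - (d / (2.0 * m)) ** 2
               for l, d in zip(internal, total_degree))


def greedy_merge(edges):
    """Return [(Q, partition), ...] from n singleton communities down to 1."""
    nodes = sorted({n for e in edges for n in e})
    communities = [frozenset([n]) for n in nodes]
    history = [(modularity(edges, communities), list(communities))]
    while len(communities) > 1:
        best = None
        for a in range(len(communities)):
            for b in range(a + 1, len(communities)):
                merged = [c for k, c in enumerate(communities)
                          if k not in (a, b)]
                merged.append(communities[a] | communities[b])
                q = modularity(edges, merged)
                if best is None or q > best[0]:
                    best = (q, merged)  # join giving the largest Q so far
        communities = best[1]
        history.append(best)
    return history


# Hypothetical graph: two triangles joined by the edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
history = greedy_merge(edges)
best_q, best_partition = max(history, key=lambda step: step[0])
print(best_q, [sorted(c) for c in best_partition])
```

The list `history` plays the role of the tree of joins: cutting it at the step with the highest Q gives the reported community structure.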

CNM. The dendrogram represents a hierarchical decomposition of the network into communities at all levels.

CNM algorithm. It is observed that merging communities of unbalanced sizes has a great impact on the computational efficiency of CNM.

Results

Girvan and Newman Method. Among the hierarchical methods, the algorithm of Girvan and Newman (Girvan & Newman 2002) presents an important improvement. "Community structure in social and biological networks", M Girvan, MEJ Newman, Proceedings of the National Academy of Sciences (times cited: 1302). The GN method is a divisive hierarchical method.

Edge Betweenness. The strength of a tie can be measured by edge betweenness. Edge betweenness: the number of shortest paths that pass along the edge. The edge betweenness of e(1, 2) is 4 (= 6/2 + 1), as all the shortest paths from 2 to {4, 5, 6, 7, 8, 9} have to pass either e(1, 2) or e(2, 3), and e(1, 2) is the shortest path between 1 and 2.

Edge Betweenness. Girvan and Newman use a metric called edge betweenness: a measure that favors edges that lie between communities and disfavors those that lie inside communities.

Edge Betweenness. Define the edge betweenness of an edge as the number of shortest paths between pairs of vertices that run along it. If there is more than one shortest path between a pair of vertices, each path is given equal weight such that the total weight of all the paths is unity. If a network contains communities or groups that are only loosely connected by a few inter-group edges, then all shortest paths between different communities must go along one of these few edges.
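
A minimal sketch of this definition for unweighted, undirected graphs, using a breadth-first search from every vertex followed by a backward accumulation pass (the approach usually attributed to Brandes, which is a choice made here and not something the slides specify). The function name and the toy graph are hypothetical.

```python
from collections import deque


def edge_betweenness(adj):
    """adj maps each node to the set of its neighbors (undirected graph).

    Returns {frozenset({u, v}): betweenness}, where tied shortest paths
    between a pair of vertices share a total weight of one.
    """
    bet = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        dist = {s: 0}
        sigma = {s: 1.0}   # number of shortest paths from s to each node
        preds = {s: []}    # predecessors on shortest paths from s
        order = []         # nodes in order of non-decreasing distance
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    sigma[v] = 0.0
                    preds[v] = []
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        # Walk back from the farthest nodes, pushing path weight to edges.
        delta = {v: 0.0 for v in order}
        for v in reversed(order):
            for u in preds[v]:
                share = sigma[u] / sigma[v] * (1.0 + delta[v])
                bet[frozenset((u, v))] += share
                delta[u] += share
    # Every unordered pair was counted once from each endpoint.
    return {e: b / 2.0 for e, b in bet.items()}


# Hypothetical graph: two triangles joined by the edge (2, 3).
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
bet = edge_betweenness(adj)
print(bet[frozenset((2, 3))], bet[frozenset((0, 1))])
```

In this toy graph the connecting edge carries all 3 * 3 shortest paths between the two triangles, so it scores 9, while an edge inside a triangle such as (0, 1) scores only 1.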

Edge Betweenness Thus, the edges connecting communities will have high edge betweenness. By removing these edges, we separate groups from one another and so reveal the underlying community structure of the graph.

Procedure. The algorithm is stated as follows: 1. Calculate the betweenness for all edges in the network. 2. Remove the edge with the highest betweenness. 3. Recalculate betweennesses for all edges affected by the removal. 4. Repeat from step 2 until no edges remain.
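
The four steps can be sketched as below. For simplicity this version recomputes the betweenness of every edge after each removal, not only of the edges affected by it, and it records the connected components each time their number grows. All function names and the toy graph are hypothetical.

```python
from collections import deque


def edge_betweenness(adj):
    """Shortest-path edge betweenness for an unweighted undirected graph."""
    bet = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        dist, sigma, preds, order = {s: 0}, {s: 1.0}, {s: []}, []
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v], sigma[v], preds[v] = dist[u] + 1, 0.0, []
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        delta = {v: 0.0 for v in order}
        for v in reversed(order):
            for u in preds[v]:
                share = sigma[u] / sigma[v] * (1.0 + delta[v])
                bet[frozenset((u, v))] += share
                delta[u] += share
    return {e: b / 2.0 for e, b in bet.items()}


def components(adj):
    """Connected components as a list of frozensets."""
    seen, parts = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = set(), [s]
        while stack:
            u = stack.pop()
            if u not in comp:
                comp.add(u)
                stack.extend(adj[u] - comp)
        seen |= comp
        parts.append(frozenset(comp))
    return parts


def girvan_newman(adj):
    """Return the successive partitions produced by edge removal."""
    adj = {u: set(vs) for u, vs in adj.items()}  # work on a copy
    levels = [components(adj)]
    while any(adj.values()):               # step 4: until no edges remain
        bet = edge_betweenness(adj)        # steps 1 and 3 (full recompute)
        u, v = max(bet, key=bet.get)       # step 2: highest betweenness
        adj[u].discard(v)
        adj[v].discard(u)
        parts = components(adj)
        if len(parts) > len(levels[-1]):
            levels.append(parts)
    return levels


# Hypothetical graph: two triangles joined by the edge (2, 3).
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
levels = girvan_newman(adj)
print([sorted(c) for c in levels[1]])
```

The returned list corresponds to the levels of the dendrogram read from the top down: one component first, then the split into two groups, and finally isolated vertices.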

Divisive clustering based on edge betweenness. Idea: progressively remove the edges with the highest betweenness. After removing e(4,5), the betweenness of e(4,6) becomes 20, which is the highest; after removing e(4,6), the edge e(7,9) has the highest betweenness value, 4, and should be removed. (Figure: initial betweenness values.)

Procedure

Hot Directions

Discovery of overlapping communities; incremental algorithms; topic-sensitive community discovery; local community discovery; community discovery in multi-relational networks.

Q&A Thanks!