Community Detection based on Distance Dynamics Reporter: Yi Liu Student ID: 015033910017 Department of Computer Science and Engineering Shanghai Jiao Tong.


1 Community Detection based on Distance Dynamics Reporter: Yi Liu Student ID: 015033910017 Department of Computer Science and Engineering Shanghai Jiao Tong University

2 Yi Liu SJTU 2015/12/03 Outline: We consider the community detection problem from a new point of view: distance dynamics. Background and challenges (Problem Statement; Existing Solutions and Their Drawbacks); Basic idea and Attractor algorithm; Experimental evaluation; Conclusion.

3 Background: Social networks are a hot research topic in the area of the Internet of Things. Community detection is a complex and meaningful process in social networks. In recent years, detecting the community structure of networks has attracted wide attention. Example networks: network of sensors, food chain, World Wide Web, urban traffic network.

4 Background: How can we find intrinsic community structure in networks? (Papadopoulos S, et al. Data Mining and Knowledge Discovery, 2012, 24(3): 515-554.)

5 What is a community? From the viewpoint of sociology, a “community” can be perceived as a group of persons who are connected to each other by relatively durable social relations to form a tight and cohesive social entity, due to the presence of a “unity of will” or “sharing common values”. [Figure labels: Three Communities; Global Opinion; Opinion 1, Opinion 2, Opinion 3; Opinion Formation; Local Discussion; Problem in a Group.]

6 What is a community? Useful information: content, position, graduate institutions, friends.

7 Quiz 1: What is a community from the viewpoint of sociology?

8 Graph Model: A graph G = (V, E) consists of a set V of vertices and a set E of edges. Each edge is a pair (v, u), where v, u ∈ V. Undirected graph: V = {Ann, Jill, Bob, Ted}, E = {(Ann, Bob), (Bob, Jill), (Bob, Ted), (Ted, Jill)}. Directed graph: V = {Ann, Jill, Bob, Ted}, E = {(Ann, Bob), (Bob, Jill), (Ted, Bob), (Ted, Jill), (Ted, Ann), (Ann, Jill)}.
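
As a minimal sketch of this graph model in Python, the two example graphs can be stored as adjacency sets. The helper name `build_adjacency` is an illustrative assumption, and the sketch assumes the first edge list on the slide belongs to the undirected graph and the second to the directed one.

```python
def build_adjacency(vertices, edges, directed=False):
    """Return {vertex: set of neighbours} for an edge list of (v, u) pairs."""
    adj = {v: set() for v in vertices}
    for v, u in edges:
        adj[v].add(u)          # edge v -> u
        if not directed:
            adj[u].add(v)      # undirected edges are stored in both directions
    return adj


V = ["Ann", "Jill", "Bob", "Ted"]
E_undirected = [("Ann", "Bob"), ("Bob", "Jill"), ("Bob", "Ted"), ("Ted", "Jill")]
E_directed = [("Ann", "Bob"), ("Bob", "Jill"), ("Ted", "Bob"),
              ("Ted", "Jill"), ("Ted", "Ann"), ("Ann", "Jill")]

undirected = build_adjacency(V, E_undirected)
directed = build_adjacency(V, E_directed, directed=True)
```

In the directed version only outgoing edges are recorded, so a vertex with no outgoing edge (Jill here) maps to an empty set.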

9 Weighted Graph: Sometimes edges have a third component, a weight or cost, whose semantics are specific to the graph. A graph that has values associated with its edges is called a weighted graph. The graph can be either directed or undirected. The weights can represent things like: 1. Physical distance between two vertices. 2. Time it takes to get from one vertex to another. 3. How much it costs to travel from vertex to vertex.

10 An Example of a Weighted Undirected Graph. [Figure: weighted graph on Bob, Jill, Ann and Ted with edge weights 20, 2, 1.2 and 60.] Bob and Ann meet each other 2 times a year. Bob and Ann meet each other 20 times a year. Bob and Ted meet each other 60 times a year. Ted and Jill meet each other only 1.2 times a year.

11 Graph Model: The degree of a vertex v is the number of edges incident to v, denoted TD(v). Subgraph: Let G = (V, E) be a graph with vertex set V and edge set E. A subgraph of G is a graph G' = (V', E') where 1. V' is a subset of V. 2. E' consists of edges (v, w) in E such that both v and w are in V'. [Figure: the weighted example graph on Bob, Jill, Ann and Ted.]
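
The degree and subgraph definitions can be sketched in Python as follows. The function names are illustrative assumptions, the adjacency sets encode the undirected example graph without its weights, and `induced_subgraph` keeps every edge of G whose endpoints both lie in V' (the largest subgraph the definition allows for a given V').

```python
def degree(adj, v):
    """TD(v): number of edges incident to v in an undirected graph."""
    return len(adj[v])


def induced_subgraph(adj, keep):
    """Subgraph on the vertex subset `keep`, with all edges of G
    whose two endpoints are both in `keep`."""
    keep = set(keep)
    return {v: adj[v] & keep for v in keep}


adj = {
    "Ann": {"Bob"},
    "Bob": {"Ann", "Jill", "Ted"},
    "Jill": {"Bob", "Ted"},
    "Ted": {"Bob", "Jill"},
}
sub = induced_subgraph(adj, {"Bob", "Jill", "Ted"})
```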

12 Graph Model: Definition 1 (Neighbors of node u): Given an undirected graph G = (V, E, W), the neighborhood of a node u ∈ V is the set containing node u and its adjacent nodes. Definition 2 (Jaccard distance): Given an undirected graph G = (V, E, W), the Jaccard distance of two nodes u and v is defined from the neighborhoods of u and v.
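
The formula for Definition 2 is not preserved in this transcript. The sketch below assumes the standard unweighted Jaccard distance, one minus the size of the intersection of the two neighborhoods divided by the size of their union, and ignores the edge weights W. The function names are illustrative assumptions.

```python
def neighborhood(adj, u):
    """Definition 1: node u together with its adjacent nodes."""
    return adj[u] | {u}


def jaccard_distance(adj, u, v):
    """Assumed unweighted form: 1 - |N(u) & N(v)| / |N(u) | N(v)|."""
    nu, nv = neighborhood(adj, u), neighborhood(adj, v)
    return 1.0 - len(nu & nv) / len(nu | nv)


# Undirected example graph on Ann, Bob, Jill and Ted (weights omitted).
adj = {
    "Ann": {"Bob"},
    "Bob": {"Ann", "Jill", "Ted"},
    "Jill": {"Bob", "Ted"},
    "Ted": {"Bob", "Jill"},
}
d_bob_ted = jaccard_distance(adj, "Bob", "Ted")
d_ann_bob = jaccard_distance(adj, "Ann", "Bob")
```

Bob and Ted share three of the four nodes in their combined neighborhoods, so they come out closer than Ann and Bob, who share two of four.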

13 Community in Graph: A community is a subgraph containing nodes that are more densely linked to each other than to the rest of the graph. Equivalently, a graph has community structure if the number of links inside each subgraph is higher than the number of links between those subgraphs. Quiz 2: How many subgraphs are there?
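
To make the "more links inside than between" criterion concrete, here is a small Python sketch that counts both kinds of links for a candidate node set. The function name and the toy graph (two triangles joined by one bridge edge) are hypothetical and not taken from the slides.

```python
def link_counts(edges, community):
    """Return (links inside `community`, links crossing its boundary)."""
    community = set(community)
    internal = external = 0
    for u, v in edges:
        inside = (u in community) + (v in community)
        if inside == 2:
            internal += 1
        elif inside == 1:
            external += 1
    return internal, external


# Hypothetical toy graph: triangles {1, 2, 3} and {4, 5, 6} plus bridge (3, 4).
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
internal, external = link_counts(edges, {1, 2, 3})
```

For the node set {1, 2, 3} the count is three internal links against one external link, so it qualifies as a community under this criterion.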

14 Challenges. Large-scale networks: time constraints, memory limitations. High-quality communities: user-defined criteria. Parametrization: outliers, sensitivity to parameter(s). How can we intuitively detect natural, high-quality communities in large networks?

15 Existing Solutions and Their Drawbacks. Cut-criteria-based community detection: Ncut is a well-known graph clustering algorithm that optimizes the normalized cut criterion. Because eigenvalue decomposition is applied to speed up finding the optimal cut, it is also commonly called spectral clustering. The criterion considers the connection between groups relative to the density of each group.
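
The normalized cut formula is not preserved in this transcript. As a sketch, assuming the usual two-way definition Ncut(A, B) = cut(A, B)/vol(A) + cut(A, B)/vol(B), where vol is the sum of node degrees in a group, the criterion can be evaluated for an unweighted graph like this. The function name and toy graph are hypothetical, and the code only scores a given split; it does not perform the spectral optimization.

```python
def normalized_cut(edges, part_a):
    """Ncut of the split (part_a, rest) for an unweighted, undirected edge list.

    Assumes both sides contain at least one edge endpoint.
    """
    part_a = set(part_a)
    cut = vol_a = vol_b = 0
    for u, v in edges:
        for node in (u, v):            # each endpoint adds 1 to its side's volume
            if node in part_a:
                vol_a += 1
            else:
                vol_b += 1
        if (u in part_a) != (v in part_a):
            cut += 1                   # edge crosses the split
    return cut / vol_a + cut / vol_b


# Hypothetical toy graph: two triangles joined by the bridge (3, 4).
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
good_split = normalized_cut(edges, {1, 2, 3})
bad_split = normalized_cut(edges, {1})
```

Cutting at the bridge gives a lower score than splitting off a single node, which is the behavior the criterion is designed to reward.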

16 Existing Solutions and Their Drawbacks. Drawbacks of Ncut: Although this type of community detection usually identifies high-quality communities, it cannot handle large-scale networks. In addition, determining a suitable number of communities without prior knowledge is a non-trivial task.

17 Existing Solutions and Their Drawbacks. Modularity-based community detection is currently the most popular approach; it relies on the modularity measure, which uses the expected cut to measure clustering quality. Drawbacks: Modularity-based community detection algorithms tend to fail on many real-world networks due to the “resolution limit”. The situation becomes worse as the network size increases.
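
The slides do not give the modularity formula. The sketch below assumes the standard Newman-Girvan form for an unweighted graph, Q = sum over communities c of L_c/m - (d_c/(2m))^2, where m is the number of edges, L_c the number of edges inside c and d_c the total degree of c. The function name and toy graph are hypothetical.

```python
def modularity(edges, communities):
    """Newman-Girvan modularity of a partition of an unweighted graph."""
    m = len(edges)
    label = {node: i for i, c in enumerate(communities) for node in c}
    internal = [0] * len(communities)     # L_c: edges inside each community
    degree_sum = [0] * len(communities)   # d_c: total degree of each community
    for u, v in edges:
        degree_sum[label[u]] += 1
        degree_sum[label[v]] += 1
        if label[u] == label[v]:
            internal[label[u]] += 1
    return sum(l / m - (d / (2 * m)) ** 2
               for l, d in zip(internal, degree_sum))


# Hypothetical toy graph: two triangles joined by the bridge (3, 4).
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
q_two = modularity(edges, [{1, 2, 3}, {4, 5, 6}])
q_one = modularity(edges, [{1, 2, 3, 4, 5, 6}])
```

The second term is the expected fraction of internal edges under random rewiring, which is the "expected cut" idea mentioned above; putting every node in one community yields Q = 0.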

18 Basic idea. Dynamic point of view: Consider a given network as a dynamic system in which each node interacts with its local neighbors. Distance dynamics vs. node dynamics: Investigate the dynamics of edges instead of the dynamics of nodes.

19 Interaction model – three interaction patterns. Assumption: If two nodes are linked, each node attracts the other and makes it move toward itself. Direct interaction: makes u and v closer. Common neighbors: make u and v closer. Exclusive neighbors: make u and v closer or farther apart.

20 Interaction model – three interaction patterns. Pattern 1: Influence from directly linked nodes. Formally, to characterize the change of the distance d(u, v), we define DI, the influence from the interaction of the directly linked nodes, in terms of deg(u), the degree of node u, and a coupling function f(·); sin(·) is used in this study. 1 − d(u, v) indicates the similarity between u and v.
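
The DI formula is not preserved in this transcript. The sketch below assumes the form DI = -(f(1 - d(u, v))/deg(u) + f(1 - d(u, v))/deg(v)) with f = sin, which uses exactly the quantities named above but is an assumption rather than a transcription of the slide. The function name is hypothetical.

```python
import math


def direct_influence(d_uv, deg_u, deg_v):
    """Assumed DI: change of d(u, v) caused by the direct link between u and v.

    d_uv is the current distance in [0, 1]; deg_u and deg_v are node degrees.
    """
    coupling = math.sin(1.0 - d_uv)   # f(similarity) with f = sin
    return -(coupling / deg_u + coupling / deg_v)


di_close = direct_influence(0.5, 2, 2)
di_far = direct_influence(1.0, 2, 3)
```

Under this assumed form the value is never positive for distances in [0, 1], so the direct link can only pull u and v closer, and the pull is weaker for high-degree nodes.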

21 Interaction model – three interaction patterns. Pattern 2: Influence from common neighbors. We define CI as the change of d(u, v) caused by the influence of common neighbors. The two terms (1 − d(x, v)) and (1 − d(x, u)) for each common neighbor x further quantify the degree of influence compared to the influence from directly linked nodes.
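
The CI formula is also missing from the transcript. The sketch below assumes that each common neighbor x contributes f(1 - d(x, u)) * (1 - d(x, v)) / deg(u) plus f(1 - d(x, v)) * (1 - d(x, u)) / deg(v), with f = sin, and that CI is the negated sum over all common neighbors. This form is an assumption built from the terms named above; the function name and data layout (distances keyed by `frozenset` node pairs) are hypothetical.

```python
import math


def common_neighbor_influence(dist, deg, u, v, common):
    """Assumed CI: change of d(u, v) caused by the common neighbors of u and v."""
    total = 0.0
    for x in common:
        sim_xu = 1.0 - dist[frozenset((x, u))]
        sim_xv = 1.0 - dist[frozenset((x, v))]
        total += math.sin(sim_xu) * sim_xv / deg[u]   # x pulls u, scaled by x-v similarity
        total += math.sin(sim_xv) * sim_xu / deg[v]   # x pulls v, scaled by x-u similarity
    return -total


# Hypothetical triangle 1-2-3 with every edge at distance 0.5.
dist = {frozenset(p): 0.5 for p in [(1, 2), (2, 3), (1, 3)]}
deg = {1: 2, 2: 2, 3: 2}
ci = common_neighbor_influence(dist, deg, 1, 2, common=[3])
```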

22 Interaction model – three interaction patterns. Pattern 3: Influence from exclusive neighbors.

23 Interaction model – interaction pattern (1)

24 Attractor Algorithm. Initialization: compute the Jaccard distance for each edge. Dynamics: update the distance of each edge based on equation (1). Community detection: cut off the edges with a distance of 1. [Figure panels: (a) T=0, (b) T=1, (c) T=9.]
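
The final step can be sketched in Python as follows, assuming that the communities are the connected components that remain once every edge with distance 1 has been cut (the slides state the cut but not how the groups are read off). The function name and the converged distances in the example are hypothetical; the dynamics step itself is not reproduced here.

```python
def communities_from_distances(nodes, dist, cut_at=1.0):
    """Drop edges whose distance reached `cut_at`, then return the
    connected components of what is left as a list of node sets."""
    adj = {n: set() for n in nodes}
    for (u, v), d in dist.items():
        if d < cut_at:                 # keep only edges that were not cut
            adj[u].add(v)
            adj[v].add(u)

    seen, communities = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:                   # iterative depth-first search
            node = stack.pop()
            component.add(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        communities.append(component)
    return communities


# Hypothetical converged distances: two triangles at 0, the bridge (3, 4) at 1.
dist = {(1, 2): 0.0, (2, 3): 0.0, (1, 3): 0.0,
        (4, 5): 0.0, (5, 6): 0.0, (4, 6): 0.0, (3, 4): 1.0}
result = communities_from_distances([1, 2, 3, 4, 5, 6], dist)
```

A node whose edges are all cut ends up in a component of its own, which is one way small communities and isolated nodes can emerge from this step.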

25 Parameterization: cohesion parameter λ. The cohesion parameter λ determines the coarseness of communities. A large λ produces small, fine-grained communities and a small λ yields large communities. Experiments demonstrate that Attractor achieves high-quality results when λ is within [0.4, 0.6].

26 Evaluation – comparison on synthetic networks. [Figure: comparison on synthetic data; panels: noise, density.]

27 Evaluation – comparison on real networks. Table: Statistics of real-world data sets, where AD: average degree; CC: clustering coefficient.

28 Evaluation – comparison on real networks. [Figure panels: labeled networks, unlabeled networks.]

29 Evaluation – comparison on real networks. Figure: Attractor on the karate club network. Colors of nodes indicate different detected communities.

30 Evaluation – comparison on real networks. Figure: Attractor on the American football network.

31 Evaluation – small communities and anomalies. Attractor finds many small communities, and the local noise level shows that Attractor can detect anomalies effectively.

32 Evaluation – time complexity. Attractor has low time complexity, O(|E|), and can handle large real-world networks.

33 Further Applications: failure alarms, autistic patients, precision marketing.

34 Conclusion. Based on distance dynamics, Attractor has several benefits. Intuitive community detection: Instead of optimizing user-defined measures, Attractor investigates community structure from a new point of view, distance dynamics. Scalability: Attractor has low time complexity, O(|E|), and is easy to speed up. Small community and anomaly detection: Attractor can discover arbitrary-size communities and anomalies that exist in real networks.

35 Thanks for your attention! Q & A

