Topics In Social Computing (67810), Module 1 (Structure): Centrality Measures, Graph Clustering, Random Walks on Graphs
CENTRALITY MEASURES
Centrality
Measures of centrality attempt to quantify the importance of a node in a social network: how much "social capital" does it possess? There are many measures of centrality, each emphasizing a different notion of importance. Some we have already mentioned: short distances, high degree, bridging, etc.
Closeness Centrality
Harmonic Centrality
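The bodies of the two slides above did not survive the export. As a hedged sketch of the standard definitions (closeness: number of other nodes divided by the sum of distances to them; harmonic: sum of inverse distances, which also handles unreachable nodes gracefully):

```python
from collections import deque

def bfs_distances(adj, source):
    """Unweighted shortest-path distances from source, via BFS."""
    dist = {source: 0}
    q = deque([source])
    while q:
        u = q.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                q.append(w)
    return dist

def closeness(adj, v):
    """(number of reachable nodes) / (sum of distances to them)."""
    dist = bfs_distances(adj, v)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0

def harmonic(adj, v):
    """Sum of 1/d(v, u); unreachable nodes simply contribute 0."""
    dist = bfs_distances(adj, v)
    return sum(1.0 / d for u, d in dist.items() if u != v)

# Path graph 0-1-2-3-4: the middle vertex is the most central.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
```

On the path graph, both measures rank the middle vertex highest, matching the "short distances" intuition above.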
Betweenness Centrality
Reminder: an edge "helps" the flow of information if it lies on the shortest path between two vertices. Nodes that lie on the shortest paths between many pairs of vertices contribute more. The idea: count the number of shortest paths an edge or vertex lies on. How do we handle multiple shortest paths between a pair?
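The counting rule above, including the standard answer to the multiple-shortest-paths question (each shortest path contributes an equal fractional share), can be sketched by brute force. This is an illustration, not the efficient computation of the later slide:

```python
from collections import deque

def bfs_with_counts(adj, s):
    """Distances from s plus the number of shortest paths to each node."""
    dist, sigma = {s: 0}, {s: 1}
    q = deque([s])
    while q:
        u = q.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                sigma[w] = 0
                q.append(w)
            if dist[w] == dist[u] + 1:
                sigma[w] += sigma[u]
    return dist, sigma

def betweenness(adj):
    """For each vertex v, sum over ordered pairs (s, t) of the fraction
    of shortest s-t paths that pass through v."""
    info = {v: bfs_with_counts(adj, v) for v in adj}
    bc = {v: 0.0 for v in adj}
    for s in adj:
        dist_s, sigma_s = info[s]
        for t in adj:
            if t == s or t not in dist_s:
                continue
            for v in adj:
                if v in (s, t) or v not in dist_s:
                    continue
                dist_v, sigma_v = info[v]
                # v lies on a shortest s-t path iff the distances add up
                if t in dist_v and dist_s[v] + dist_v[t] == dist_s[t]:
                    bc[v] += sigma_s[v] * sigma_v[t] / sigma_s[t]
    return bc
```

On a 4-cycle, every pair of opposite vertices has two shortest paths, so each intermediate vertex gets credit 1/2 per ordered pair rather than 1.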
Betweenness of a vertex
Example
(figure: an example graph with vertices numbered 1-7)
Computing Betweenness “efficiently”
Eigenvector Centrality / PageRank
Another measure of the importance of a node. Famously used by Google to assist in ranking pages on the internet: search should provide "relevant" results, and their order matters (affected by "relevance" but also by the PageRank measure).
Content relevance is measured by the appearance of keywords and other page statistics, but these may be misleading. Should a webpage containing "coffee coffee coffee …" appear at the top of a search for the word coffee? PageRank instead looks at the structure of the link graph to determine the importance of a page. A recursive definition: a page is considered more important if it is linked to by important pages.
PageRank
Let us try to develop this intuitive notion of importance. Running example: (figure: a small directed graph on vertices a, b, c, d, e)
Importance
Let us assume the "importance" of a node is simply a number. Initially, assign importance arbitrarily; in this case, let's try a uniform assignment of 1 to every node.
Updating once
Now let's update according to the rules: a node sends an equal share of its "importance" out through each outgoing edge, and a node's importance in the next step is the sum of the "importance" it receives over its incoming edges. (figure: after one update, the values 1, 1, 1, 1, 1 become 1, 0.33, 1.33, 0.83, 1.5)
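The update rules above can be run directly. The slide's actual edge set did not survive the export, so the five-node directed graph below is a hypothetical stand-in:

```python
def update(importance, out_edges):
    """One update step: each node splits its importance equally among
    its outgoing edges; each node's new value is the sum it receives."""
    new = {v: 0.0 for v in importance}
    for v, targets in out_edges.items():
        share = importance[v] / len(targets)
        for t in targets:
            new[t] += share
    return new

# Hypothetical 5-node directed graph (the slide's actual edges
# were not preserved in this export).
out_edges = {
    'a': ['b', 'c'],
    'b': ['c'],
    'c': ['a', 'd', 'e'],
    'd': ['e'],
    'e': ['a'],
}
imp = {v: 1.0 for v in out_edges}
imp = update(imp, out_edges)
```

Note that the total importance across the graph is unchanged by an update, which is exactly the conservation observation made a few slides below.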
After a second update the values become 1.33, 0.5, 1.33, 1, 0.83, and after a third, 1.33, 0.27, 1.27, 0.94, 1.16. After many steps they converge to 1.33, 0.33, 1.33, 1, 1.
Things to Notice
1. We have arrived at a steady state.
2. "Importance" is conserved throughout the process (the sum over the whole graph stays the same). Therefore, from now on we will normalize the sum to 1: the steady-state values 1.33, 0.33, 1.33, 1, 1 become 0.27, 0.07, 0.27, 0.2, 0.2.
The Relation To Random Walks
A Simpler Linear Algebra Notation
(slides 20-23: the running example rewritten in matrix-vector notation; the equations did not survive the export)
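The equations on the slides above were not preserved, but the standard matrix form of this process is: build a column-stochastic matrix M with M[j, i] = 1/outdeg(i) whenever there is an edge i → j; one update step is then a single matrix-vector product p ← Mp. A sketch on a hypothetical five-node graph:

```python
import numpy as np

# Hypothetical 5-node graph: a->b, a->c, b->c, c->a, c->d, c->e,
# d->e, e->a.  Column i holds 1/outdeg(i) for each target of i.
M = np.array([
    [0.0, 0.0, 1/3, 0.0, 1.0],   # into a
    [0.5, 0.0, 0.0, 0.0, 0.0],   # into b
    [0.5, 1.0, 0.0, 0.0, 0.0],   # into c
    [0.0, 0.0, 1/3, 0.0, 0.0],   # into d
    [0.0, 0.0, 1/3, 1.0, 0.0],   # into e
])

p = np.full(5, 1 / 5)   # normalized uniform start
for _ in range(1000):   # power iteration: repeat the update rule
    p = M @ p
```

Because the columns sum to 1, each product conserves total importance, and repeated multiplication is exactly the update process of the earlier slides.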
Solutions
Problem 1
How do we handle nodes without any outgoing edges? Consider them as having a self-loop? Consider them as linking to all other nodes? PageRank simply removes all such nodes ("dangling links") iteratively until the graph contains none. They are added back after the PageRank calculation is done for the rest of the graph, and their PageRank can be computed then.
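The iterative removal described above can be sketched as follows (the function name and example graphs are illustrative, not from the slides):

```python
def strip_dangling(out_edges):
    """Iteratively remove nodes with no outgoing edges (dangling links)
    until none remain. Returns the reduced graph and the removal order,
    so the dangling nodes' ranks can be filled in afterwards from the
    ranks of the pages that link to them."""
    g = {v: list(ts) for v, ts in out_edges.items()}
    removed = []
    while True:
        dangling = [v for v, ts in g.items() if not ts]
        if not dangling:
            return g, removed
        for v in dangling:
            del g[v]
            removed.append(v)
        for v in g:                       # drop edges into removed nodes
            g[v] = [t for t in g[v] if t in g]
```

Note that removing a dangling node can create new dangling nodes (a chain a → b → c collapses entirely), which is why the removal must be iterated.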
Potential Problem 2
Convergence is not guaranteed for every graph. (figure: a small periodic example whose probability mass oscillates) But as we've seen, every graph has some steady state.
Potential Problem 3
Sinks drain all the value from the rest of the graph, and may also allow for several distinct steady states. (figure: a graph whose mass collects in a sink component)
Solution:
The PageRank Algorithm:
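The body of this slide was not preserved; what follows is the standard damped formulation of PageRank (random jump with probability 1 - alpha, commonly alpha = 0.85), which may differ in detail from the slide's version:

```python
import numpy as np

def pagerank(M, alpha=0.85, iters=100):
    """Damped PageRank: with probability alpha follow a random outgoing
    link (one step of M), otherwise jump to a uniformly random page.
    The jump makes the walk irreducible and aperiodic, addressing both
    the convergence problem and the sink problem at once."""
    n = M.shape[0]
    p = np.full(n, 1 / n)
    for _ in range(iters):
        p = alpha * (M @ p) + (1 - alpha) / n
    return p

# Hypothetical 4-node graph: 0->1, 1->2, 2->0, 2->1, 3->0
M = np.array([
    [0.0, 0.0, 0.5, 1.0],
    [1.0, 0.0, 0.5, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
])
p = pagerank(M)
```

A page with no incoming links ends up with exactly the baseline rank (1 - alpha)/n contributed by the random jump.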
Precise conditions for convergence
GRAPH CLUSTERING & COMMUNITY DETECTION
Community Detection
Tightly knit communities will often have many social connections inside the community and fewer connections outside it. We'd like to detect them automatically from the structure. There are many ways to define the notion of clusters/communities, and many algorithms that detect them.
Leftwing and Rightwing Political Blogs
Lada Adamic and Natalie Glance. The political blogosphere and the 2004 U.S. election: Divided they blog. In Proceedings of the 3rd International Workshop on Link Discovery, 2005.
https://medium.com/i-data/israel-gaza-war-data-a54969aeb23e?_ga=1.225686554.1873778177.1371244953
The Girvan-Newman Alg.
The main idea: edges connecting clusters have a high betweenness value. (The betweenness of an edge is defined similarly to that of a vertex.) Iteratively remove the edge with the highest betweenness in the graph. When the graph breaks up into disconnected components, each component is a cluster. Clusters keep breaking up into sub-clusters.
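The steps above can be sketched end to end: compute edge betweenness (here with a Brandes-style accumulation, a compact standard method the slide does not prescribe), repeatedly delete the top edge, and stop once the graph splits:

```python
from collections import deque

def edge_betweenness(adj):
    """Shortest-path edge betweenness via Brandes-style accumulation."""
    eb = {}
    for s in adj:
        dist, sigma = {s: 0}, {s: 1.0}
        preds = {v: [] for v in adj}
        q, order = deque([s]), []
        while q:
            u = q.popleft()
            order.append(u)
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    sigma[w] = 0.0
                    q.append(w)
                if dist[w] == dist[u] + 1:
                    sigma[w] += sigma[u]
                    preds[w].append(u)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):          # farthest vertices first
            for u in preds[w]:
                c = sigma[u] / sigma[w] * (1 + delta[w])
                e = frozenset((u, w))
                eb[e] = eb.get(e, 0.0) + c
                delta[u] += c
    return eb

def components(adj):
    """Connected components by depth-first search."""
    seen, comps = set(), []
    for v in adj:
        if v in seen:
            continue
        comp, stack = set(), [v]
        while stack:
            u = stack.pop()
            if u not in comp:
                comp.add(u)
                stack.extend(adj[u])
        seen |= comp
        comps.append(comp)
    return comps

def girvan_newman_step(adj):
    """Delete highest-betweenness edges until the graph splits."""
    adj = {v: set(ns) for v, ns in adj.items()}
    start = len(components(adj))
    while len(components(adj)) == start:
        eb = edge_betweenness(adj)
        u, w = max(eb, key=eb.get)
        adj[u].discard(w)
        adj[w].discard(u)
    return components(adj)
```

On two triangles joined by a single bridge edge, the bridge carries every cross-pair's shortest path, so it is removed first and the two triangles become the clusters.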
Example
Modularity
Markov Clustering
Intuition 1: when a random walk starts at some node, it is at first more likely to remain inside the node's cluster, since there are many edges inside the cluster and few that lead outside of it.
Markov Clustering
We will use a short random walk to see where a walker starting from each vertex is likely to go. There are no sinks (we assume the graph is undirected). There may still be convergence issues if the GCD of all cycle lengths is greater than 1. To fix this, we can add a self-loop to every node (an optional step, not needed for all graphs).
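A sketch of this setup: build the column-stochastic transition matrix of an undirected graph, with optional self-loops to break periodicity. The 4-cycle below is a hypothetical example of the GCD problem (it is bipartite, so all cycle lengths share a GCD of 2):

```python
import numpy as np

def transition_matrix(A, self_loops=True):
    """Column-stochastic random-walk matrix of an undirected graph,
    with an optional self-loop at every vertex to break periodicity."""
    A = np.array(A, dtype=float)
    if self_loops:
        A = A + np.eye(len(A))
    return A / A.sum(axis=0)

# A 4-cycle: without self-loops the walk oscillates between the two
# sides forever; with them it converges to the uniform distribution.
C4 = [[0, 1, 0, 1],
      [1, 0, 1, 0],
      [0, 1, 0, 1],
      [1, 0, 1, 0]]
T = transition_matrix(C4)
```

Starting all the mass on one vertex, the self-loop walk spreads out and settles, while the plain walk just alternates sides at every step.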
Most of the probability mass is still in the cluster. (figures: starting from a single vertex, the walk's distribution after one step is 1/4, 1/3, 1/6, 1/12, 1/6, and after two steps 8/36, 11/36, 7/36, 3/36, 7/36)
Picking a Representative
Intuition 2: if each vertex chooses another vertex as a representative, one that is likely to be "more central" in its cluster, we naturally obtain a clustering.
Picking a Representative
But which representative to pick? Give weight to vertices we are highly likely to visit in the first stages of the random walk. (figure: the two-step probabilities 8/36, 11/36, 7/36, 3/36, 7/36, with the walk's starting vertex marked)
Picking a Representative
Picking the maximal probability and rewiring the graph accordingly will, in this case, imply that the vertex picks itself. (figure: the same two-step probabilities, with the starting vertex marked)
Instead of a Deterministic Choice
A deterministic choice is equivalent to turning the highest value into 1 and all others into 0. A softer approach: enhance the differences between the values ("rich get richer") by raising each probability to a power and renormalizing. Squaring, for example, turns a factor of 2 between two probabilities into a factor of 4. The larger the exponent, the closer this is to deterministically picking a representative.
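The "softer approach" above is the inflation operator of MCL: raise each value to a power and renormalize. A minimal sketch, applied to the two-step probabilities from the earlier slide:

```python
import numpy as np

def inflate(probs, r=2.0):
    """Raise each probability to the power r and renormalize.
    r = 1 changes nothing; larger r approaches the deterministic
    choice of putting all mass on the argmax."""
    p = np.asarray(probs, dtype=float) ** r
    return p / p.sum()

# Two-step visit probabilities from the earlier slide
p = np.array([8/36, 11/36, 7/36, 3/36, 7/36])
q = inflate(p)   # squaring turns a factor-of-2 gap into a factor of 4
```

The argmax is unchanged, but the ratios between entries are squared, so the leading candidate's share grows.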
Instead of a Deterministic Choice
This is like rewiring the graph, but with stronger edge weights toward vertices with a higher probability of visitation. The values need to be normalized so that they represent a stochastic transition matrix. (figure: the origin vertex rewired with weights derived from the probabilities 8/36, 11/36, 7/36, 3/36, 7/36)
Rewiring the Graph
Do this for random walks that start at each of the vertices, to determine the outgoing edges (and their weights) of each one. We get an edge-weighted graph that is slightly closer to being clustered.
Intuition 3: a random walk on a graph in which everyone picked a representative leads to picking the more central representatives. (figure: in the rewired graph, after 2 steps of a random walk all the mass from the origin sits on a single vertex)
The MCL Algorithm
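The slide body was not preserved; the following is a minimal sketch of the standard MCL loop (expansion, inflation, renormalization), with the cluster read-out simplified to "columns sharing the same attractor row":

```python
import numpy as np

def mcl(A, expansion=2, inflation=2.0, iters=50):
    """Minimal MCL sketch: alternate expansion (matrix power, i.e.
    taking several random-walk steps) with inflation (elementwise
    power plus column renormalization). After convergence, columns
    that share the same 'attractor' row form one cluster."""
    M = np.array(A, dtype=float) + np.eye(len(A))  # add self-loops
    M = M / M.sum(axis=0)                          # column-stochastic
    for _ in range(iters):
        M = np.linalg.matrix_power(M, expansion)   # expansion
        M = M ** inflation                         # inflation ...
        M = M / M.sum(axis=0)                      # ... renormalized
    clusters = {}
    for col in range(M.shape[1]):
        clusters.setdefault(int(M[:, col].argmax()), set()).add(col)
    return list(clusters.values())
```

On two triangles joined by a single edge, the loop converges to a matrix with one attractor per triangle, splitting the graph into the two expected clusters.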
Example In Matrix Form
(figures, on an example graph with vertices 1-7: the graph and its transition matrix, the matrix after expansion, after inflation, after a few more iterations, and eventually the converged matrix, at which point the graph splits into a first and a second cluster)
Implementation Issues
Sparsification
MCL May Converge to a Fuzzy Assignment
Vertices may sometimes belong equally to more than one cluster. This is a feature of the algorithm, not a bug!
Other General Methods of Clustering
Bottom-up hierarchical clustering: join nodes together according to a similarity score; the tree of unifications defines clusterings at different levels of granularity.
Other General Methods of Clustering
Top-down hierarchical clustering: slowly break the graph up into smaller and smaller clusters (e.g., using the Girvan-Newman method); this also gives clusters at different granularities.