Graph Analysis by Persistent Homology

Dingkang Wang

Outline: Background, Objectives, Preprocessing, Distance Metrics, Simplices Extraction & Comparison, Results

Background
What is graph analysis? Characterizing the structure of a large graph in terms of its nodes and edges, which makes it possible to compare two large graphs.
Commonly used methods: degree distribution; diameter and shortest-path distance distribution; community structure. In my project: persistent homology.
Network analysis is a large and active research area. Previously, most network analysis relied on traditional metrics such as the degree distribution and the graph diameter. However, these metrics are vulnerable to noisy data points and may fail to reveal hidden differences between graphs. Topological analysis by persistent homology is a newer tool for network analysis. It is mostly used to analyze data embedded in Euclidean space, but it can also be applied to a node-link network once pairwise distances between nodes have been computed. The method is stable, meaning that a small perturbation of the input does not significantly change the output diagram. In addition, it can uncover topological features of a graph that other methods cannot show.

Objectives
Part I: Characterize the structure of graphs in different categories.
Part II: Find the “interesting” year in Senate voting graphs.
Datasets from the Stanford Network Analysis Project: Slashdot social network; Arxiv High Energy Physics paper citation network; Gnutella peer-to-peer network from August

Preprocessing
Denoising by Jaccard Index: In my project, I defined the Jaccard Index of an edge (u, v), or equivalently of two connected nodes u and v, as $J(u, v) = |N(u) \cap N(v)| / |N(u) \cup N(v)|$, where N(u) is the set of neighbors of node u. We then set a threshold: when the Jaccard Index of an edge is below the threshold, we treat it as a noise edge and remove it. Intuitively, the similarity between the two endpoints is too low for there to be an edge between them.
Landmark Sampling when the input graph is too large:
Pick the first landmark randomly.
Repeatedly pick as the next landmark the node farthest from the landmarks chosen so far, until there are enough landmarks.
Assign every other node to one of the landmarks based on distance.
Connect two landmarks when their communities are connected.
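The two preprocessing steps can be sketched in a few lines of Python. This is only an illustration, not the project's code. It assumes the standard Jaccard index |N(u) ∩ N(v)| / |N(u) ∪ N(v)|, an undirected graph stored as a dict of neighbor sets, hop-count (BFS) distances, and assignment of each node to its nearest landmark. The function names `jaccard_denoise`, `bfs_distances` and `landmark_graph` are hypothetical.

```python
import random
from collections import deque
from math import inf

def jaccard_denoise(adj, threshold):
    """Drop edges whose Jaccard index is below `threshold`.
    adj: dict node -> set of neighbours (undirected). Returns a new adjacency."""
    kept = {u: set() for u in adj}
    for u in adj:
        for v in adj[u]:
            score = len(adj[u] & adj[v]) / len(adj[u] | adj[v])
            if score >= threshold:
                kept[u].add(v)
    return kept

def bfs_distances(adj, source):
    """Hop-count distance from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def landmark_graph(adj, k, seed=0):
    """Farthest-point landmark sampling (assumed reading of the slide).
    Returns (landmarks, owner, edges): owner maps each node to its landmark,
    edges links two landmarks whose communities share at least one edge."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    first = rng.choice(nodes)              # first landmark: random
    landmarks = [first]
    d = bfs_distances(adj, first)
    # nearest[v] = (distance to closest landmark so far, that landmark)
    nearest = {v: (d.get(v, inf), first) for v in nodes}
    while len(landmarks) < min(k, len(nodes)):
        nxt = max(nodes, key=lambda v: nearest[v][0])   # farthest node
        landmarks.append(nxt)
        d = bfs_distances(adj, nxt)
        for v in nodes:
            if d.get(v, inf) < nearest[v][0]:
                nearest[v] = (d[v], nxt)
    owner = {v: nearest[v][1] for v in nodes}
    edges = {frozenset((owner[u], owner[v]))
             for u in adj for v in adj[u] if owner[u] != owner[v]}
    return landmarks, owner, edges

# Triangle a-b-c with a pendant edge c-d: the pendant edge has index 0.
graph = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
clean = jaccard_denoise(graph, threshold=0.2)

# Path 0-1-2-3-4 reduced to two landmarks.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
landmarks, owner, edges = landmark_graph(path, k=2)
```

With neighbor sets that exclude the endpoints themselves, an edge whose endpoints share no neighbor scores 0, so a pendant edge is always removed for any positive threshold. Whether that is desirable depends on the graph, which is one reason the threshold needs tuning.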

Distance Metrics
Shortest Path Distance: easier to calculate; more sensitive; lack of variety.
Diffusion Distance:
Step 1: Define a pairwise similarity matrix between points, for example using a Gaussian kernel with width σ².
Step 2: Apply a diagonal normalization matrix to turn it into a transition probability matrix.
Step 3: Calculate the distance with the specific choice w(y) = 1/φ0(y) for the weight function, which takes into account the (empirical) local density of the points and puts more weight on low-density points.
This distance is robust to noise, since the distance between two points depends on all possible paths of length t between the points. The parameters are hard to choose, so we try different values and pick a suitable one.
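The three diffusion-distance steps can be written out as a small pure-Python sketch. It rests on several assumptions that the slides do not confirm: the kernel takes the form exp(-d²/σ²) applied to a precomputed distance matrix (for a graph, for instance shortest-path distances), the walk runs for t steps, and φ0 is the stationary distribution of the normalized kernel. The function names are hypothetical.

```python
from math import exp, sqrt

def gaussian_kernel(dist, sigma):
    """Step 1: similarity K[i][j] = exp(-d(i, j)^2 / sigma^2) (assumed form)."""
    return [[exp(-(d * d) / (sigma * sigma)) for d in row] for row in dist]

def diffusion_distances(kernel, t=1):
    """Steps 2 and 3 for a symmetric kernel matrix; returns an n x n matrix."""
    n = len(kernel)
    degree = [sum(row) for row in kernel]
    total = sum(degree)
    # Step 2: divide each row by its sum to get transition probabilities.
    P = [[kernel[i][j] / degree[i] for j in range(n)] for i in range(n)]
    # t-step transition probabilities P^t by repeated multiplication.
    Pt = [row[:] for row in P]
    for _ in range(t - 1):
        Pt = [[sum(Pt[i][k] * P[k][j] for k in range(n)) for j in range(n)]
              for i in range(n)]
    # Stationary distribution phi0(y) = degree(y) / total degree.
    phi0 = [d / total for d in degree]
    # Step 3: weighted L2 distance between rows of P^t, weight 1 / phi0(y).
    return [[sqrt(sum((Pt[x][y] - Pt[z][y]) ** 2 / phi0[y] for y in range(n)))
             for z in range(n)] for x in range(n)]

# Shortest-path distances on the path graph 0-1-2.
path_dist = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
D = diffusion_distances(gaussian_kernel(path_dist, sigma=1.0), t=2)
```

The two parameters that are hard to choose appear here as `sigma` and `t`. Both change every entry of the matrix, so the diagrams computed downstream change with them.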

Simplices Extraction & Diagram Comparison
Using the Rips complex. Using PHAT to generate persistence diagrams. Using two different metrics: bottleneck distance and Wasserstein distance.
In this step, we extract simplices only up to dimension 3, i.e., we focus on nodes, edges, triangles and tetrahedra, because of the time complexity. We also need to record the birth time of each simplex; more specifically, we use the Rips complex. In the Rips complex, a k-simplex is added once all of its boundary simplices are in. For example, given 3 nodes a, b, c whose edges ab, bc, ac have weights 4, 5, 6, we add these edges at times 4, 5 and 6. Once all the edges are in, the triangle abc is added, so triangle abc enters at time 6.
The bottleneck distance used in my project can be defined, in its standard form, as $d_B(X, Y) = \inf_{\eta: X \to Y} \sup_{x \in X} \|x - \eta(x)\|_\infty$. In other words, it finds a bijection between the two sets, mapping x in X to η(x) in Y, that minimizes the maximum of the pairwise distances. When number of two node
The Wasserstein distance, which is more stable than the bottleneck distance, can be defined, in its standard form, as $W_p(X, Y) = \left( \inf_{\eta: X \to Y} \sum_{x \in X} \|x - \eta(x)\|_\infty^p \right)^{1/p}$. It uses the sum instead of the maximum of all pairwise distances, so in some sense it is less sensitive to outliers.
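The a, b, c example and the two diagram distances can be checked with a short brute-force sketch. It is not the PHAT pipeline used in the project and is only practical for tiny inputs. It assumes that nodes are born at time 0, that a simplex is born at the largest weight among its edges, that diagram points are (birth, death) pairs compared in the L∞ norm, and that unmatched points may be sent to the diagonal at cost (death - birth) / 2. The names `rips_filtration` and `diagram_distances` are hypothetical.

```python
from itertools import combinations, permutations
from math import inf

def rips_filtration(edge_weights, max_dim=3):
    """edge_weights: dict {(u, v): weight}. Returns sorted (birth, simplex)."""
    w = {frozenset(e): t for e, t in edge_weights.items()}
    nodes = sorted({v for e in edge_weights for v in e})
    simplices = [(0, (v,)) for v in nodes]        # assumption: nodes at time 0
    for k in range(2, max_dim + 2):               # k vertices = (k-1)-simplex
        for verts in combinations(nodes, k):
            pairs = [frozenset(p) for p in combinations(verts, 2)]
            if all(p in w for p in pairs):        # every edge is present
                simplices.append((max(w[p] for p in pairs), verts))
    # Ties are broken by dimension so that faces come before cofaces.
    simplices.sort(key=lambda s: (s[0], len(s[1]), s[1]))
    return simplices

def _cost(p, q):
    """L-infinity cost; None stands for 'matched to the diagonal'."""
    if p is None and q is None:
        return 0.0
    if p is None:
        return (q[1] - q[0]) / 2.0
    if q is None:
        return (p[1] - p[0]) / 2.0
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

def diagram_distances(X, Y, p=1):
    """Brute force over all matchings. Returns (bottleneck, p-Wasserstein)."""
    A = list(X) + [None] * len(Y)                 # pad with diagonal slots
    B = list(Y) + [None] * len(X)
    best_max, best_sum = inf, inf
    for perm in permutations(range(len(B))):
        costs = [_cost(A[i], B[j]) for i, j in enumerate(perm)]
        best_max = min(best_max, max(costs, default=0.0))
        best_sum = min(best_sum, sum(c ** p for c in costs))
    return best_max, best_sum ** (1.0 / p)

filtration = rips_filtration({("a", "b"): 4, ("b", "c"): 5, ("a", "c"): 6})
bottleneck, wasserstein = diagram_distances([(0, 4), (1, 2)], [(0, 5)])
```

In the second call the best matching pairs (0, 4) with (0, 5) at cost 1 and sends (1, 2) to the diagonal at cost 0.5. The bottleneck distance keeps only the larger cost, while the 1-Wasserstein distance adds both.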

Results

Thank you! Any questions?
