Finding Top-k Shortest Path Distance Changes in an Evolutionary Network SSTD 2011 24 th August 2011 Manish Gupta UIUC Charu Aggarwal IBM Jiawei Han UIUC.

Finding Top-k Shortest Path Distance Changes in an Evolutionary Network SSTD 2011 24 th August 2011 Manish Gupta UIUC Charu Aggarwal IBM Jiawei Han UIUC

Networks as evolutionary graphs Social networks: new users join, new friendships are created. Bibliographic networks: new authors publish more papers, more collaborations are done. Transportation/road networks: new roads are constructed. Ad hoc networks: Army vehicles change positions very frequently, new messages transmitted.

Analysis of evolutionary networks Community formation, using clustering techniques Metrics to study evolution – merge/split Information diffusion across evolutionary networks Link prediction tasks Queries over evolving networks

Queries over Evolving networks Updating shortest path distance between two nodes as the edge weights change. E.g., in computer networks, routers need to update their shortest path trees when a link goes down. Given a time dependent network (edge weights are function of time), how to compute SPD(u, v, t). Queries incorporating the max flow constraints.

Transportation Planning Problem Given the current set of roads, we want to overlay a network of new roads. Civil engineers propose two plans: A and B with different sets of new roads Which plan is better? Plan A brings cities X and Y very close. X produces a lot of product P while Y has a rich demand for product P. Plan A actually brings lots of “economically important pairs” of cities close to each other. Select plan A over B.

Our problem Given an evolutionary network with two snapshots G 1 and G 2. Compute top few node pairs with maximum shortest path distance change across the two snapshots. For example, across 2005 and 2011, distance between which pair of cities in Illinois decreased the most, thanks to the new roads built in this time period?

Naïve Approach Compute shortest path distance between every pair of nodes for snapshot G 1. Compute shortest path distance between every pair of nodes for snapshot G 2. Compute distance change for every pair of nodes. Sort the distance change vector Return node pairs corresponding to the top few distance change values. Highly inefficient solution!

Solution We experiment on three datasets: DBLP co-authorship graph, IMDB co-starring graph and Ontario province road network. Throw in more CPUs! Shortest path algorithms are easily parallelizable. Run single source shortest path runs across thousands of machines. On the Ontario road network dataset, it took around 400 CPU days! OR Use our algorithm Our methods are ~50-100X faster than baseline

Outline Smartly choose a seed set of few source nodes to run single source shortest path algorithm from: Incidence Algorithm. Improve the accuracy of Incidence Algorithm by intelligently expanding the seed set using Edge importance estimation algorithm. Generalize the problem to a node ranking problem. Suggest node ranking strategies. Experimental results and analysis.

Incidence Algorithm Maximum distance change will happen for node pairs consisting of nodes on which new edges or edges with changed weights are incident. Let V’ be the set of nodes with new edges. Algorithm: Run single source SPD algorithm from each node in V’ on both snapshots, compute difference (change), sort and return top k.

Is Incidence Algorithm accurate?

How to expand the seed set (V’)? Consider the neighbors of all the nodes currently in V’ as potential candidates. Expand to a promising neighbor. In particular, expand to a neighbor node a, if the edge that connects a to the current set V’ has relatively high importance, relative to other edges incident on node a. V’ a a Terminate when top k node pairs don’t change.

Edge importance number Importance number of an edge is the probability that the edge will lie on a randomly chosen shortest path tree in the graph. How to compute edge importance number for edge e? First find all shortest path trees and then find how many of such trees contain edge e. Too expensive! As inefficient as the naïve solution itself! Hence we compute estimate edge importance number using a randomized algorithm.

Edge Importance Estimation Algorithm Randomly sample a few nodes from the graph. Using each of these nodes S as source, obtain a shortest path tree T using an SPD algorithm (e.g. Dijkstra). For each tree T, perform distance labeling. Alternative Tight edge: An alternative edge which could replace an existent edge from T to give T’. For each edge in T, obtain multiple T’ by replacing a tight edge using an alternative tight edge. Edge importance of an edge wrt T is proportional to the number of descendants. Aggregate I(edge) across all different SPTs.

Generalizing the problem Naïve solution: Use all nodes in both snapshots. Incidence algorithm: Use only nodes in V’. Generalized solution? Node ranking problem. Rank nodes such that running Dijkstra algorithm from just top few nodes provides high accuracy for “topK node pairs with max distance change problem”.

How to rank nodes? Random: Randomly select nodes from the graph. RandomNWNE: Randomly select nodes from seed set V’ (nodes with new edges). Edge Weight Based Ranking (EWBR). Edge Weight Change Based Ranking (EWCBR). 0.1 0.2 0.3 0.1 0.2 0.1 0.2 0.3 0.01 0.02 0.1 0.15

How to rank nodes? Importance Number Based Ranking (INBR) Importance Number Change Based Ranking (INCBR) Ranking Using Edge Weight and Importance Numbers (RUEWIN) 0.1 0.2 0.3 0.1 0.2 0.1 0.2 0.3 0.5 0.02 0.1 0.75

How to rank nodes? Clustering Based Ranking (CBR) Clustering Based Ranking with Partitions (CBRP) Inter-cluster edges are more important than intra-cluster edges.

Clustering Based Ranking

Experiments

Related work Shortest path algorithms: Dijkstra [11], Shimbel [20], Johnson [15], Floyd, Warshall [14,21] Router networks [8,22] Outlier detection [5,13,18] Time dependent shortest paths [25,26] Dynamic shortest paths computation [3,4,6,19] Between-ness measures [23,24]

References

Thanks!

Finding Top-k Shortest Path Distance Changes in an Evolutionary Network SSTD 2011 24 th August 2011 Manish Gupta UIUC Charu Aggarwal IBM Jiawei Han UIUC.

Similar presentations

Presentation on theme: "Finding Top-k Shortest Path Distance Changes in an Evolutionary Network SSTD 2011 24 th August 2011 Manish Gupta UIUC Charu Aggarwal IBM Jiawei Han UIUC."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Finding Top-k Shortest Path Distance Changes in an Evolutionary Network SSTD 2011 24 th August 2011 Manish Gupta UIUC Charu Aggarwal IBM Jiawei Han UIUC.

Similar presentations

Presentation on theme: "Finding Top-k Shortest Path Distance Changes in an Evolutionary Network SSTD 2011 24 th August 2011 Manish Gupta UIUC Charu Aggarwal IBM Jiawei Han UIUC."— Presentation transcript:

Similar presentations

About project

Feedback