Fundamental Data Structures and Algorithms


1 15-211 Fundamental Data Structures and Algorithms
Shortest Paths. Ananda Guna, April 11, 2006

2 In this lecture
Recap of the union/find algorithm
Unweighted and weighted graphs
  Graphs with no edge costs: simple BFS algorithm
  Graphs with non-negative edge costs: Dijkstra’s algorithm
Shortest path in a DAG
Next: graphs with negative edge costs (Bellman-Ford algorithm)

3 Understanding Union-Find

4 Forest and trees
Each set is a tree: {1}{2}{0,3}{4}{5}
union(1,2) adds a new subtree to a root: {1,2}{0,3}{4}{5}
union(0,1) adds a new subtree to a root: {1,2,0,3}{4}{5}

5 Forest and trees - Array Representation
{1,2,0,3}{4}{5}
find(2) = 1
find(4) = 4
Array representation (figure)

6 Find Operation
{1,2,0,3}{4}{5}
find(0) = 1
s = [3, -1, 1, 1, -1, -1]
public int find(int x) {
    if (s[x] < 0) return x;   // a negative entry marks a root
    return find(s[x]);        // otherwise follow the parent link
}

7 Union Operation
union(0,2)
before: {1,2}{0,3}{4}{5}
after: {1,2,0,3}{4}{5}
public void union(int x, int y) {
    int rx = find(x), ry = find(y);
    if (rx != ry) s[rx] = ry;   // link one root under the other; skip if already in the same set
}

8 The problem Find must walk the path to the root
Unlucky combinations of unions can result in long paths
[Figure: nodes 1 to 6 forming one long path.]
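A minimal Java sketch of such an unlucky sequence, assuming the simple array forest from the earlier slides (negative entry marks a root). The class name NaiveForest and the helpers depth and chain are illustrative assumptions, not part of the lecture code.

```java
import java.util.Arrays;

// Sketch: the simple forest without any balancing, plus a step counter.
class NaiveForest {
    final int[] s; // s[x] < 0 marks a root, otherwise s[x] is the parent of x

    NaiveForest(int n) {
        s = new int[n];
        Arrays.fill(s, -1);
    }

    int find(int x) {
        return s[x] < 0 ? x : find(s[x]);
    }

    void union(int x, int y) {
        int rx = find(x), ry = find(y);
        if (rx != ry) s[rx] = ry; // always hang x's root under y's root
    }

    // Hypothetical helper: number of parent links followed from x to its root.
    int depth(int x) {
        int d = 0;
        while (s[x] >= 0) { x = s[x]; d++; }
        return d;
    }

    // union(0,1), union(1,2), ... builds one path of n nodes.
    static NaiveForest chain(int n) {
        NaiveForest f = new NaiveForest(n);
        for (int i = 0; i + 1 < n; i++) f.union(i, i + 1);
        return f;
    }
}
```

With six elements, NaiveForest.chain(6) leaves element 0 five links away from its root, so a single find costs time proportional to N.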

9 Path compression for find
find flattens trees
Redirect nodes to point directly to the root
Do this while traversing the path from node to root.
[Figure: a tree before and after path compression.]

10 Path compression
find flattens trees
Redirect nodes to point directly to the root
Do this while traversing the path from node to root.
public int find(int x) {
    if (s[x] < 0) return x;
    return s[x] = find(s[x]);   // point x directly at the root on the way back
}

11 Union by size
Union-by-size
  Join lesser size to greater
  Label with sum of sizes
  Find (with/without path compression): no effect
Representational trick
  Non-negative numbers: index of parent
  Negative numbers: root, with size -s[x]
Performance
  When the depth of a node increases on a union, its tree is at least twice its previous size.
  Hence at most log(N) unions can increase the depth of any node.
  Hence the maximum time for find is O(log(N)).

12 Union by height
Union shallow trees into deep trees
Tree depth increases only when depths are equal
Track path length to root
Tree depth at most O(log N)

13 Union by height, details
Different heights
  Join lesser height to greater
  Do not change height values
Equal heights
  Join either tree to the other
  Add one to height of result
Find
  Without path compression: no effect
  With path compression: must recalculate height, which can involve looking at many subtrees

14 Union by rank
Path compression is easy to implement when we use union-by-size. However, union-by-height is problematic with path compression.
Definition
  Rank of a node is initialized to 0
  Updated only during union operation
Union-by-rank
  Different ranks: join lesser rank to greater; do not change rank value
  Equal ranks: join either to the other; add one to rank of result
Find, with path compression
  Yields good performance
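The rules on this slide can be sketched in Java as follows. This is only an illustration: it keeps ranks in a separate array instead of the negative-number trick used elsewhere in the lecture, and the names RankedUnionFind, parent, rank and rankOf are assumptions made for the example.

```java
// Sketch: union by rank combined with path compression.
class RankedUnionFind {
    private final int[] parent; // parent[x] == x marks a root
    private final int[] rank;   // initialized to 0, changed only inside union

    RankedUnionFind(int n) {
        parent = new int[n];
        rank = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i;
    }

    int find(int x) {
        // Path compression: ranks are deliberately left untouched.
        if (parent[x] != x) parent[x] = find(parent[x]);
        return parent[x];
    }

    void union(int x, int y) {
        int a = find(x), b = find(y);
        if (a == b) return;
        if (rank[a] < rank[b]) {
            parent[a] = b;          // different ranks: lesser joins greater
        } else if (rank[a] > rank[b]) {
            parent[b] = a;
        } else {
            parent[b] = a;          // equal ranks: join either way...
            rank[a]++;              // ...and add one to the rank of the result
        }
    }

    int rankOf(int x) { return rank[x]; }
}
```

Because find never recomputes anything, the rank is only an upper bound on the true height once paths have been compressed.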

15 All the code
class UnionFind {
    int[] u;

    UnionFind(int n) {
        u = new int[n];
        for (int i = 0; i < n; i++) u[i] = -1;
    }

    int find(int i) {
        int j, root;
        for (j = i; u[j] >= 0; j = u[j]) ;
        root = j;
        while (u[i] >= 0) { j = u[i]; u[i] = root; i = j; }
        return root;
    }

    void union(int i, int j) {
        i = find(i);
        j = find(j);
        if (i != j) {
            if (u[i] < u[j]) { u[i] += u[j]; u[j] = i; }
            else             { u[j] += u[i]; u[i] = j; }
        }
    }
}

16 The UnionFind class
class UnionFind {
    int[] u;   // u[i] >= 0: parent of i; u[i] < 0: i is a root of a set of size -u[i]

    UnionFind(int n) {
        u = new int[n];
        for (int i = 0; i < n; i++) u[i] = -1;
    }

    int find(int i) { ... }
    void union(int i, int j) { ... }
}

17 Iterative find
int find(int i) {
    int j, root;
    for (j = i; u[j] >= 0; j = u[j]) ;   // first pass: walk up to the root
    root = j;
    while (u[i] >= 0) {                  // second pass: point each node on the path at the root
        j = u[i];
        u[i] = root;
        i = j;
    }
    return root;
}

18 Union by size
void union(int i, int j) {
    i = find(i);
    j = find(j);
    if (i != j) {
        if (u[i] < u[j]) { u[i] += u[j]; u[j] = i; }   // i's tree is larger: j joins i
        else             { u[j] += u[i]; u[i] = j; }   // otherwise i joins j
    }
}

19 Analysis of UnionFind

20 Analysis of Union-Find
The algorithm
  Union: by rank
  Find: with path compression

21 Analysis - Rank tree size
Lemma. After a sequence of union instructions, a node of rank r will have at least 2^r descendants, including itself.
Proof. By induction on r.
  r = 0: a single node, and 2^0 = 1.
  r > 0: Let T be the smallest rank-r tree and X be its root. Suppose T was the result of union(T1, T2) and X was the root of T1. The ranks of T1 and T2 must both be r-1: if either Ti had rank r, then T could not be the smallest rank-r tree, and since the union increased the rank, the Ti ranks must be equal. By the induction hypothesis, each Ti has at least 2^(r-1) descendants. The total must therefore be at least 2^r.
Note on path compression
  Path compression doesn’t affect rank
  Though it does affect height!

22 Analysis - Nodes of rank r
Lemma. The number of nodes of rank r is at most N/2^r.
Proof. Each node of rank r roots a subtree of at least 2^r nodes. No other node within that subtree can be of rank r, so all subtrees rooted at rank-r nodes are disjoint. Hence there are at most N/2^r such subtrees.
Examples:
  rank 0: at most N subtrees (i.e., every node is a root).
  rank log(N): at most 1 subtree (of size N).

23 Analysis - Ranks on a path
Lemma. Node rank always increases from leaf to root. Proof. Obvious if no path compression. With path compression, nodes are promoted from lower levels and hence were of lesser rank.

24 Time bounds
Variables
  M operations. N elements.
Algorithms
  Simple forest representation
    Worst: find O(N); mixed operations O(MN).
    Average: tricky
  Union by height; union by size
    Worst: find O(log N); mixed operations O(M log N).
    Average: mixed operations O(M)
  Path compression in find
    Worst: mixed operations “nearly linear” [analysis in ]

25 Maze Generator figure 24.2 Initial state: All walls are up, and all cells are in their own sets.
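A minimal Java sketch of how a generator might proceed from that initial state. The slide only states the starting point; the rest is an assumption, namely the common approach of visiting the walls in random order and knocking one down whenever the two cells it separates are still in different sets. The class name MazeSketch, the row-by-row cell numbering and the seed parameter are illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class MazeSketch {
    // Returns the walls that were knocked down, each as {cellA, cellB}.
    static List<int[]> generate(int rows, int cols, long seed) {
        int n = rows * cols;
        int[] u = new int[n];
        Arrays.fill(u, -1);                       // every cell starts in its own set

        List<int[]> walls = new ArrayList<>();    // all interior walls are up
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                int cell = r * cols + c;
                if (c + 1 < cols) walls.add(new int[] {cell, cell + 1});
                if (r + 1 < rows) walls.add(new int[] {cell, cell + cols});
            }
        }
        Collections.shuffle(walls, new Random(seed));

        List<int[]> removed = new ArrayList<>();
        for (int[] w : walls) {
            int a = find(u, w[0]), b = find(u, w[1]);
            if (a != b) {                         // cells not yet connected: remove the wall
                if (u[a] < u[b]) { u[a] += u[b]; u[b] = a; }   // union by size
                else             { u[b] += u[a]; u[a] = b; }
                removed.add(w);
            }
        }
        return removed;
    }

    static int find(int[] u, int x) {
        return u[x] < 0 ? x : (u[x] = find(u, u[x]));   // with path compression
    }
}
```

Under these assumptions every cell ends up reachable from every other, and exactly rows * cols - 1 walls are removed, so the passages form a tree with no loops.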

26 Shortest Paths

27 Airline routes
[Figure: airline route map linking BOS, ORD, PVD, SFO, JFK, BWI, LAX, DFW and MIA. Each route is labeled with its mileage: 2704, 867, 1846, 187, 849, 740, 144, 802, 1464, 337, 621, 1258, 184, 1391, 1090, 1235, 946, 1121, 2342.]

28 Single-source shortest path
Suppose we live in Baltimore (BWI) and want the shortest path to San Francisco (SFO).
Naïve approach
A better way is to solve the single-source shortest path problem: that is, find the shortest path from BWI to every city.

29 Why Need to Find ALL Shortest Paths?
While we may be interested only in BWI-to-SFO, there are no known algorithms that are asymptotically faster than solving the single-source problem for BWI-to-every-city.

30 Shortest paths What do we mean by “shortest path”?
Minimize the number of layovers (i.e., fewest hops). Unweighted shortest-path problem. Minimize the total mileage (i.e., fewest frequent-flyer miles ;-). Weighted shortest-path problem.

31 Many applications Shortest paths model many useful real-world problems. Minimization of latency in the Internet. Minimization of cost in power delivery. Job and resource scheduling. Route planning. MapQuest, Google Maps

32 Unweighted Single-Source Shortest Path Algorithm

33 Unweighted shortest path
In order to find the unweighted shortest path, we will mark vertices and edges so that: vertices can be marked with an integer, giving the number of hops from the source node, and edges can be marked as either explored or unexplored. Initially, all edges are unexplored.

34 Unweighted shortest path
Algorithm:
Set i to 0 and mark source node v with 0.
Put source node v into a queue L0.
While Li is not empty:
  Create new empty queue Li+1
  For each w in Li do:
    For each unexplored edge (w,x) do:
      mark (w,x) as explored
      if x not marked, mark with i+1 and enqueue x into Li+1
  Increment i.
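The level-by-level procedure above can be sketched in Java roughly as follows. The class name HopCount, the edge-list input format and the use of -1 for "not marked" are assumptions made for the example. The sketch does not mark edges explicitly, since checking whether x is already marked has the same effect on the result.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class HopCount {
    // edges are undirected pairs {a, b}; returns the hop count from source, or -1 if unreachable.
    static int[] hops(int n, int[][] edges, int source) {
        List<List<Integer>> adj = new ArrayList<>();
        for (int v = 0; v < n; v++) adj.add(new ArrayList<>());
        for (int[] e : edges) {
            adj.get(e[0]).add(e[1]);
            adj.get(e[1]).add(e[0]);
        }

        int[] mark = new int[n];
        Arrays.fill(mark, -1);
        mark[source] = 0;                    // mark source with 0

        List<Integer> level = new ArrayList<>();
        level.add(source);                   // L0 holds only the source
        int i = 0;
        while (!level.isEmpty()) {
            List<Integer> next = new ArrayList<>();   // L(i+1)
            for (int w : level) {
                for (int x : adj.get(w)) {
                    if (mark[x] < 0) {       // x not marked yet
                        mark[x] = i + 1;
                        next.add(x);
                    }
                }
            }
            level = next;
            i++;
        }
        return mark;
    }
}
```

Each vertex enters a level list at most once and each adjacency list is scanned once, which is where a bound of the form O(|V|+|E|) comes from.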

35 Breadth-first search
This algorithm is a form of breadth-first search.
Performance: O(|V|+|E|). Why?
Q: Use this algorithm to find the shortest route (in terms of number of hops) from BWI to SFO.

36 Use of a queue It is very common to use a queue to keep track of:
nodes to be visited next, or nodes that we have already visited. Typically, use of a queue leads to a breadth-first visit order. Breadth-first visit order is “cautious” in the sense that it examines every path of length i before going on to paths of length i+1.

37 Greedy Algorithms

38 Greedy Algorithms
In a greedy algorithm, during each phase, a decision is made that appears to be optimal, without regard for future consequences.
This “take what you can get now” strategy is the source of the name for this class of algorithms.
When a problem can be solved with a greedy algorithm, we are usually quite happy.
Greedy algorithms often match our intuition and make for relatively painless coding.

39 Greedy Algorithms
4 ingredients needed
Optimization problem
  Maximization or minimization
Can only proceed in stages
  No direct solution available
Greedy-choice property
  A locally optimal (greedy) choice will lead to a globally optimal solution
Optimal substructure
  An optimal solution to the problem contains within it optimal solutions to subproblems

40 Greedy Algorithms
Minimize number of coins
Find Huffman code
Prim’s and Kruskal’s
Dijkstra’s algorithm for shortest path
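As a small illustration of the first item, here is a hedged Java sketch of greedy coin counting: at each stage take as many of the largest remaining coin as fit. The class name GreedyCoins and the denominations in the usage note are assumptions chosen for the example.

```java
class GreedyCoins {
    // coins must be listed from largest to smallest and include 1.
    static int count(int amount, int[] coins) {
        int used = 0;
        for (int c : coins) {
            used += amount / c;   // greedy choice: as many of this coin as fit
            amount %= c;          // the remainder is the subproblem
        }
        return used;
    }
}
```

For 63 with coins {25, 10, 5, 1} this uses 6 coins (25 + 25 + 10 + 1 + 1 + 1). The greedy-choice property depends on the coin system: with coins {4, 3, 1} and amount 6 the same procedure uses 3 coins (4 + 1 + 1), although 3 + 3 needs only 2.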

41 Weighted Single-Source Shortest Path Algorithm (Dijkstra’s Algorithm)

42 Weighted shortest path
Now suppose we want to minimize the total mileage. Breadth-first search does not work! Minimum number of hops does not mean minimum distance. Consider, for example, BWI-to-DFW:

43 Three 2-hop routes to DFW
[Figure: airline route map with route mileages.]

44 Intuition behind Dijkstra’s alg.
For our airline-mileage problem, we can start by guessing that every city is ∞ miles away. Mark each city with this guess. Find all cities one hop away from BWI, and check whether the mileage is less than what is currently marked for that city. If so, then revise the guess. Continue for 2 hops, 3 hops, etc.

45 Dijkstra’s: Greedy algorithm
Assume that every city is infinitely far away. I.e., every city is ∞ miles away from BWI (except BWI, which is 0 miles away). Now perform something similar to breadth-first search, and optimistically guess that we have found the best path to each city as we encounter it. If we later discover we are wrong and find a better path to a particular city, then update the distance to that city.

46 Dijkstra’s algorithm Algorithm initialization:
Label each node with the distance ∞, except the start node, which is labeled with distance 0. D[v] is the distance label for v. Put all nodes into a priority queue Q, using the distances as keys.

47 Dijkstra’s algorithm, cont’d
While Q is not empty do:
  u = Q.removeMin
  for each node z one hop away from u do:
    if D[u] + miles(u,z) < D[z] then
      D[z] = D[u] + miles(u,z)
      change key of z in Q to D[z]
Note: use of a priority queue (heap) allows “finished” nodes to be found quickly (in O(log |V|) time).
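A hedged Java sketch of the loop above. Two things differ from the slide and are assumptions of this example. First, java.util.PriorityQueue has no change-key operation, so the sketch inserts a fresh entry on every improvement and skips stale entries when they are removed. Second, the route table is a reconstruction: the transcript preserves the mileages but not which cities each route joins, so the endpoints below were chosen to be consistent with the distance labels on the "Shortest mileage from BWI" slides. The names DijkstraSketch, CITY and ROUTES are illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

class DijkstraSketch {
    static final String[] CITY = {"BWI", "JFK", "PVD", "BOS", "ORD", "MIA", "DFW", "SFO", "LAX"};

    // Assumed undirected routes {cityA, cityB, miles}, indices into CITY.
    static final int[][] ROUTES = {
        {0, 1, 184}, {0, 4, 621}, {0, 5, 946},
        {1, 2, 144}, {1, 3, 187}, {1, 4, 740}, {1, 5, 1090}, {1, 6, 1391},
        {2, 4, 849},
        {3, 4, 867}, {3, 5, 1258}, {3, 7, 2704},
        {4, 6, 802}, {4, 7, 1846},
        {5, 6, 1121}, {5, 8, 2342},
        {6, 7, 1464}, {6, 8, 1235},
        {7, 8, 337}
    };

    static int[] shortest(int n, int[][] edges, int source) {
        List<List<int[]>> adj = new ArrayList<>();
        for (int i = 0; i < n; i++) adj.add(new ArrayList<>());
        for (int[] e : edges) {
            adj.get(e[0]).add(new int[] {e[1], e[2]});
            adj.get(e[1]).add(new int[] {e[0], e[2]});
        }

        int[] d = new int[n];
        Arrays.fill(d, Integer.MAX_VALUE);   // stands in for "infinitely far away"
        d[source] = 0;

        // Queue entries are {node, distance at insertion time}.
        PriorityQueue<int[]> q = new PriorityQueue<>((a, b) -> Integer.compare(a[1], b[1]));
        q.add(new int[] {source, 0});
        while (!q.isEmpty()) {
            int[] top = q.poll();            // removeMin
            int u = top[0];
            if (top[1] > d[u]) continue;     // stale entry: u already has a better label
            for (int[] e : adj.get(u)) {
                int z = e[0];
                int candidate = d[u] + e[1];
                if (candidate < d[z]) {
                    d[z] = candidate;
                    q.add(new int[] {z, candidate});   // replaces "change key of z"
                }
            }
        }
        return d;
    }
}
```

Calling DijkstraSketch.shortest(9, DijkstraSketch.ROUTES, 0) on the assumed table gives SFO 2467 and LAX 2658, matching the final labels in the slides that follow.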

48 Shortest mileage from BWI
[Figure: airline route map with route mileages. No distance labels are shown yet.]

49 Shortest mileage from BWI
[Figure: airline route map. Distance labels so far: JFK 184, ORD 621, MIA 946.]

50 Shortest mileage from BWI
[Figure: airline route map. Distance labels so far: JFK 184, PVD 328, BOS 371, ORD 621, MIA 946, DFW 1575.]

51 Shortest mileage from BWI
[Figure: airline route map. Distance labels so far: JFK 184, PVD 328, BOS 371, ORD 621, MIA 946, DFW 1575.]

52 Shortest mileage from BWI
[Figure: airline route map. Distance labels so far: JFK 184, PVD 328, BOS 371, ORD 621, MIA 946, DFW 1575, SFO 3075.]

53 Shortest mileage from BWI
[Figure: airline route map. Distance labels so far: JFK 184, PVD 328, BOS 371, ORD 621, MIA 946, DFW 1423, SFO 2467.]

54 Shortest mileage from BWI
[Figure: airline route map. Distance labels so far: JFK 184, PVD 328, BOS 371, ORD 621, MIA 946, DFW 1423, SFO 2467, LAX 3288.]

55 Shortest mileage from BWI
[Figure: airline route map. Distance labels so far: JFK 184, PVD 328, BOS 371, ORD 621, MIA 946, DFW 1423, SFO 2467, LAX 2658.]

56 Shortest mileage from BWI
[Figure: airline route map. Distance labels so far: JFK 184, PVD 328, BOS 371, ORD 621, MIA 946, DFW 1423, SFO 2467, LAX 2658.]

57 Shortest mileage from BWI
[Figure: airline route map. Distance labels so far: JFK 184, PVD 328, BOS 371, ORD 621, MIA 946, DFW 1423, SFO 2467, LAX 2658.]

58 Shortest mileage from BWI
[Figure: airline route map. Final distance labels: JFK 184, PVD 328, BOS 371, ORD 621, MIA 946, DFW 1423, SFO 2467, LAX 2658.]

59 Find the Shortest Path from S
[Figure: exercise graph with vertices including b, c, d, e and g and weighted edges.]

60 Dijkstra’s Algorithm is greedy
Optimization problem
  Of the many feasible solutions, finds the minimum or maximum solution.
Can only proceed in stages
  No direct solution available
Greedy-choice property
  A locally optimal (greedy) choice will lead to a globally optimal solution.
Optimal substructure
  An optimal solution contains within it optimal solutions to subproblems

61 Features of Dijkstra’s Algorithm
“Visits” every vertex only once, when it becomes the vertex with minimal distance amongst those still in the priority queue
Distances may be revised multiple times: current values represent the “best guess” based on our observations so far
Once a vertex is visited we are guaranteed to have found the shortest path to that vertex. Why?

62 Correctness (by contradiction)
Prove by induction on stage k of the algorithm.
Assume v is the vertex visited at stage k+1, and assume that dist(v) is not the length of a shortest path.
Then the true shortest path to v must pass through a fringe vertex x.
[Figure: source s, the visited region, fringe vertices x and v, and the unreached region.]
By the inductive hypothesis, dist(x) must represent a shortest path to x, and so dist(x) ≤ dist_shortest(v) < dist(v).
But Dijkstra’s always visits the vertex with the smallest distance next, so we can’t possibly visit v before we visit x. A contradiction.

63 Performance (using a heap)
Initialization: O(n)
Visitation loop: n calls to deleteMin(), each O(log n)
Each edge is considered only once during the entire execution, for a total of e updates of the priority queue, each O(log n)
Overall cost: O((n+e) log n)

64 Dijkstra’s summary
Dijkstra’s algorithm is greedy
Dijkstra’s finds shortest paths to all nodes from the origin, even if we are interested only in the shortest path to a single node
As presented, Dijkstra’s finds only the length of the shortest path
It is possible to modify Dijkstra’s to actually find the nodes on the shortest path
Dijkstra’s algorithm assumes that all distances are non-negative
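One way to make the modification mentioned above, sketched in Java under stated assumptions: remember, for each node, the neighbour that last improved its distance, then walk those links backwards from the target. To keep the example short it selects the minimum by scanning an array instead of using a heap, which changes the running time but not the result. The names PathSketch and prev and the small test graph are illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class PathSketch {
    // edges are undirected {a, b, cost} with non-negative costs.
    // Returns the nodes on a shortest path from source to target, or an empty list if unreachable.
    static List<Integer> path(int n, int[][] edges, int source, int target) {
        int[] d = new int[n];
        int[] prev = new int[n];          // prev[z]: node from which z got its current label
        boolean[] done = new boolean[n];
        Arrays.fill(d, Integer.MAX_VALUE);
        Arrays.fill(prev, -1);
        d[source] = 0;

        for (int round = 0; round < n; round++) {
            int u = -1;                   // unvisited node with the smallest finite label
            for (int v = 0; v < n; v++) {
                if (!done[v] && d[v] != Integer.MAX_VALUE && (u < 0 || d[v] < d[u])) u = v;
            }
            if (u < 0) break;             // everything left is unreachable
            done[u] = true;
            for (int[] e : edges) {
                int z;
                if (e[0] == u) z = e[1];
                else if (e[1] == u) z = e[0];
                else continue;
                if (d[u] + e[2] < d[z]) {
                    d[z] = d[u] + e[2];
                    prev[z] = u;          // record how we reached z
                }
            }
        }

        List<Integer> path = new ArrayList<>();
        if (d[target] == Integer.MAX_VALUE) return path;
        for (int v = target; v != -1; v = prev[v]) path.add(v);
        Collections.reverse(path);
        return path;
    }
}
```

On the graph with edges {0,1,4}, {0,2,1}, {2,1,2}, {1,3,5}, {2,3,8}, the path from 0 to 3 comes out as [0, 2, 1, 3], with total cost 8.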

65 Shortest Path in a DAG

66 Shortest Paths in a DAG
How do we detect that a graph is a DAG?
Complexity?
If we know the graph is a DAG, can we do better than Dijkstra?

67 The Idea
Order the nodes in topological order
  Arrows can only point left to right
Relax the edges in forward order
  Never have to worry about ancestors
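The idea can be sketched in Java as below. The slides do not say how the topological order is obtained; this example assumes the in-degree counting method (repeatedly output a node with no remaining incoming edges), which also answers the detection question, because a cycle leaves some nodes that never reach in-degree zero. The names DagPaths, INF and the edge format are illustrative.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

class DagPaths {
    static final long INF = Long.MAX_VALUE;

    // edges are directed {from, to, cost}. Returns null if the graph is not a DAG.
    static long[] shortest(int n, int[][] edges, int source) {
        List<List<int[]>> out = new ArrayList<>();
        int[] indeg = new int[n];
        for (int i = 0; i < n; i++) out.add(new ArrayList<>());
        for (int[] e : edges) {
            out.get(e[0]).add(e);
            indeg[e[1]]++;
        }

        // Topological order: nodes become ready once all their predecessors are placed.
        Deque<Integer> ready = new ArrayDeque<>();
        for (int v = 0; v < n; v++) if (indeg[v] == 0) ready.add(v);
        List<Integer> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            int u = ready.remove();
            order.add(u);
            for (int[] e : out.get(u)) {
                if (--indeg[e[1]] == 0) ready.add(e[1]);
            }
        }
        if (order.size() < n) return null;   // some nodes sit on a cycle

        // Relax edges in forward order; earlier nodes are already final.
        long[] d = new long[n];
        Arrays.fill(d, INF);
        d[source] = 0;
        for (int u : order) {
            if (d[u] == INF) continue;       // not reachable from the source
            for (int[] e : out.get(u)) {
                if (d[u] + e[2] < d[e[1]]) d[e[1]] = d[u] + e[2];
            }
        }
        return d;
    }
}
```

Both phases touch each node and each edge once, so the whole sketch runs in O(|V|+|E|) time with no priority queue.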

68 Iteration 1

69 Iteration 2

