MST, Topological Sort and Disjoint Sets

Name: MST, Topological Sort and Disjoint Sets
Uploaded: 2017-10-22T08:18:18+00:00
Duration: PTM29S22
Channel: Katrina Woods
Description: MST, Topological Sort and Disjoint Sets

MST, Topological Sort and Disjoint Sets
Fundamental Data Structures and Algorithms Ananda Guna April 6, 2006

In this lecture
Prim’s revisited
Kruskal’s algorithm
Topological Sorting
Union Find algorithms
Disjoint Sets
Analysis

Prim’s Algorithm
The algorithm is based on the idea of two sets:
S = vertices in the current MST
V-S = vertices not in the current MST
Find the minimum edge (u,v) such that u is in S and v is in V-S.
Add the edge to the MST and node v to S.
At the end, the algorithm guarantees that we have constructed an MST.
Note: an MST is not necessarily unique.

Prim’s Algorithm Invariant
At each step, we add the edge (u,v) such that the weight of (u,v) is minimum among all edges where u is in the tree and v is not in the tree.
Each step maintains a minimum spanning tree of the vertices that have been included thus far.
When all vertices have been included, we have an MST for the graph!

Running time of Prim’s algorithm
Initialization of priority queue (array): O(|V|)
Update loop: |V| calls
  Choosing the vertex with the minimum-cost edge: O(|V|); with heaps, O(log |V|)
  Updating distance values of unconnected vertices: each edge is considered only once during the entire execution, for a total of O(|E|) updates
Overall cost without heaps: O(|E| + |V|^2)
What is the run time complexity if heaps are used?
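The question about heaps can be explored with a small sketch. The Java version below uses java.util.PriorityQueue as the heap and skips stale entries instead of decreasing keys, which gives O(|E| log |V|) overall. The class name PrimSketch, the int[]{vertex, weight} edge layout, and the helper addEdge are assumptions made for illustration; they do not come from the lecture.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Illustrative sketch of Prim's algorithm with a binary heap.
// Names and data layout are assumptions, not the lecture's code.
class PrimSketch {
    // adj.get(u) holds int[]{v, weight} entries; the graph is assumed
    // to be undirected and connected. Returns the total MST weight.
    static int mstWeight(List<List<int[]>> adj) {
        int n = adj.size();
        boolean[] inS = new boolean[n];   // S = vertices already in the tree
        PriorityQueue<int[]> pq =
            new PriorityQueue<>((a, b) -> Integer.compare(a[1], b[1]));
        pq.add(new int[]{0, 0});          // start from vertex 0 at cost 0
        int total = 0;
        while (!pq.isEmpty()) {
            int[] top = pq.poll();        // cheapest candidate edge into V-S
            int v = top[0];
            if (inS[v]) continue;         // stale entry: v already joined S
            inS[v] = true;
            total += top[1];
            for (int[] e : adj.get(v)) {
                if (!inS[e[0]]) pq.add(new int[]{e[0], e[1]});
            }
        }
        return total;
    }

    // Adds an undirected edge (u,v) with weight w.
    static void addEdge(List<List<int[]>> adj, int u, int v, int w) {
        adj.get(u).add(new int[]{v, w});
        adj.get(v).add(new int[]{u, w});
    }
}
```

Because old heap entries are never updated, the heap may hold up to |E| entries, so each operation costs O(log |E|), which is O(log |V|).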

Correctness Lemma: Let G be a connected weighted graph and let G’ be a subgraph of G that is contained in an MST T. Let C be a component of G’. Let S be the set of all edges with one vertex in C and the other not in C. If we add a minimum-weight edge in S to G’, then the resulting graph is contained in a minimal spanning tree of G.

Correctness Theorem: Prim’s algorithm correctly finds a minimal spanning tree.
Proof: by induction, show that the tree constructed at each iteration is contained in an MST. Then, at termination, the tree constructed is an MST.
Base case: the tree has no edges, and is therefore contained in every spanning tree.
Inductive case: Let T be the current tree constructed using Prim’s algorithm. By the inductive hypothesis, T is contained in some MST. Let (u,v) be the next edge selected by Prim’s, such that u is in T and v is not in T. Let G’ be T together with all vertices not in T. Then T is a component of G’ and (u,v) is a minimum-weight edge with one vertex in T and one not in T. Then, by the lemma, when (u,v) is added to G’, the resulting graph is also contained in an MST.

Kruskal’s Algorithm

Another Approach
Create a forest of trees from the vertices.
Repeatedly merge trees by adding “safe edges” until only one tree remains.
A “safe edge” is an edge of minimum weight which does not create a cycle.
[Figure: example weighted graph on vertices a, b, c, d, e]
forest: {a}, {b}, {c}, {d}, {e}

Kruskal’s algorithm Initialization
a. Create a set for each vertex v ∈ V
b. Initialize the set of “safe edges” A comprising the MST to the empty set
c. Sort edges by increasing weight
[Figure: example weighted graph on vertices a, b, c, d, e]
F = {a}, {b}, {c}, {d}, {e}
A = ∅
E = {(a,d), (c,d), (d,e), (a,c), (b,e), (c,e), (b,d), (a,b)}

Kruskal’s algorithm
For each edge (u,v) ∈ E in increasing order, while more than one set remains:
  If u and v belong to different sets U and V
    a. add edge (u,v) to the safe edge set: A = A ∪ {(u,v)}
    b. merge the sets U and V: F = F - U - V + (U ∪ V)
Return A

Kruskal’s algorithm
E = {(a,d), (c,d), (d,e), (a,c), (b,e), (c,e), (b,d), (a,b)}
[Figure: example weighted graph on vertices a, b, c, d, e]

Forest                        A
{a}, {b}, {c}, {d}, {e}       ∅
{a,d}, {b}, {c}, {e}          {(a,d)}
{a,d,c}, {b}, {e}             {(a,d), (c,d)}
{a,d,c,e}, {b}                {(a,d), (c,d), (d,e)}
{a,d,c,e,b}                   {(a,d), (c,d), (d,e), (b,e)}

Kruskal’s Algorithm Summary
After each iteration, every tree in the forest is an MST of the vertices it connects.
The algorithm terminates when all vertices are connected into one tree.
Both Prim’s and Kruskal’s algorithms are greedy algorithms.
Complexity of Kruskal’s algorithm
  O(|E| log |E|) to sort the edges
  O(|V|) initial sets
  O(|V| log |V|) find and union operations
What if the edges are maintained in a PQ?
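The steps of Kruskal’s algorithm can be sketched in Java as follows. This is a minimal illustration, not the lecture’s code: the class name KruskalSketch, the {u, v, weight} edge layout, and the plain parent-array find (no union by size or path compression) are assumptions chosen to keep the example short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch of Kruskal's algorithm over vertices 0..n-1.
class KruskalSketch {
    // Follows parent links until reaching a root (parent[x] == x).
    static int find(int[] parent, int x) {
        while (parent[x] != x) x = parent[x];
        return x;
    }

    // edges[i] = {u, v, weight}. Returns the safe edge set A,
    // in the order the edges were added.
    static List<int[]> mst(int n, int[][] edges) {
        int[][] sorted = edges.clone();
        Arrays.sort(sorted, (a, b) -> Integer.compare(a[2], b[2]));
        int[] parent = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i;   // one set per vertex
        List<int[]> a = new ArrayList<>();
        for (int[] e : sorted) {
            if (a.size() == n - 1) break;            // only one tree remains
            int ru = find(parent, e[0]);
            int rv = find(parent, e[1]);
            if (ru != rv) {                          // different sets: safe edge
                a.add(e);
                parent[ru] = rv;                     // merge the two sets
            }
        }
        return a;
    }
}
```

With a = 0, b = 1, c = 2, d = 3, e = 4 and hypothetical weights 1 through 8 assigned in the sorted order of E from the slides, the sketch selects (a,d), (c,d), (d,e), (b,e), which matches the trace of A above.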

Topological Sort

Topological sort Definition:
A topological sort of G=(V,E) is an ordering of all of G’s vertices v1, v2, …, vn such that for every edge (vi,vj) in E, i<j.
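The definition translates directly into a check. The sketch below assumes vertices are numbered 0..n-1 and edges are given as {u, v} pairs; the class and method names are hypothetical.

```java
// Illustrative check of the definition of a topological order.
class TopoOrderCheck {
    // order lists every vertex 0..n-1 exactly once;
    // edges[k] = {u, v} means an edge from u to v.
    static boolean isTopologicalOrder(int[] order, int[][] edges) {
        int[] pos = new int[order.length];
        for (int i = 0; i < order.length; i++) pos[order[i]] = i;
        for (int[] e : edges) {
            // an edge pointing backward (or to itself) violates i < j
            if (pos[e[0]] >= pos[e[1]]) return false;
        }
        return true;
    }
}
```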

In a topological ordering no arrow can point backward
[Figure: the tasks Pour foundation, Building permit, Framing, Plumbing, Electrical wiring, Paint interior, Paint exterior drawn as a graph, then laid out left to right as Building permit, Pour foundation, Framing, Electrical wiring, Plumbing, Paint exterior, Paint interior]

Finding a topological sort
Place the vertices in order from left to right.
No edge arrow can point backward.
If an order can be found, we can do tasks from left to right.
Questions
  Does a graph always have a topological sort?
  If so, can there be more than one topological sort for a given graph?
  What if the graph has a cycle? Is it possible to have a topological sort?

How to find a topological sort
If the graph has no vertex with indegree 0, can we find a topological sort?
If the graph has a vertex with indegree 0, then start with that vertex.
Delete the vertex and place it next in the sorted list.
Repeat.

Implementing topological sort
We need to maintain indegrees.
Maintain an array of indegrees: Indeg[i] is the indegree of vertex i.
As you delete vertices, reduce the indegree of all the vertices they point to.
When a vertex reaches indegree 0, put it into a list of nodes to be deleted.

Example
[Figure: a graph on vertices 1 through 7, with a table of the indegree of each vertex]

Topological sort
Nodes in a DAG can be ordered linearly.
Topological orders:
  1,2,5,4,3,6,7
  2,1,5,4,7,3,6
  2,5,1,4,7,3,6
  Etc.
For our building example, any topological order is a feasible schedule.
[Figure: the graph on vertices 1 through 7]

Homework Build all topological orders of the following graph
[Figure: graph over the tasks Building permit, Pour foundation, Framing, Plumbing, Electrical wiring, Paint interior, Paint exterior]

Topological sort algorithm
Suppose the indegree is stored with each node.
Q: What is the cost of storing the indegree (assume an adjacency list implementation)
  After the graph is built? (cost?)
  While building the graph? (cost?)
Scan all nodes, pushing roots onto a stack. (cost?)
Repeat until the stack is empty: (cost?)
  Pop a root r from the stack and output it. (cost?)
  For all nodes n (non-roots) such that (r,n) is an edge, decrement n’s indegree. If it reaches 0, push n onto the stack. (cost?)
Overall: O(|V|+|E|), but better in practice.
Q: How can we tell if a graph has a cycle?
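The stack-based procedure can be sketched in Java as below. The class name TopoSortSketch and the adjacency-list layout are assumptions for illustration. The sketch also suggests one possible answer to the cycle question: if some vertices are never output, their indegrees never reached 0, so the graph must contain a cycle.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Illustrative sketch of indegree-based topological sorting.
class TopoSortSketch {
    // adj.get(u) lists the vertices v with an edge (u,v).
    // Returns a topological order, or null if the graph has a cycle.
    static List<Integer> sort(List<List<Integer>> adj) {
        int n = adj.size();
        int[] indeg = new int[n];
        for (List<Integer> out : adj) {
            for (int v : out) indeg[v]++;           // O(|V|+|E|) to compute
        }
        Deque<Integer> stack = new ArrayDeque<>();
        for (int v = 0; v < n; v++) {
            if (indeg[v] == 0) stack.push(v);       // roots: indegree 0
        }
        List<Integer> order = new ArrayList<>();
        while (!stack.isEmpty()) {
            int r = stack.pop();
            order.add(r);                           // output the root
            for (int v : adj.get(r)) {
                if (--indeg[v] == 0) stack.push(v); // v has become a root
            }
        }
        return order.size() == n ? order : null;    // leftovers imply a cycle
    }
}
```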

Union Find

Equivalence relations
The relation “~” is an equivalence relation if (for all a, b, and c)
  a ~ a                      reflexive
  a ~ b iff b ~ a            symmetric
  a ~ b & b ~ c ⇒ a ~ c      transitive
Examples
  “<”            transitive, not reflexive, not symmetric
  “<=”           transitive, reflexive, not symmetric
  “e1 = O(e2)”   transitive, reflexive, not symmetric
  “==”           transitive, reflexive, symmetric
  “connected”    transitive, reflexive, symmetric

Let U = {1,2,3,4,5,6,7,8,9} and 1~5, 6~8, 7~2, 9~8, 3~7, 4~2, 9~3.
U contains two equivalence classes w.r.t. “~”: {2,3,4,6,7,8,9} and {1,5}.
3~5 iff 3 and 5 belong to the same equivalence class.
Let “~” be an equivalence relation over a set U. Each member a of U has an equivalence class with respect to “~”: [a] = {b | a ~ b}

The set of equivalence classes is a partition of U: { {2,3,4,6,7,8,9}, {1,5} }
In general, i ≠ j implies Pi ∩ Pj = {}.
For each a ∈ U, there is exactly one i such that a ∈ Pi.
Why study Equivalence Relations? What problems can be solved by understanding equivalence relations?
  Common ancestor problem
  Maze problem
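The example partition can be reproduced mechanically. The sketch below merges the classes of each related pair using a simple parent array and then reports a representative for every element. The class name EquivalenceClassesSketch and its methods are assumptions made for illustration.

```java
// Illustrative sketch: derive equivalence classes from a list of related pairs.
class EquivalenceClassesSketch {
    // Follows parent links until reaching a representative (parent[x] == x).
    static int find(int[] parent, int x) {
        while (parent[x] != x) x = parent[x];
        return x;
    }

    // Elements are 1..n. Returns rep[], where rep[a] == rep[b]
    // exactly when a ~ b follows from the given pairs.
    static int[] classes(int n, int[][] pairs) {
        int[] parent = new int[n + 1];
        for (int i = 0; i <= n; i++) parent[i] = i;   // each element alone
        for (int[] p : pairs) {
            // merge the class of p[0] into the class of p[1]
            parent[find(parent, p[0])] = find(parent, p[1]);
        }
        int[] rep = new int[n + 1];
        for (int i = 1; i <= n; i++) rep[i] = find(parent, i);
        return rep;
    }
}
```

For U = {1,...,9} with the pairs 1~5, 6~8, 7~2, 9~8, 3~7, 4~2, 9~3, the sketch yields two distinct representatives, one shared by 1 and 5 and one shared by the remaining seven elements.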

Applications

Maze Generator
[Figure 24.1: a 50 x 88 maze]
How can we generate a maze like this?

An Application - The Maze problem
A maze is a grid of rooms separated by walls. Each room has a name.
Think of the maze as a graph:
  Nodes x, y, z represent rooms.
  Edges (x,y) indicate that rooms x and y are adjacent and there is no wall between them.
[Figure: 4 x 4 grid of rooms named a through p]
Randomly knock out walls until we get a good maze.

Mathematical formulation
A set of rooms: {a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p} Identify pairs of adjacent rooms that have an open wall between them. E.g., (a,b) and (g,k) are pairs.

Mazes as graphs
[Figure: the 4 x 4 grid of rooms a through p drawn as a graph]
{(a,b), (b,c), (a,e), (e,i), (i,j), (f,j), (f,g), (g,h), (d,h), (g,k), (m,n), (n,o), (k,o), (o,p), (l,p)}

Unique solutions
What property must the graph have for the maze to have a solution?
  A path from (a) to (p).
What property must it have for the maze to have a unique solution?
  The graph must be a tree.
[Figure: the 4 x 4 grid of rooms a through p]

Mazes as trees
Informally, a tree is a graph where:
  Each node has a unique parent.
  Except a unique root node that has no parent.
A spanning tree is a tree that includes all of the nodes.
Why is it good to have a spanning tree? Trees have no cycles!
[Figure: the maze graph on rooms a through p drawn as a tree]
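One way to knock out walls randomly while ending up with a spanning tree is sketched below: shuffle all interior walls and remove a wall only when the two rooms it separates are not yet connected. The class name MazeSketch, the row-major room numbering, and the seed parameter are assumptions for illustration; connectivity is tracked with a plain parent array.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Illustrative sketch of maze generation as a random spanning tree.
class MazeSketch {
    // Follows parent links until reaching a representative (parent[x] == x).
    static int find(int[] parent, int x) {
        while (parent[x] != x) x = parent[x];
        return x;
    }

    // Rooms are numbered row by row: room = row * cols + col.
    // Returns the removed walls as {roomA, roomB} pairs.
    static List<int[]> generate(int rows, int cols, long seed) {
        List<int[]> walls = new ArrayList<>();
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                int room = r * cols + c;
                if (c + 1 < cols) walls.add(new int[]{room, room + 1});    // wall to the right
                if (r + 1 < rows) walls.add(new int[]{room, room + cols}); // wall below
            }
        }
        Collections.shuffle(walls, new Random(seed));
        int[] parent = new int[rows * cols];
        for (int i = 0; i < parent.length; i++) parent[i] = i;
        List<int[]> open = new ArrayList<>();
        for (int[] w : walls) {
            int a = find(parent, w[0]);
            int b = find(parent, w[1]);
            if (a != b) {          // rooms not yet connected: remove the wall
                parent[a] = b;
                open.add(w);
            }
        }
        return open;
    }
}
```

For a 4 x 4 grid this removes exactly 15 walls, the same number of edges as in the example edge set for rooms a through p, since a spanning tree on 16 nodes has 15 edges.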

Dynamic equivalence
How can we check for dynamic equivalence?
  Do two elements belong to the same equivalence class?
  Is there a path from one node to another?
Union-Find Abstraction
  find(i) returns the name of the set containing i.
  union(i,j) joins the sets containing i and j.
Effects
  Calls to union can change future find results.
  Calls to find do not change future find results.

The Union-Find Interface
Represent elements as ints
  Let 0 ≤ i < N stand for ei ∈ {e0, …, eN-1}
Find identifies the set containing i.
  int find(int i)
Equivalence testing
  find(i) == find(j)
Union called for an effect
  void union(int i, int j)
  Affects results of future calls to find

Understanding Union-Find

Forest and trees
Each set is a tree: {1}{2}{0,3}{4}{5}
union(1,2) adds a new subtree to a root: {1,2}{0,3}{4}{5}
union(0,1) adds a new subtree to a root: {1,2,0,3}{4}{5}
[Figure: the forest before and after each union]

Forest and trees - Array Representation
{1,2,0,3}{4}{5}
find(2) = 1
find(4) = 4
[Figure: array representation of the forest]

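Find Operation
{1,2,0,3}{4}{5}
find(0) = 1
s = [3, -1, 1, 1, -1, -1]   (s[x] is the parent of x; a negative entry marks a root)
public int find(int x) {
    if (s[x] < 0) return x;   // negative entry: x is a root
    return find(s[x]);        // otherwise follow the parent link
}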

Union Operation
union(0,2)
before: {1,2}{0,3}{4}{5}
after: {1,2,0,3}{4}{5}
public void union(int x, int y) {
    int rx = find(x), ry = find(y);
    if (rx != ry) s[rx] = ry;   // attach one root under the other; do nothing if already joined
}

The problem
Find must walk the path to the root.
Unlucky combinations of unions can result in long paths.
[Figure: a chain of nodes 1 through 6]

Path compression for find
find flattens trees.
Redirect nodes to point directly to the root.
Do this while traversing the path from node to root.
[Figure: a tree before and after path compression]
public int find(int x) {
    if (s[x] < 0) return x;
    return s[x] = find(s[x]);
}

Union by size
Union-by-size
  Join lesser size to greater
  Label with sum of sizes
  Find (with/without path compression): no effect
Representational trick
  Non-negative numbers: index of parent
  Negative numbers: root, with size -s[x]
Performance
  When the depth of a node increases on union, the size of its tree at least doubles.
  Hence maximum of log(N) steps that increase depth.
  Hence maximum time for find is O(log(N)).
[Figure: union-by-size example]

Union by height
Union shallow trees into deep trees.
Tree depth increases only when depths are equal.
Track path length to root.
Tree depth at most O(log N).
[Figure: union-by-height example]

Union by height, details
Different heights
  Join lesser height to greater
  Do not change height values
Equal heights
  Join either tree to the other
  Add one to height of result
Find
  Without path compression: no effect
  With path compression: must recalculate height, which can involve looking at many subtrees
[Figure: union-by-height example]

Union by rank
Path compression is easy to implement when we use union-by-size. However, union-by-height is problematic with path compression.
Definition
  Rank of a node is initialized to 0
  Updated only during union operation
Union-by-rank
  Union
    Different ranks: join lesser rank to greater; do not change rank value
    Equal ranks: join either to the other; add one to rank of result
  Find, with path compression
    Yields good performance
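A minimal Java sketch of union by rank combined with path compression follows. Unlike the negative-number representation used elsewhere in the lecture, this version keeps separate parent and rank arrays for readability; the class name RankedUnionFind and the accessor rankOf are assumptions for illustration.

```java
// Illustrative sketch of union by rank with path compression.
class RankedUnionFind {
    private final int[] parent;  // parent[x] == x marks a root
    private final int[] rank;    // initialized to 0, changed only in union

    RankedUnionFind(int n) {
        parent = new int[n];
        rank = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i;
    }

    // Returns the root of x's set, pointing every node on the path at the root.
    int find(int x) {
        if (parent[x] != x) parent[x] = find(parent[x]);
        return parent[x];
    }

    void union(int x, int y) {
        int rx = find(x), ry = find(y);
        if (rx == ry) return;                        // already in the same set
        if (rank[rx] < rank[ry]) parent[rx] = ry;    // lesser rank joins greater
        else if (rank[rx] > rank[ry]) parent[ry] = rx;
        else { parent[ry] = rx; rank[rx]++; }        // equal ranks: result gains one
    }

    int rankOf(int x) { return rank[x]; }
}
```

Note that find never touches rank, which is why rank stays easy to maintain under path compression even though the true height may shrink.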

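All the code
class UnionFind {
    int[] u;   // u[i] >= 0: parent of i; u[i] < 0: i is a root of a set of size -u[i]

    UnionFind(int n) {
        u = new int[n];
        for (int i = 0; i < n; i++) u[i] = -1;
    }

    int find(int i) {
        int j, root;
        for (j = i; u[j] >= 0; j = u[j]) ;   // first pass: locate the root
        root = j;
        while (u[i] >= 0) { j = u[i]; u[i] = root; i = j; }   // second pass: compress the path
        return root;
    }

    void union(int i, int j) {
        i = find(i); j = find(j);
        if (i != j) {
            if (u[i] < u[j]) { u[i] += u[j]; u[j] = i; }   // i's set is larger: j joins i
            else { u[j] += u[i]; u[i] = j; }               // otherwise i joins j
        }
    }
}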

The UnionFind class
class UnionFind {
    int[] u;
    UnionFind(int n) {
        u = new int[n];
        for (int i = 0; i < n; i++) u[i] = -1;
    }
    int find(int i) { ... }
    void union(int i, int j) { ... }
}

Iterative find
int find(int i) {
    int j, root;
    for (j = i; u[j] >= 0; j = u[j]) ;
    root = j;
    while (u[i] >= 0) { j = u[i]; u[i] = root; i = j; }
    return root;
}

Union by size
void union(int i, int j) {
    i = find(i); j = find(j);
    if (i != j) {
        if (u[i] < u[j]) { u[i] += u[j]; u[j] = i; }
        else { u[j] += u[i]; u[i] = j; }
    }
}

Analysis of UnionFind

Analysis of Union-Find
The algorithm
  Union: by rank
  Find: with path compression
[Figure: example forest with node ranks]

Analysis - Rank tree size
Lemma. After a sequence of union instructions, a node of rank r will have at least 2^r descendants, including itself.
Proof.
  r = 0: 2^0 = 1, and every node counts itself as a descendant.
  r > 0: Let T be the smallest rank-r tree and X be its root. Suppose T was the result of union(T1, T2) and X was the root of T1.
    The ranks of T1 and T2 must both be r-1: if the rank of Ti were r, then T could not be the smallest rank-r tree; also, since the union increased the rank, the Ti ranks must be equal.
    By the induction hypothesis, each Ti has at least 2^(r-1) descendants. The total must therefore be at least 2^r.
Note on path compression
  Path compression doesn’t affect rank
  Though it does affect height!

Analysis - Nodes of rank r
Lemma. The number of nodes of rank r is at most N/2^r.
Proof.
  Each node of rank r roots a subtree of at least 2^r nodes.
  No other node within the subtree can be of rank r.
  So all subtrees of rank r are disjoint.
  At most N/2^r subtrees.
Examples:
  rank 0: at most N subtrees (i.e., every node is a root).
  rank log(N): at most 1 subtree (of size N).
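A short consequence of this lemma, written out as a sketch: the symbol r_max (the largest rank that actually occurs) is introduced here for illustration and does not appear in the slides.

```latex
\[
  1 \;\le\; \#\{\text{nodes of rank } r_{\max}\} \;\le\; \frac{N}{2^{r_{\max}}}
  \quad\Longrightarrow\quad
  2^{r_{\max}} \le N
  \quad\Longrightarrow\quad
  r_{\max} \le \log_2 N .
\]
```

So no node can have rank above log(N), which agrees with the rank log(N) example.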

Analysis - Ranks on a path
Lemma. Node rank always increases from leaf to root. Proof. Obvious if no path compression. With path compression, nodes are promoted from lower levels and hence were of lesser rank.

Time bounds
Variables
  M operations. N elements.
Algorithms
  Simple forest representation
    Worst: find O(N). mixed operations O(MN).
    Average: tricky
  Union by height; Union by size
    Worst: find O(log N). mixed operations O(M log N).
    Average: mixed operations O(M)
  Path compression in find
    Worst: mixed operations: “nearly linear” [analysis in ]
