1 Disjoint Sets Set = a collection of (distinguishable) elements Two sets are disjoint if they have no common elements Disjoint-set data structure: –maintains a collection of disjoint sets –each set has a representative element –supported operations: MakeSet(x) Find(x) Union(x,y)
2 Disjoint Sets Used in applications requiring the partition of a set into equivalence classes. –Maze generation –Several graph algorithms (e.g. Kruskal's algorithm for minimum spanning trees) –Compiler algorithms Equivalence of finite automata
3 Disjoint Sets Major operations: –MakeSet(x) Given an object x, create a set out of it. The representative of the set is x –Find(x) Given an object x, return the representative of the set containing x –Union(x, y) Given elements x, y, merge the sets they belong to. The original sets are destroyed. The new set has a new representative
4 Disjoint Sets In the discussion that follows: –n is the total number of elements (in all sets) –m is the total number of operations performed (a mix of MakeSet, Union, Find operations) m is at least equal to n since there must be a MakeSet operation for each element. The maximum number of Union operations that may be performed is n-1. –We will perform amortized analysis.
5 Disjoint Sets Implementation #1: Using linked lists –The head of the list is also the representative –Each node contains an element, a pointer to the next node, and a pointer to the representative –Why? Because this will speed up the Find operation
6 Disjoint Sets Implementation #1: Using linked lists –MakeSet(x) Create a list with one node, x Time for one operation: O(1) –Find(x) Assuming we already have a pointer to x (*), just return the pointer to the representative Time for one operation: O(1) (*) usually, we have a vector of pointers to the individual nodes
7 Disjoint Sets Implementation #1: Using linked lists –Union(x, y) Perform Find(x) to find x's representative, r_x. Perform Find(y) to find y's representative, r_y. Append r_y's list to the end of r_x's list; r_x becomes the representative of the new set. –The elements that used to be in r_y's list should have their pointers to the representative updated. »Idea #1: Do a lazy update: set r_y's pointer to r_x and leave the rest the way they are. This will make Union faster but will slow down the Find operation. »Idea #2: Update all applicable pointers. This will maintain the constant Find() time.
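The linked-list scheme with Idea #2 (update every pointer) might be sketched as follows. This is a minimal illustration, not the slides' own code: the names `Node`, `make_set`, `find`, and `union` are hypothetical, and the `tail` field is an added assumption so that appending does not require walking the list.

```python
class Node:
    """One list node: an element, a next pointer, and a pointer to the representative."""

    def __init__(self, value):
        self.value = value
        self.next = None
        self.rep = self    # a single node is its own representative
        self.tail = self   # assumption: kept up to date only on the head node


def make_set(value):
    """Create a one-node list; O(1)."""
    return Node(value)


def find(node):
    """Return the representative by following the stored pointer; O(1)."""
    return node.rep


def union(x, y):
    """Append r_y's list to r_x's list and repoint every moved node at r_x."""
    rx, ry = find(x), find(y)
    if rx is ry:
        return rx              # already in the same set
    rx.tail.next = ry          # splice r_y's list onto the end of r_x's list
    rx.tail = ry.tail
    node = ry
    while node is not None:    # Idea #2: eager update of all moved nodes
        node.rep = rx
        node = node.next
    return rx
```

The loop in `union` is where the cost goes: it touches every node of the appended list, which is why the order of the arguments matters for the running time.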
8 Disjoint Sets Implementation #1: Using linked lists –Union(x, y) A sequence of m operations may take O(m + n^2) time –How? Given elements 1, 2, 3, ..., n, do Union(1, 2), Union(3, 1), Union(4, 1), etc. At step i, we attach a list of length i to a list of length 1, thus updating i pointers to the new representative. After n-1 unions, we'll have a single set and we will have performed O(n^2) pointer updates. So let's be smart about it: –Keep track of the length of each list and always append the shorter list to the longer one.
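The quadratic count can be written out explicitly. Assuming, as the slide describes, that step i (for i = 1, ..., n-1) updates exactly i pointers:

```latex
\sum_{i=1}^{n-1} i \;=\; \frac{n(n-1)}{2} \;=\; \Theta(n^2)
```

Adding the O(1) cost of each of the m operations themselves gives the O(m + n^2) bound.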
9 Disjoint Sets Implementation #1: Using linked lists –Union(x, y) A sequence of m operations where all unions append the shorter list to the longer one takes O(m + n lg n) time. Why? An element's pointer to the representative is updated only when its list is the shorter one, that is, when a list of length i is appended to a list of length at least i. The list containing that element therefore at least doubles in length with every update. Since no list can grow beyond n elements, each element's pointer is updated at most lg n times, giving a total of at most n lg n pointer updates.
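To see the effect of appending the shorter list to the longer one, the sketch below counts pointer updates. It is an illustration under simplifying assumptions: Python lists and dictionaries stand in for hand-built linked lists, and the names `WeightedListSets` and `count_updates` are hypothetical.

```python
class WeightedListSets:
    """List-based disjoint sets that always move the shorter list."""

    def __init__(self):
        self.rep = {}       # element -> representative of its set
        self.members = {}   # representative -> list of elements in its set
        self.updates = 0    # representative-pointer updates performed so far

    def make_set(self, x):
        self.rep[x] = x
        self.members[x] = [x]

    def find(self, x):
        return self.rep[x]

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return rx
        # Keep the longer list as the target; the shorter one gets moved.
        if len(self.members[rx]) < len(self.members[ry]):
            rx, ry = ry, rx
        for e in self.members[ry]:
            self.rep[e] = rx
            self.updates += 1
        self.members[rx].extend(self.members.pop(ry))
        return rx


def count_updates(n):
    """Run Union(1, 2), Union(3, 1), Union(4, 1), ... for n >= 2 and count updates."""
    s = WeightedListSets()
    for i in range(1, n + 1):
        s.make_set(i)
    s.union(1, 2)
    for i in range(3, n + 1):
        s.union(i, 1)
    return s.updates
```

On the sequence that was quadratic without the heuristic, every union now moves a single element, so `count_updates(n)` returns n-1, well under the n lg n worst case.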
10 Disjoint Sets Implementation #2: Using arrays –Maintain an array of size n –Cell i of the array holds the representative of the set containing i. –Similar to lists, simpler to implement if we know the number of elements in advance.
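A minimal sketch of the array version, assuming the elements are the integers 0 to n-1 (the class name `ArraySets` is hypothetical). Find is a single array lookup; Union in this simple form scans the whole array to relabel one of the two sets.

```python
class ArraySets:
    """Array-based disjoint sets over the elements 0..n-1."""

    def __init__(self, n):
        # Equivalent to MakeSet on every element: each is its own representative.
        self.rep = list(range(n))

    def find(self, i):
        return self.rep[i]                 # O(1) lookup

    def union(self, i, j):
        ri, rj = self.rep[i], self.rep[j]
        if ri == rj:
            return ri                      # already in the same set
        for k in range(len(self.rep)):     # O(n) scan: relabel r_j's set as r_i
            if self.rep[k] == rj:
                self.rep[k] = ri
        return ri
```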
11 Disjoint Sets Implementation #3: Using trees –Each set is represented by a tree structure where every node has a pointer to its parent. This tree is called an up-tree –The root is the representative of the set –The elements are not in any particular order.
12 Disjoint Sets Implementation #3: Using trees –MakeSet(x) Create a tree containing only the root, x Time for one operation: O(1) –Find(x) Follow the parent pointers to the root. Time for one operation: O(depth of node) –Could be up to O(n)
14 Disjoint Sets Implementation #3: Using trees –Union(x, y) Perform Find(x) to locate the representative of x, s_x. Perform Find(y) to locate the representative of y, s_y. Make s_y a child of s_x –Danger: if we are not smart about it, our tree may end up looking like a list [Figure: successive unions of the elements 1, 2, 3, 4 producing a list-like tree]
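The up-tree operations without any balancing heuristic might look like the sketch below. The names `UpTreeSets`, `depth`, and `build_chain` are hypothetical, and the union order in `build_chain` is one assumed sequence that produces the list-like tree the slide warns about.

```python
class UpTreeSets:
    """Up-trees with no balancing: Union always puts s_y under s_x."""

    def __init__(self):
        self.parent = {}              # element -> parent; a root is its own parent

    def make_set(self, x):
        self.parent[x] = x

    def find(self, x):
        while self.parent[x] != x:    # O(depth of x)
            x = self.parent[x]
        return x

    def union(self, x, y):
        sx, sy = self.find(x), self.find(y)
        if sx != sy:
            self.parent[sy] = sx      # make s_y a child of s_x
        return sx

    def depth(self, x):
        """Number of parent pointers between x and its root."""
        d = 0
        while self.parent[x] != x:
            x = self.parent[x]
            d += 1
        return d


def build_chain(n):
    """Union(2, 1), Union(3, 2), ..., Union(n, n-1): each old root goes under a singleton."""
    s = UpTreeSets()
    for i in range(1, n + 1):
        s.make_set(i)
    for i in range(2, n + 1):
        s.union(i, i - 1)
    return s
```

After `build_chain(n)`, element 1 sits n-1 links below the root, so a single Find(1) costs O(n).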
15 Disjoint Sets Implementation #3: Using trees –Union(x, y) Always make the smaller tree a child of the larger tree. How do we define "smaller"? –Heuristic #1: Union-by-weight »Smaller = fewer nodes »Store number of nodes at the representative »Add the two weights when performing a union –Heuristic #2: Union-by-height »Smaller = shorter »Store height at the representative »The height increases only when two trees of equal height are united.
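Both heuristics can be sketched in one class. This is an illustration only: the class name `BalancedUpTreeSets` and the `by` parameter are hypothetical, height is counted in edges, and both counters are maintained regardless of which heuristic is selected, purely so the results can be compared.

```python
class BalancedUpTreeSets:
    """Up-trees where Union makes the smaller tree a child of the larger one."""

    def __init__(self, by="weight"):
        if by not in ("weight", "height"):
            raise ValueError("by must be 'weight' or 'height'")
        self.by = by
        self.parent = {}   # element -> parent; a root is its own parent
        self.size = {}     # number of nodes; accurate at roots only
        self.height = {}   # height in edges; accurate at roots only

    def make_set(self, x):
        self.parent[x] = x
        self.size[x] = 1
        self.height[x] = 0

    def find(self, x):
        while self.parent[x] != x:
            x = self.parent[x]
        return x

    def union(self, x, y):
        a, b = self.find(x), self.find(y)
        if a == b:
            return a
        key = self.size if self.by == "weight" else self.height
        if key[a] < key[b]:
            a, b = b, a                 # a is now the root of the larger tree
        self.parent[b] = a              # smaller tree becomes a child of the larger
        self.size[a] += self.size[b]    # add the two weights
        # The height grows only when the two trees were equally tall.
        self.height[a] = max(self.height[a], self.height[b] + 1)
        return a
```

With either setting, the union order that produced a chain in the unbalanced version now yields a tree of height 1, because each singleton is attached under the existing root.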
16 Disjoint Sets Implementation #3: Using trees –Union(x, y) With union-by-weight or union-by-height, the height of a tree is at most lg n + 1, where n is the number of elements in the tree. We can do better than that! Optimizing the union through path compression. –Our goal is to minimize the height of the tree –Every time we perform a Find(z) operation, we make all nodes on the path from the root to z immediate children of the root. –When path compression is performed, a sequence of m operations takes O(m lg n) time.
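Path compression on its own might be sketched as a two-pass Find: one pass to locate the root, and a second to repoint every node on the path at it. The class name `CompressingSets` is hypothetical, and Union here deliberately uses no balancing heuristic so that the effect of compression is visible in isolation.

```python
class CompressingSets:
    """Up-trees whose Find flattens the path it walks."""

    def __init__(self):
        self.parent = {}   # element -> parent; a root is its own parent

    def make_set(self, x):
        self.parent[x] = x

    def find(self, z):
        root = z
        while self.parent[root] != root:   # pass 1: locate the root
            root = self.parent[root]
        while z != root:                   # pass 2: make each node a child of the root
            nxt = self.parent[z]
            self.parent[z] = root
            z = nxt
        return root

    def union(self, x, y):
        sx, sy = self.find(x), self.find(y)
        if sx != sy:
            self.parent[sy] = sx           # no balancing in this sketch
        return sx
```

An iterative Find is used rather than a recursive one so that a long chain cannot exhaust Python's recursion limit.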
17 Disjoint Sets Implementation #3: Using trees –Union(x, y) Path compression and union-by-weight can be performed at the same time. Path compression and union-by-height can be performed at the same time. –It's more complex since path compression changes the height of the tree. –We usually prefer to estimate the height instead of computing it exactly. We then talk about union-by-rank with path compression. When we perform union-by-weight/height with path compression, a sequence of m operations takes time that is almost linear in m: O(m α(n)), where α is the very slowly growing inverse Ackermann function.
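Combining the two ideas gives the usual union-by-rank with path compression. The sketch below is one common way to write it, with hypothetical names (`DisjointSets`, `rank`); the rank is an upper bound on the height that is never recomputed after compression, which is the "estimate" the slide refers to.

```python
class DisjointSets:
    """Up-trees with union-by-rank and path compression."""

    def __init__(self):
        self.parent = {}   # element -> parent; a root is its own parent
        self.rank = {}     # upper bound on the height; meaningful at roots only

    def make_set(self, x):
        self.parent[x] = x
        self.rank[x] = 0

    def find(self, x):
        root = x
        while self.parent[root] != root:   # locate the root
            root = self.parent[root]
        while x != root:                   # compress the path just walked
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, x, y):
        a, b = self.find(x), self.find(y)
        if a == b:
            return a
        if self.rank[a] < self.rank[b]:
            a, b = b, a                    # a is the root with the larger rank
        self.parent[b] = a
        if self.rank[a] == self.rank[b]:
            self.rank[a] += 1              # rank grows only when the ranks were equal
        return a
```

A short usage example: after `make_set` on 0 through 9 and unions that join all even and all odd numbers separately, `find(0) == find(8)` holds while `find(0) == find(1)` does not.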