Nirmalya Roy School of Electrical Engineering and Computer Science Washington State University Cpt S 223 – Advanced Data Structures Disjoint Sets.

Preliminaries on Sets: In mathematics, two sets are said to be disjoint if they have no element in common. For example, {1, 2, 3} and {4, 5, 6} are disjoint sets. Formally, two sets A and B are disjoint if their intersection is the empty set, i.e., if A ∩ B = ∅. The definition extends to any collection of sets: a collection of sets is pairwise disjoint (or mutually disjoint) if any two distinct sets in the collection are disjoint.

Preliminaries on Sets (cont’d): For example, the collection of sets { {1}, {2}, {3}, ... } is pairwise disjoint. If {A i } is a pairwise disjoint collection (containing at least two sets), then clearly its intersection is empty. However, the converse is not true: the intersection of the collection {{1, 2}, {2, 3}, {3, 1}} is empty, yet the collection is not pairwise disjoint; in fact, no two sets in this collection are disjoint. A partition of a set X is any collection of non-empty subsets {A i : i ∈ I} of X such that the {A i } are pairwise disjoint and their union is X, i.e., ⋃ (i ∈ I) A i = X.

Example: A = {3}, B = {1, 3, 5} and C = {4, 5, 6}. Are A, B and C disjoint? Yes: no element is common to all three sets. Are A, B and C pairwise disjoint? No: the intersection of A and B is not empty, and the intersection of B and C is not empty.
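The distinction in the example above can be checked mechanically. The following is a minimal sketch, not part of the original slides; the function names intersectionIsEmpty and pairwiseDisjoint are hypothetical, and sets are assumed to hold ints.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// True if no element belongs to ALL sets, i.e. the overall intersection is empty.
// (Hypothetical helper name, chosen for this illustration.)
bool intersectionIsEmpty(const std::vector<std::set<int>>& sets) {
    assert(!sets.empty());
    for (int x : sets[0]) {
        bool inAll = true;
        for (std::size_t i = 1; i < sets.size(); ++i) {
            if (sets[i].count(x) == 0) { inAll = false; break; }
        }
        if (inAll) return false;  // x is common to every set
    }
    return true;
}

// True if every pair of distinct sets in the collection shares no element.
bool pairwiseDisjoint(const std::vector<std::set<int>>& sets) {
    for (std::size_t i = 0; i < sets.size(); ++i)
        for (std::size_t j = i + 1; j < sets.size(); ++j)
            for (int x : sets[i])
                if (sets[j].count(x) != 0) return false;  // shared element found
    return true;
}
```

For A = {3}, B = {1, 3, 5}, C = {4, 5, 6}, the first function returns true and the second returns false, matching the two answers on the slide.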

Disjoint Sets Data structure for problems requiring equivalence relations  That is, are two elements in the same equivalence class Disjoint sets provide a simple, fast solution  Simple: Array-based implementation  Fast: O(1) per operation average-case Analysis is challenging

Relation and Equivalence Relation: A relation R is defined on a set S if, for every pair of elements (a, b) with a, b in S, (a R b) is either true or false. An equivalence relation is a relation R that satisfies three properties: (Reflexive) (a R a), for each element a in S; (Symmetric) (a R b) if and only if (b R a); (Transitive) (a R b) and (b R c) implies (a R c). The equivalence class of an element a (in S) is the subset of S that contains all elements related to a.

Equivalence Relation (cont’d) Examples  Equality (=) over integers (Reflexive) (a = a) for all integers a (Symmetric) (a = b) iff (b = a) (Transitive) (a = b) and (b = c) implies (a = c)  Electrical connectivity (Reflexive) A component is connected to itself (Symmetric) If a is electrically connected to b, then b is electrically connected to a (Transitive) If a is connected to b and b is connected to c, then a is connected to c  Cities belonging to the same country (if all the roads are two-way)

Equivalence Relation (cont’d): Not an equivalence relation: ≤. (Reflexive) (a ≤ a), for all a. (Transitive) If (a ≤ b) and (b ≤ c) then (a ≤ c). (NOT Symmetric) (a ≤ b) DOES NOT IMPLY (b ≤ a).

Equivalence Class: Given a set S and an equivalence relation R, find the subsets S i of S such that: for all a, b ∈ S i : (a R b); and for all a ∈ S i, b ∈ S j, i ≠ j: not(a R b). These S i are the equivalence classes of S for relation R. The S i are “disjoint sets” and form a partition of S. Example: S = {1,2,3,4,3,3,2,1,3}, R is the equality (=). Find out the equivalence classes.

A Motivating Problem for Disjoint Sets: Given a set S of n elements [a 1, …, a n ], compute the equivalence classes of all its elements.

Properties of Equivalence Classes OBSERVATION:  Each element has to belong to exactly one equivalence class COROLLARY:  All equivalence classes are mutually disjoint What we are after is the set of all “maximal” equivalence classes

Disjoint Set Operations To identify all equivalence classes 1. Initially, put each element in a set of its own 2. Permit only two types of operations:  find(x): Returns the equivalence class of x  union(x, y): Merges the equivalence classes corresponding to elements x and y, if and only if x is “related” to y

Steps in the union(x, y): 1. EqClass x = find(x) 2. EqClass y = find(y) 3. EqClass xy = EqClass x ∪ EqClass y. Perform the union only if x is “related” to y.

A Naïve Algorithm for Equivalence Class Computation: 1. Initially, put each element in a set of its own; that is, EqClass x = {x}, for every x ∈ S. 2. For each element pair (x, y): check if (x R y) is true; if it is, perform union(x, y), i.e., (a) EqClass x = find(x), (b) EqClass y = find(y), (c) EqClass xy = EqClass x ∪ EqClass y.

Disjoint Sets Example: S = {1,2,3,4,3,3,2,1,3}, R is the equality (=)  S = {1 a, 2 a, 3 a, 4 a, 3 b, 3 c, 2 b, 1 b, 3 d }  DS = { {1 a }, {2 a }, {3 a }, {4 a }, {3 b }, {3 c }, {2 b }, {1 b }, {3 d } }  3 a R 3 b ?, 3 c R 3 d ?  DS = { {1 a }, {2 a }, {3 a,3 b }, {4 a }, {3 c,3 d }, {2 b }, {1 b } }  3 a R 3 c ?  DS = { {1 a }, {2 a }, {3 a,3 b,3 c,3 d }, {4 a }, {2 b }, {1 b } }  Continue like this…
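The naïve pairwise algorithm and the worked example above can be sketched as follows. This is an illustration under stated assumptions, not the slides' own code: elements are identified by their position in the input, each class is identified by an integer label, and the name naiveEquivalenceClasses is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Returns label[i] = ID of the equivalence class containing the i-th element.
// Initially every element is in a class of its own (label[i] == i).
std::vector<int> naiveEquivalenceClasses(
        const std::vector<int>& values,
        const std::function<bool(int, int)>& related) {
    const std::size_t n = values.size();
    std::vector<int> label(n);
    for (std::size_t i = 0; i < n; ++i) label[i] = static_cast<int>(i);

    for (std::size_t x = 0; x < n; ++x) {
        // Sanity check: an equivalence relation must be reflexive.
        assert(related(values[x], values[x]));
        for (std::size_t y = x + 1; y < n; ++y) {
            if (!related(values[x], values[y])) continue;
            const int cx = label[x];  // find(x)
            const int cy = label[y];  // find(y)
            if (cx == cy) continue;   // already in the same class
            for (int& l : label)      // union: relabel class cy as cx
                if (l == cy) l = cx;
        }
    }
    return label;
}
```

With S = {1,2,3,4,3,3,2,1,3} and equality as the relation, the four copies of 3 end up sharing one label, the two copies of 1 another, and so on. Checking all pairs costs O(n²) relation tests, which is why the later slides look for cheaper find and union operations.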

Specification for Union-Find find(x)  Should return the ID of the equivalence class that currently has element x union(x, y)  If x and y are in two different equivalence classes, then union(x, y) should merge those two sets into one Otherwise, no change

Disjoint Sets Represent each set as a tree Tree’s root is the representative element for the set Disjoint sets are a forest of trees find(x) returns the root element of the tree containing x union(x, y) points root node of tree containing y to root node of tree containing x Implemented as array S, where S[i] = index of parent node in tree (or -1 if root)

Example Initial disjoint sets of 8 elements (really an array of size 8 of all -1) Initially, each element is put in one set of its own (start with n sets == n trees) After union(4, 5): points root node of tree containing 5 to root node of tree containing 4

Example (cont’d) After union(6, 7): After union(4, 6): Convention is that new root after union (x,y) is x The array representation:
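The three unions in this example can be replayed on the parent array directly. A minimal sketch, assuming the slide's convention that the root of y's tree is pointed at the root of x's tree; the helper names findRoot, simpleUnion and exampleForest are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Follow parent links until reaching a root (an entry of -1).
int findRoot(const std::vector<int>& s, int x) {
    assert(x >= 0 && x < static_cast<int>(s.size()));
    while (s[x] >= 0) x = s[x];
    return x;
}

// Point the root of y's tree at the root of x's tree (no balancing).
void simpleUnion(std::vector<int>& s, int x, int y) {
    const int rx = findRoot(s, x);
    const int ry = findRoot(s, y);
    if (rx != ry) s[ry] = rx;
}

// Replays the example: 8 singleton trees, then three unions.
std::vector<int> exampleForest() {
    std::vector<int> s(8, -1);  // every element starts as its own root
    simpleUnion(s, 4, 5);       // 5 now points to 4
    simpleUnion(s, 6, 7);       // 7 now points to 6
    simpleUnion(s, 4, 6);       // 6 now points to 4
    return s;                   // index 0..7: -1 -1 -1 -1 -1 4 4 6
}
```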

How to Support union() and find() Operations Efficiently? Approach 1: Keep the elements in an array A, where A[i] stores the ID of the equivalence class that element i is currently in. Analysis: find(x) is a single array lookup, A[x], so it takes O(1) time. union(x, y) could take up to O(n) time: assuming x is in class a and y is in class b, scan the array, changing all a’s to b’s (the class label). Therefore, a sequence of m operations could take O(mn) time in the worst case.
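Approach 1 can be sketched in a few lines. The struct name LabelSets and the method name unite are hypothetical (union is a reserved word in C++), and the initial labelling, in which element i starts in class i, is an assumption.

```cpp
#include <cassert>
#include <vector>

// Approach 1 sketch: label[i] is the ID of the class that element i belongs to.
struct LabelSets {
    std::vector<int> label;

    explicit LabelSets(int n) : label(n) {
        for (int i = 0; i < n; ++i) label[i] = i;  // each element in its own class
    }

    // Look up the class label of x.
    int find(int x) const {
        assert(x >= 0 && x < static_cast<int>(label.size()));
        return label[x];
    }

    // Merge the class of x into the class of y by relabelling.
    void unite(int x, int y) {
        const int a = find(x);
        const int b = find(y);
        if (a == b) return;
        for (int& l : label)   // scan the whole array, changing all a's to b's
            if (l == a) l = b;
    }
};
```

The full scan inside unite is the step that costs O(n) per union in this approach.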

How to Support union() and find() Operations Efficiently? (cont’d) Approach 2: Keep each equivalence class in a separate linked list. Analysis: This decreases the time for unions by not having to search all N elements, just the two lists where the elements are found, and then concatenating the lists; with doubly-linked lists the concatenation in union(x, y) needs only O(1) time. It increases the time to find an element: find(x) could take up to O(n) time. Slight improvements are possible (think of BSTs). A sequence of m operations will take time on the order of (m log n).

How to Support union() and find() Operations Efficiently? (cont’d) Approach 3: Keep all equivalence classes in separate trees. Ensure (somehow) that find() and union() take << O(log n) time. This is the data structure we have used for disjoint sets. A disjoint-sets structure for n elements is a forest of k trees, where 1 ≤ k ≤ n.

The Disjoint Set Data Structure Purpose: To support two basic operations efficiently:  find(x)  union(x, y) Input: An array of n elements Identify each element by its array index  Element label is equal to array index  Value does not really matter

Implementation Note: This will always be a vector, regardless of the data type of your elements. Initial # of disjoint sets

Implementation (cont’d): The array representation: entry s[i] “points” to the parent of element i; s[i] == -1 means i is a root.

Implementation (cont’d)

Implementation (cont’d): In unionSets(root1, root2), the union is performed arbitrarily by setting s[root2] = root1; this could also be s[root1] = root2;

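Implementation (cont’d): // a & b could be arbitrary elements (need not be roots). Note that union is a reserved keyword in C++, so the wrapper needs a different name in compilable code, e.g. unite: void DisjSets::unite(int a, int b) { unionSets(find(a), find(b)); }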
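Putting the pieces from the implementation slides together, a minimal sketch of the simple version might look like the following. It assumes the array convention stated earlier (s[i] is the parent of i, or -1 for a root). The wrapper is named unite here because union is a reserved word in C++, and the guard against merging a root with itself is an addition not shown on the slides.

```cpp
#include <cassert>
#include <vector>

class DisjSets {
public:
    // Start with numElements singleton sets: every entry is a root (-1).
    explicit DisjSets(int numElements) : s(numElements, -1) {}

    // Simple find: walk up the parent links until a root is reached.
    int find(int x) const {
        assert(x >= 0 && x < static_cast<int>(s.size()));
        if (s[x] < 0) return x;
        return find(s[x]);
    }

    // Arbitrary union. Precondition: root1 and root2 are distinct roots.
    void unionSets(int root1, int root2) {
        assert(root1 != root2 && s[root1] < 0 && s[root2] < 0);
        s[root2] = root1;  // could equally be s[root1] = root2
    }

    // a and b may be arbitrary elements (they need not be roots).
    void unite(int a, int b) {
        const int ra = find(a);
        const int rb = find(b);
        if (ra != rb) unionSets(ra, rb);  // added guard: skip if already together
    }

private:
    std::vector<int> s;  // s[i] = parent of i, or -1 if i is a root
};
```

A short usage example: after DisjSets ds(8); ds.unite(4, 5); ds.unite(6, 7); ds.unite(4, 6); both ds.find(5) and ds.find(7) return 4.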

Analysis of the Simple Version: Each find(x) could take O(n) time, proportional to the depth of the tree containing x. Each union(x, y) could also take O(n) time, because it performs two finds. Each unionSets() takes only O(1) time in the worst case. Therefore, m operations, where m >> n, would take O(mn) time in the worst case.

Smart Union Algorithms Problem with the arbitrary union strategy in the simple approach is that:  The tree, in the worst-case, could just grow along one long O(n) path Solution: Prevent formation of such long chains  Enforce union() to happen in a “balanced” way Two heuristics  Union-by-size  Union-by-height

Union-by-Size: Attach the root of the “smaller” tree to the root of the “larger” tree, with respect to size. Example: for union(3, 7), the tree rooted at 3 has size 1 and the tree containing 7 (rooted at 4) has size 4, so attach root 3 to root 4.

Union-by-Size (cont’d): [Figure: result of union(3, 7) using the union-by-size heuristic, next to the result of union(3, 7) using simple union.] Using simple union could end up unbalanced.

Union-by-Size (cont’d): Link the smaller tree to the larger tree. A sequence of m operations requires O(m) time on average: random unions tend to merge large sets with small sets, and only the depth of the smaller set increases. Implementation: use “-size” instead of “-1” for root entries. The array representation after union(3, 7), where the tree rooted at 4 has size 5 and each remaining singleton has size 1:
index: 0 1 2 3 4 5 6 7
s[i]: -1 -1 -1 4 -5 4 4 6
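A minimal sketch of union-by-size, assuming roots store the negated size of their tree as described above. The class name SizedDisjSets and the accessors sizeOf and raw are hypothetical, and on a tie the first argument's root is kept as the new root (an assumption that matches the earlier convention).

```cpp
#include <cassert>
#include <utility>
#include <vector>

class SizedDisjSets {
public:
    explicit SizedDisjSets(int n) : s(n, -1) {}  // each root stores -(size) = -1

    int find(int x) const {
        assert(x >= 0 && x < static_cast<int>(s.size()));
        while (s[x] >= 0) x = s[x];
        return x;
    }

    void unite(int a, int b) {
        int ra = find(a);
        int rb = find(b);
        if (ra == rb) return;
        // A more negative entry means a larger tree; make ra the larger root.
        if (s[rb] < s[ra]) std::swap(ra, rb);
        s[ra] += s[rb];  // the merged size is the sum of both sizes
        s[rb] = ra;      // the smaller tree's root now points to the larger root
    }

    int sizeOf(int x) const { return -s[find(x)]; }
    const std::vector<int>& raw() const { return s; }

private:
    std::vector<int> s;  // parent index, or -(size) for a root
};
```

Replaying union(4, 5), union(6, 7), union(4, 6) and then union(3, 7) on eight elements attaches root 3 under root 4 and leaves the array as -1 -1 -1 4 -5 4 4 6.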

Union-by-Height: Attach the root of the “shallower” tree to the root of the “deeper” tree. Example: for union(3, 7), the tree rooted at 3 has height 0 and the tree containing 7 (rooted at 4) has height 2, so attach root 3 to root 4.

Union-by-Height (cont’d): Need to keep track of the height of each tree, rather than its size. Link the smaller-height tree to the larger-height tree. The height only increases when two equal-height trees are unioned. The maximum depth is O(log n), so m operations take O(m log n) time in the worst case. Implementation: for root entries, store the negative of the height, minus 1, i.e., -(height) - 1, so that a single-node tree of height 0 is still stored as -1.

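Union-by-Height (cont’d): The array representation after union(3, 7), where the tree rooted at 4 has height 2 (stored as -3) and each singleton has height 0 (stored as -1):
index: 0 1 2 3 4 5 6 7
s[i]: -1 -1 -1 4 -3 4 4 6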

Union-By-Height Implementation
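The listing for this slide did not survive in the transcript. What follows is a sketch of one way to implement union-by-height under the encoding described earlier (roots store the negative of the height, minus 1); the class name HeightDisjSets and the helpers unite, heightOf and raw are hypothetical.

```cpp
#include <cassert>
#include <vector>

class HeightDisjSets {
public:
    explicit HeightDisjSets(int n) : s(n, -1) {}  // height 0 is stored as -1

    int find(int x) const {
        assert(x >= 0 && x < static_cast<int>(s.size()));
        while (s[x] >= 0) x = s[x];
        return x;
    }

    // Precondition: root1 and root2 are distinct roots.
    void unionSets(int root1, int root2) {
        assert(root1 != root2 && s[root1] < 0 && s[root2] < 0);
        if (s[root2] < s[root1]) {
            s[root1] = root2;   // root2's tree is deeper: it stays the root
        } else {
            if (s[root1] == s[root2])
                --s[root1];     // equal heights: the merged tree grows by one
            s[root2] = root1;   // root1's tree is at least as deep
        }
    }

    void unite(int a, int b) {
        const int ra = find(a);
        const int rb = find(b);
        if (ra != rb) unionSets(ra, rb);
    }

    int heightOf(int x) const { return -s[find(x)] - 1; }
    const std::vector<int>& raw() const { return s; }

private:
    std::vector<int> s;  // parent index, or -(height) - 1 for a root
};
```

Replaying union(4, 5), union(6, 7), union(4, 6) and union(3, 7) on eight elements gives the array -1 -1 -1 4 -3 4 4 6: the height changes only in the unions of equal-height trees.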

Analysis of Smart Union Heuristics For smart union (by rank or by size)  find() takes O(log n) time  union() takes O(log n) time  unionSets() takes O(1) time For m operations: O(m log n) time Can we do better?  What is still causing the (log n) factor is the distance of the nodes from the root  Idea: Get the nodes as close as possible to the root  The above idea is called path compression

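Path Compression: During a find(x) operation, update all the nodes along the path from x to the root so that they point directly to the root. This is a two-pass algorithm: the first pass walks from x up to the root, and the second pass goes back along the same path, re-pointing each node at the root.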

Path Compression Example After find(14):

Path Compression Implementation: s[x] is made equal to the value returned by find(s[x]), so x’s parent link then refers to the root of the set. This occurs recursively for every node on the path to the root.
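The recursive find described above can be sketched as follows. The class name CompressingDisjSets and the accessor parentOf are hypothetical; the union here is the arbitrary one, used only to build a chain that find can then flatten.

```cpp
#include <cassert>
#include <vector>

class CompressingDisjSets {
public:
    explicit CompressingDisjSets(int n) : s(n, -1) {}

    // find with path compression: on the way back from the recursion,
    // every node on the path is re-pointed directly at the root.
    int find(int x) {
        assert(x >= 0 && x < static_cast<int>(s.size()));
        if (s[x] < 0) return x;
        return s[x] = find(s[x]);
    }

    // Arbitrary union. Precondition: root1 and root2 are distinct roots.
    void unionSets(int root1, int root2) {
        assert(root1 != root2 && s[root1] < 0 && s[root2] < 0);
        s[root2] = root1;
    }

    int parentOf(int x) const { return s[x]; }  // raw entry, for inspection

private:
    std::vector<int> s;  // parent index, or -1 for a root
};
```

For instance, after unionSets(2, 3), unionSets(1, 2) and unionSets(0, 1), element 3 sits at depth 3 in the chain 3 → 2 → 1 → 0. A single find(3) returns 0 and leaves 1, 2 and 3 all pointing directly at 0.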

Path Compression with Smart Union: Path compression works as is with union-by-size (tree sizes don’t change). Path compression with union-by-height requires re-computation of heights. Solution: don’t re-compute the heights. The stored heights become (possibly over-) estimates of the true height; they are also called “ranks”, and this solution is called “union-by-rank”. Ranks are modified far less often than sizes, so this is slightly faster in practice. Path compression does not change average-case time, but does reduce worst-case time.

Analysis of Union-by-Rank and Path Compression: Worst case is Θ(M α(M,N)), where M is the number of operations (find, union), N is the number of elements in the disjoint set, and α(M,N) is the inverse of Ackermann’s function. In practice, α(M,N) ≤ 4, so the worst case is effectively Θ(M) for M operations.

Ackermann’s Function: values of A(i, j)
         j=1           j=2            j=3             j=4
i=1      2^1 = 2       2^2 = 4        2^3 = 8         2^4 = 16
i=2      2^2 = 4       2^2^2 = 16     2^16 = 65536    2^65536
i=3      2^2^2 = 16    2^16 = 65536   2^65536         2^2^65536 = BIG

Inverse of Ackermann’s Function

Analysis of Union-by-Rank and Path Compression: Worst case is Θ(M α(M,N)) for M operations on a disjoint set with N elements, which is nearly, but technically not, linear in M. Any sequence of M = Ω(N) union/find operations takes O(M log* N) time.

Heuristics and Their Gains: worst-case run-time for m operations
Arbitrary union, simple find: O(mn)
Union-by-size, simple find: O(m log n)
Union-by-rank, simple find: O(m log n)
Arbitrary union, path compression find: O(m log n)
Union-by-rank, path compression find: O(m log* n)
log* n is pronounced “log star n”. It is the number of times log must be applied repeatedly, starting from n, to bring the value down to 1. E.g., log* 65536 = 4 and log* 2^65536 = 5. Therefore, log* n is an extremely slow-growing function.
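To make the log* definition concrete, here is a small sketch that counts how many base-2 logarithms it takes to bring a value down to 1 or below. The function name logStar is hypothetical, and base 2 is an assumption consistent with the example log* 65536 = 4. The value 2^65536 does not fit in a double, so that example cannot be evaluated with this function.

```cpp
#include <cassert>
#include <cmath>

// Number of times log2 must be applied to n before the value is <= 1.
int logStar(double n) {
    assert(n > 0.0);
    int count = 0;
    while (n > 1.0) {
        n = std::log2(n);  // 65536 -> 16 -> 4 -> 2 -> 1
        ++count;
    }
    return count;
}
```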

Application: Maze Generation: Start with walls everywhere except at the upper-left corner and the lower-right corner. Randomly choose a wall, and knock it down if the cells that the wall separates are not already connected. Continue until the start (upper-left corner) and finish (lower-right corner) cells are connected, or continue until all cells are connected (more dead ends). Then we have a maze. [Figure: a 50-by-88 maze in which the top-left cell is connected to the bottom-right cell and cells are separated from their neighboring cells by walls.]

Maze Generation Example Initial state: All walls up, all cells in their own set/equivalence class Use Union/Find to represent sets of cells that are connected to each other

Maze Generation Example (cont’d): Intermediate state: a few walls are knocked down. Suppose the wall between cells 8 and 13 is randomly targeted, and then the wall between cells 18 and 13. For each targeted wall, perform two find operations; if the two cells are in different sets, knock down the wall that separates them and combine the sets via a union operation.

Maze Generation Example (cont’d) After joining 13 and 18 from previous intermediate state:

Maze Generation Example (cont’d) Final state: All cells are connected
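The maze procedure walked through above can be sketched as follows, for the variant that continues until all cells are connected. Several details are assumptions made for this illustration: cells are numbered row by row, "randomly choose a wall" is realized by shuffling the list of interior walls once and scanning it, and the names Maze, mazeFind and generateMaze are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <random>
#include <utility>
#include <vector>

struct Maze {
    int rows;
    int cols;
    std::vector<std::pair<int, int>> removedWalls;  // pairs of adjacent cell indices
};

// find with path compression on a parent array (-1 marks a root).
int mazeFind(std::vector<int>& s, int x) {
    if (s[x] < 0) return x;
    return s[x] = mazeFind(s, s[x]);
}

Maze generateMaze(int rows, int cols, unsigned seed) {
    assert(rows > 0 && cols > 0);

    // List every interior wall as the pair of cells it separates.
    std::vector<std::pair<int, int>> walls;
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; ++c) {
            const int cell = r * cols + c;
            if (c + 1 < cols) walls.emplace_back(cell, cell + 1);     // right neighbor
            if (r + 1 < rows) walls.emplace_back(cell, cell + cols);  // lower neighbor
        }
    }

    std::mt19937 rng(seed);
    std::shuffle(walls.begin(), walls.end(), rng);  // random order of walls

    std::vector<int> s(rows * cols, -1);  // every cell starts in its own set
    Maze maze{rows, cols, {}};
    for (const auto& w : walls) {
        const int a = mazeFind(s, w.first);
        const int b = mazeFind(s, w.second);
        if (a == b) continue;            // already connected: keep the wall
        s[b] = a;                        // union the two sets
        maze.removedWalls.push_back(w);  // knock the wall down
    }
    return maze;
}
```

Because a wall is removed only when it joins two different sets, exactly rows × cols − 1 walls come down, whatever the seed, and the removed walls form a spanning tree of the grid.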

More Applications: Finding the connected components of an undirected graph; computing shorelines of a terrain; molecular identification from fragmentation; image processing (movie coloring).
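As an illustration of the first application, the number of connected components of an undirected graph can be counted with one union per edge. This is a sketch under assumptions: vertices are numbered 0 to n-1, edges are given as pairs, and the names ccFind and countComponents are hypothetical.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// find with path compression on a parent array (-1 marks a root).
int ccFind(std::vector<int>& s, int x) {
    assert(x >= 0 && x < static_cast<int>(s.size()));
    if (s[x] < 0) return x;
    return s[x] = ccFind(s, s[x]);
}

// Counts the connected components of an undirected graph on vertices 0..n-1.
int countComponents(int n, const std::vector<std::pair<int, int>>& edges) {
    assert(n >= 0);
    std::vector<int> s(n, -1);  // each vertex starts as its own component
    int components = n;
    for (const auto& e : edges) {
        const int a = ccFind(s, e.first);
        const int b = ccFind(s, e.second);
        if (a != b) {
            s[b] = a;      // the edge joins two components
            --components;  // so there is now one fewer
        }
    }
    return components;
}
```

For six vertices and the edges (0,1), (1,2), (3,4), the components are {0, 1, 2}, {3, 4} and {5}, so the function returns 3.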

Summary: The disjoint sets data structure provides a simple, fast solution to equivalence problems: an array-based implementation with average-case O(1) time per operation. Consider alternatives when a particular step is not totally specified: simple union, union-by-size, union-by-rank. The flexibility of union helps to design efficient algorithms. Path compression is one of the earliest forms of self-adjustment, like splay trees and skew heaps. It is a simple algorithm with a not-so-simple worst-case analysis.
