 Lecture 10 Disjoint Set ADT.


Preliminary Definitions
A set is a collection of objects.
Set A is a subset of set B if all elements of A are in B. Subsets are themselves sets.
The union of two sets A and B is the set C consisting of all elements that are in A or in B.
Two sets are mutually disjoint if they have no common element.

A partition of a set is a collection of subsets such that
The union of all these subsets is the set itself.
Any two subsets are mutually disjoint.
Example: S = {1,2,3,4}, A = {1,2}, B = {3,4}, C = {2,3,4}, D = {4}
Is A, B a partition of S? Yes.
Is A, C a partition of S? No, A and C share element 2.
Is A, D a partition of S? No, element 3 is in neither set.
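The two partition conditions can be checked mechanically. The following is a minimal Python sketch; the function name is_partition is an illustrative choice and does not come from the lecture, and it does not reject empty subsets.

```python
def is_partition(s, subsets):
    """Return True if `subsets` covers `s` exactly with pairwise disjoint sets.

    `is_partition` is a hypothetical helper written for this example.
    """
    seen = set()
    for sub in subsets:
        # An element already seen means two subsets overlap.
        if seen & sub:
            return False
        seen |= sub
    # The union of all subsets must be the set itself.
    return seen == s


S = {1, 2, 3, 4}
A, B, C, D = {1, 2}, {3, 4}, {2, 3, 4}, {4}
print(is_partition(S, [A, B]))  # True
print(is_partition(S, [A, C]))  # False: A and C share 2
print(is_partition(S, [A, D]))  # False: 3 is missing
```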

Union and Find Operations
Operations on partitions:
Union: form the union of two different sets of a partition.
Find: find out which set an element belongs to.

Every set in the partition has a number.
The numbers can be anything as long as different sets have distinct numbers. Find(a) returns the number of the set containing a. Can two different sets contain the same element? No, the sets in a partition are disjoint

Disjoint Set Data Structure
Every element has a number.
Elements of a set are stored in a tree (not necessarily binary).
The set is represented by the root of the tree.
The number assigned to a set is the number of the root element.

Example: B = {3, 4} is stored as a tree with root 3 and child 4, so B is assigned number 3.
Are the numbers distinct for different sets? Yes. No two sets have the same root, as they are disjoint, so they have distinct numbers.

Find(a) returns the number of the root node of the tree containing a.
For the tree with root 3 and child 4: Find(4) returns 3, and Find(3) returns 3.
The Union operation makes one tree a subtree of another: the root of one tree becomes a child of the root of the other.

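Example: A = {1,2} is a tree with root 1 and child 2; B = {3,4} is a tree with root 3 and child 4.
We want to do A union B. Root 3 becomes a child of root 1, so we have a single tree rooted at 1 that contains 1, 2, 3 and 4.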

Tree Representation
We will use an array-based representation, array S.
Let the elements be 1, 2, ..., N.
S[j] contains the number of the parent of j.
S[j] = 0 if j is the root.
Initially all trees are singletons, so every entry of S is 0.
Trees build up with unions.
Note that we don't use any pointers here.

Example: with A = {1,2} (root 1) and B = {3,4} (root 3), the array is S[1..4] = 0, 1, 0, 3.
After A union B we have S[1..4] = 0, 1, 1, 3.

Pseudo Code for Find
Find(a) {
  If S[a] = 0, return a;
  else return Find(S[a]);
}
Complexity? O(N) in the worst case, since a tree can degenerate into a chain of N nodes.

Pseudo-Code for Union
Union(root1, root2) {
  S[root2] = root1;
}
Both arguments must be roots of different trees.
Complexity? O(1)
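The array representation with these two operations can be sketched in Python as follows. This is a minimal illustration, not the lecture's own code: make_sets is a hypothetical helper, index 0 of the list is left unused so that elements are numbered 1..N as in the slides, and find is written as a loop rather than recursively.

```python
def make_sets(n):
    # Index 0 is unused; S[j] == 0 marks j as a root (all singletons at first).
    return [0] * (n + 1)


def find(S, a):
    # Follow parent links until a root (parent entry 0) is reached.
    while S[a] != 0:
        a = S[a]
    return a


def union(S, root1, root2):
    # Assumes root1 and root2 are roots of two different trees.
    S[root2] = root1


S = make_sets(4)
union(S, 1, 2)                      # A = {1, 2}, root 1
union(S, 3, 4)                      # B = {3, 4}, root 3
print(S[1:])                        # [0, 1, 0, 3]
union(S, find(S, 1), find(S, 3))    # A union B
print(S[1:])                        # [0, 1, 1, 3]
print(find(S, 4))                   # 1
```

Calling union with the same root twice would make that root its own parent, so callers should compare the two find results first.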

More Efficient Union
This improves the worst-case Find complexity to O(log N).
When we do a union operation, make the smaller tree a subtree of the bigger one.
Thus the root of the smaller tree becomes a child of the root of the bigger one.
Example: A = {1,2,3}, B = {4}. After the union, the root of A is the root of the combined tree, and B is a subtree of A.
Alternatively, union can be done by height: the tree of lesser height is made a subtree of the other.
We consider only size here.

Array storage changes somewhat.
If j is a root, S[j] = -(size of the tree rooted at j).
If j is not a root, S[j] = parent of j.
Why is S[j] not equal to the size of tree j, if j is a root? The size of a tree is a positive integer. If S[j] held the size, it would look like the parent of j is another element, and j would not be recognised as a root.
Initially, what is the content of array S? All elements are -1.

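Pseudo-Code for Union
Union(root1, root2) {
  If S[root2] < S[root1] {        // tree at root2 is bigger
    S[root2] = S[root2] + S[root1];
    S[root1] = root2;
  } else {
    S[root1] = S[root1] + S[root2];
    S[root2] = root1;
  }
}
The size stored at the new root must be updated before the other root is overwritten with its parent.
Complexity? O(1)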

Pseudo Code for Find
Find(a) {
  If S[a] < 0, return a;
  else return Find(S[a]);
}
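Union by size with the negative-size encoding can be sketched in Python as below. The helper names are illustrative assumptions, index 0 of the list is unused, and the sketch updates the stored size of the surviving root on every union.

```python
def make_sets(n):
    # Index 0 is unused; every element starts as a root of a tree of size 1.
    return [0] + [-1] * n


def find(S, a):
    # A positive entry is a parent link; a negative entry marks a root.
    while S[a] > 0:
        a = S[a]
    return a


def union(S, root1, root2):
    """Attach the smaller tree under the bigger one; return the new root."""
    if root1 == root2:
        return root1
    if S[root2] < S[root1]:          # more negative means bigger tree
        S[root2] += S[root1]         # new size, still stored negated
        S[root1] = root2
        return root2
    S[root1] += S[root2]
    S[root2] = root1
    return root1


S = make_sets(4)
union(S, find(S, 1), find(S, 2))
union(S, find(S, 1), find(S, 3))     # A = {1, 2, 3}, root 1, size 3
union(S, find(S, 4), find(S, 1))     # B = {4} is smaller, so 4 goes under 1
print(S[1:])                         # [-4, 1, 1, 1]
```

Note that the argument order in the last call does not matter: the smaller tree always ends up below the bigger one.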

Complexity Analysis for Find Operation
If the depth (distance from the root) of a node A increases, then the tree that contained A has become a subtree of another tree.
Since only the smaller tree becomes a subtree of the other, the size of the combined tree must be at least twice the size of the previous tree containing A.
So each time the depth of a node increases, the size of its tree increases by at least a factor of 2.
At first every node has depth 0 and is in a tree of size 1.

When the depth becomes 1, the tree size is at least 2.
When the depth is k, the tree size is at least 2^k.
We know that 2^k <= N.
Thus k <= log N.
The depth of any tree is at most log N.
The complexity of a Find operation is O(log N).
The complexity of any M operations is O(M log N).
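The depth bound can be checked empirically. The sketch below performs random unions by size and reports the deepest node; it is an illustration under assumed helper names, not part of the lecture, and a random run only exercises the bound rather than proving it.

```python
import math
import random


def find_with_depth(S, a):
    # Return (root of a, number of parent links followed to reach it).
    depth = 0
    while S[a] > 0:
        a = S[a]
        depth += 1
    return a, depth


def union_by_size(S, r1, r2):
    if r1 == r2:
        return
    if S[r2] < S[r1]:            # make r1 the root of the bigger tree
        r1, r2 = r2, r1
    S[r1] += S[r2]               # sizes are stored negated
    S[r2] = r1


def max_depth_after_random_unions(n, seed=0):
    rng = random.Random(seed)
    S = [0] + [-1] * n           # index 0 unused
    for _ in range(4 * n):
        a, b = rng.randint(1, n), rng.randint(1, n)
        union_by_size(S, find_with_depth(S, a)[0], find_with_depth(S, b)[0])
    return max(find_with_depth(S, x)[1] for x in range(1, n + 1))


N = 1024
print(max_depth_after_random_unions(N), "<=", int(math.log2(N)))
```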

Path Compression
Makes the total cost of any sequence of operations almost linear in the worst case.
Whenever you do Find(j), set S[k] = Find(j) for every element k on the path from j to the root, except the root itself.
All nodes on the path of j now point to the root directly.

Example: in a tree on elements 1 to 5 with root 1, do Find(5).
Find(5) encounters 5, 4 and 3 before reaching root 1.
After Find(5), nodes 5, 4 and 3 all point directly to root 1.

Later Find operations will have lower costs as their depths have been reduced.
Any Find operation reduces the cost of future ones.

Pseudo Code for New Find
Find(a) {
  If S[a] < 0, return a;
  else {
    S[a] = Find(S[a]);
    return S[a];
  }
}
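A direct Python rendering of the compressing Find is sketched below. The starting array is an assumption made for the example: a chain 5, 4, 3, 1 as in the Find(5) example, with element 2 placed directly under root 1 because its position in the original figure is not recoverable.

```python
def find(S, a):
    # A negative entry marks a root and stores the negated tree size.
    if S[a] < 0:
        return a
    # Re-point a directly at the root while unwinding the recursion.
    S[a] = find(S, S[a])
    return S[a]


# Index 0 is unused. Parents: 2 -> 1, 3 -> 1, 4 -> 3, 5 -> 4; root 1 has size 5.
S = [0, -5, 1, 1, 3, 4]
print(find(S, 5))   # 1
print(S[1:])        # [-5, 1, 1, 1, 1]
```

For very deep trees a recursive version can hit Python's recursion limit; an iterative two-pass loop (find the root, then re-point each node on the path) avoids that.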

Complexity Analysis
Any M operations take O(M log* N) time if M is Ω(N).
log* N is the number of times the logarithm must be applied to N (log log ... log N) to get a number less than or equal to 1. The base is 2; any other base gives the same asymptotic order.
log* N grows very slowly with N: it is at most 5 for all N up to 2^65536, which covers all practical values of N.
Thus the worst-case complexity is linear for all practical purposes.
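To see how slowly log* grows, it can be computed directly from the definition. This is a small sketch; the function name log_star is an illustrative choice.

```python
import math


def log_star(n):
    """Number of times log2 must be applied to n to reach a value <= 1."""
    count = 0
    while n > 1:
        n = math.log2(n)
        count += 1
    return count


for n in (1, 2, 4, 16, 65536):
    print(n, log_star(n))
# 1 0
# 2 1
# 4 2
# 16 3
# 65536 4
```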

Reading Assignment Chapter 8, till section (i.e. section onwards can be omitted).