Andreas Klappenecker [Based on slides by Prof. Welch]

Slides:



Advertisements
Similar presentations
Lecture 15. Graph Algorithms
Advertisements

Advanced Algorithm Design and Analysis Jiaheng Lu Renmin University of China
1 Disjoint Sets Set = a collection of (distinguishable) elements Two sets are disjoint if they have no common elements Disjoint-set data structure: –maintains.
1 Introduction to Algorithms 6.046J/18.401J/SMA5503 Lecture 20 Prof. Erik Demaine.
Minimum Spanning Tree (MST) form a tree that connects all the vertices (spanning tree). minimize the total edge weight of the spanning tree. Problem Select.
© 2004 Goodrich, Tamassia Union-Find1 Union-Find Partition Structures.
Disjoint-Set Operation
Discussion #36 Spanning Trees
CSE 780 Algorithms Advanced Algorithms Minimum spanning tree Generic algorithm Kruskal’s algorithm Prim’s algorithm.
Minimum Spanning Trees Definition Algorithms –Prim –Kruskal Proofs of correctness.
CPSC 411, Fall 2008: Set 7 1 CPSC 411 Design and Analysis of Algorithms Set 7: Disjoint Sets Prof. Jennifer Welch Fall 2008.
CPSC 311, Fall CPSC 311 Analysis of Algorithms Disjoint Sets Prof. Jennifer Welch Fall 2009.
Analysis of Algorithms CS 477/677
© 2004 Goodrich, Tamassia Union-Find1 Union-Find Partition Structures.
Dynamic Sets and Data Structures Over the course of an algorithm’s execution, an algorithm may maintain a dynamic set of objects The algorithm will perform.
Data Structures, Spring 2004 © L. Joskowicz 1 Data Structures – LECTURE 17 Union-Find on disjoint sets Motivation Linked list representation Tree representation.
Lecture 16: Union and Find for Disjoint Data Sets Shang-Hua Teng.
CSE 373, Copyright S. Tanimoto, 2002 Up-trees - 1 Up-Trees Review of the UNION-FIND ADT Straight implementation with Up-Trees Path compression Worst-case.
CS2420: Lecture 42 Vladimir Kulyukin Computer Science Department Utah State University.
Kruskal’s algorithm for MST and Special Data Structures: Disjoint Sets
MA/CSSE 473 Day 36 Kruskal proof recap Prim Data Structures and detailed algorithm.
Definition: Given an undirected graph G = (V, E), a spanning tree of G is any subgraph of G that is a tree Minimum Spanning Trees (Ch. 23) abc d f e gh.
Minimum Spanning Trees Definition of MST Generic MST algorithm Kruskal's algorithm Prim's algorithm Binary Search Trees1.
2IL05 Data Structures Fall 2007 Lecture 13: Minimum Spanning Trees.
Spring 2015 Lecture 11: Minimum Spanning Trees
D ESIGN & A NALYSIS OF A LGORITHM 06 – D ISJOINT S ETS Informatics Department Parahyangan Catholic University.
Computer Algorithms Submitted by: Rishi Jethwa Suvarna Angal.
CS 473Lecture X1 CS473-Algorithms I Lecture X1 Properties of Ranks.
Homework remarking requests BEFORE submitting a remarking request: a)read and understand our solution set (which is posted on the course web site) b)read.
Disjoint Sets Data Structure (Chap. 21) A disjoint-set is a collection  ={S 1, S 2,…, S k } of distinct dynamic sets. Each set is identified by a member.
Lecture X Disjoint Set Operations
Disjoint Sets Data Structure. Disjoint Sets Some applications require maintaining a collection of disjoint sets. A Disjoint set S is a collection of sets.
Union-find Algorithm Presented by Michael Cassarino.
Union Find ADT Data type for disjoint sets: makeSet(x): Given an element x create a singleton set that contains only this element. Return a locator/handle.
David Luebke 1 12/12/2015 CS 332: Algorithms Amortized Analysis.
CSCE 411H Design and Analysis of Algorithms Set 7: Disjoint Sets Prof. Evdokia Nikolova* Spring 2013 CSCE 411H, Spring 2013: Set 7 1 * Slides adapted from.
Disjoint-Set Operation. p2. Disjoint Set Operations : MAKE-SET(x) : Create new set {x} with representative x. UNION(x,y) : x and y are elements of two.
Union-Find A data structure for maintaining a collection of disjoint sets Course: Data Structures Lecturers: Haim Kaplan and Uri Zwick January 2014.
Union-Find  Application in Kruskal’s Algorithm  Optimizing Union and Find Methods.
MA/CSSE 473 Days Answers to student questions Prim's Algorithm details and data structures Kruskal details.
Chapter 21 Data Structures for Disjoint Sets Lee, Hsiu-Hui Ack: This presentation is based on the lecture slides from Prof. Tsai, Shi-Chun as well as various.
WEEK 5 The Disjoint Set Class Ch CE222 Dr. Senem Kumova Metin
CSE Advanced Algorithms Instructor : Gautam Das Submitted by Raja Rajeshwari Anugula & Srujana Tiruveedhi.
David Luebke 1 3/1/2016 CS 332: Algorithms Dijkstra’s Algorithm Disjoint-Set Union.
21. Data Structures for Disjoint Sets Heejin Park College of Information and Communications Hanyang University.
Lecture 12 Algorithm Analysis Arne Kutzner Hanyang University / Seoul Korea.
MA/CSSE 473 Day 37 Student Questions Kruskal Data Structures and detailed algorithm Disjoint Set ADT 6,8:15.
CSE 373, Copyright S. Tanimoto, 2001 Up-trees - 1 Up-Trees Review of the UNION-FIND ADT Straight implementation with Up-Trees Path compression Worst-case.
CSE 373: Data Structures and Algorithms
Disjoint Sets Data Structure
CSE 373, Copyright S. Tanimoto, 2001 Up-trees -
CMSC 341 Disjoint Sets Based on slides from previous iterations of this course.
CSE373: Data Structures & Algorithms Lecture 11: Implementing Union-Find Linda Shapiro Spring 2016.
Data Structures and Algorithms
Disjoint Sets with Arrays
Disjoint Set Neil Tang 02/23/2010
Disjoint Set Neil Tang 02/26/2008
CS200: Algorithm Analysis
CSE 373 Data Structures and Algorithms
CSCE 411 Design and Analysis of Algorithms
CSE 373: Data Structures and Algorithms
CSE 373: Data Structures and Algorithms
Union-Find Partition Structures
Union-Find Partition Structures
Lecture 20: Disjoint Sets
Kruskal’s algorithm for MST and Special Data Structures: Disjoint Sets
Disjoint Sets Data Structure (Chap. 21)
Lecture 21 Amortized Analysis
Review of MST Algorithms Disjoint-Set Union Amortized Analysis
CSE 373: Data Structures and Algorithms
Presentation transcript:

Andreas Klappenecker [Based on slides by Prof. Welch] Disjoint Sets Andreas Klappenecker [Based on slides by Prof. Welch]

Dynamic Sets In mathematics, a set is understood as a collection of clearly distinguishable entities (called elements). Once defined, the set does not change. In computer science, a dynamic set is understood to be a set that can change over time by adding or removing elements.

Abstract Data Type: Disjoint Sets State: collection of disjoint dynamic sets The number of sets and their composition can change, but they must always be disjoint. Each set has a representative element that serves as the name of the set. For example, S={a,b,c} can be represented by a. Operations: Make-Set(x): creates singleton set {x} and adds it the to collection of sets. Union(x,y): replaces x's set Sx and y's set Sy with Sx U Sy Find-Set(x): returns (a pointer to) the representative of the set containing x

Disjoint Sets Example Make-Set(a) Make-Set(b) Make-Set(c ) Make-Set(d) Union(a,b) Union(c,d) Find-Set(b) Find-Set(d) Union(b,d) b a c d returns a returns c

Example The Disjoint Sets data structure can be used in Kruskal’s MST algorithm to test whether adding an edge to a set of edges would cause a cycle. Recall that in Kruskal’s algorithm the edges chosen so far form a forest. The basic idea is to represent each connected component of this forest by its set of vertices. In other words, each dynamic set represents a tree.

Example (continued) Initially, form the singleton sets {v1}, …, {vn}. For all vertices v in V do Make-Set(v). od. Add an edge unless this would form a cycle: For all edges e=(u,v) taken in increasing weight do Union(u,v) unless Find-Set(u)=Find-Set(v).

Disjoint Sets in Linked List Representation

Dynamic Sets in Linked List Representation Idea: Store the set elements in a linked list. Each list node has a pointer to the next list node The first list node is the set representative rep. Each list node also has a pointer to the set representative rep. Keep external pointers to first list node (rep) and last list node (tail)

Linked List Representation tail rep tail rep tail a d e b f c

Linked List Representation Make-Set(x): make a new linked list containing just a node for x O(1) time Find-Set(x): given (pointer to) linked list node containing x, follow rep pointer to head of list Union(x,y): append list containing x to end of list containing y and update all rep pointers in the old list of x to point to the rep of y. O(size of x's old list) time

Time Analysis What is worst-case time for any sequence of Disjoint Set operations, using the linked list representation? Let m be number of operations in the sequence Let n be number of Make-Set operations (i.e., number of elements)

Expensive Case Consider the sequence of operations: Make-Set(x1), Make-Set(x2), …, Make-Set(xn), Union(x1,x2), Union(x2,x3), …, Union(xn-1,xn) This sequences contains m=2n-1 operations. The total time these m operations is O(n2), since unions update 1,2,…,n-1 elements. We have O(n2)=O(m2) since m = 2n – 1. Thus, the amortized time per operation is O(m2)/m = O(m).

Partial Remedy: Linked List with Weighted Union Always append smaller list to larger list Need to keep count of number of elements in list (weight) in rep node We will now calculate the worst-case time for a sequence of m operations. Clearly, the Make-Set and Find-Set operations contribute O(m) total. The Union operations are more critical.

Analyzing Time for All Unions How many times must the rep pointer for an arbitrary node x be updated? The first time the rep pointer of x is updated, the new set has at least 2 elements. The second time rep pointer of x is updated, the new set has at least 4 elements. [Indeed, the set containing x has at least 2 elements and the other set is at least as large as the set containing x.]

Analyzing Time for All Unions The maximum size of a set is n (the number of Make-Set ops in the sequence) So the rep pointer of x may be updated at most log n times. Thus total time for all unions is O(n log n). Note the style of counting - focus on one element and how it fares over all the Unions

Amortized Time Grand total for sequence is O(m+n log n) Amortized cost per Make-Set and Find-Set is O(1) Amortized cost per Union is O(log n) since there can be at most n - 1 Union ops.

Disjoint Sets in Tree Representation

Tree Representation Can we improve on the linked list with weighted union representation? Use a collection of trees, one per set The rep is the root Each node has a pointer to its parent in the tree

Tree Representation a d e b c f

Analysis of Tree Implementation Make-Set: make a tree with one node O(1) time Find-Set: follow parent pointers to root O(h) time where h is height of tree Union(x,y): make the root of x's tree a child of the root of y's tree So far, no better than original linked list implementation

Improved Tree Implementation Use a weighted union, so that smaller tree becomes child of larger tree prevents long chains from developing can show this gives O(m log n) time for a sequence of m ops with n Make-Sets Also do path compression during Find-Set flattens out trees even more can show this gives O(m log*n) time!

Interlude: What is log*n ? The number of times you can successively take the log, starting with n, before reaching a number that is at most 1 More formally: log*n = min{i ≥ 0 : log(i)n ≤ 1} where log(i)n = n, if i = 0, and otherwise log(i)n = log(log(i-1)n)

Examples of log*n n log*n why 0 < 1 1 1 ≤ 1 2 log 2 = 1 3 0 < 1 1 1 ≤ 1 2 log 2 = 1 3 log 3 > 1, log(log 3) < 1 4 log(log 4) = log 2 = 1 5 log(log 5) > 1, log(log(log 5)) < 1 … 16 log(log(log 16) = log(log 4) = log 2 = 1 17 log(log(log 17) > 1, log(log(log(log 17))) < 1 65,536 log(log(log(log 65,536) = log(log(log 16) = log(log 4) = log 2 = 1 65,537 2 65,537

log*n Grows Slowly For all practical values of n, log*n is never more than 5.

Make-Set Make-Set(x): parent(x) := x rank(x) := 0 // used for weighted union

Union Union(x,y): r := Find-Set(x); s := Find-Set(y) if rank(r) > rank(s) then parent(s) := r else parent(r) := s if rank(r ) = rank(s) then rank(s)++

Rank gives upper bound on height of tree is approximately the log of the number of nodes in the tree Example: MS(a), MS(b), MS(c), MS(d), MS(e), MS(f), U(a,b), U(c,d), U(e,f), U(a,c), U(a,e)

End Result of Rank Example 2 d b a f e 1 1 c

Find-Set Find-Set(x): Unroll recursion: if x  parent(x) then parent(x) := Find-Set(parent(x)) return parent(x) Unroll recursion: first, follow parent pointers up the tree then go back down the path, making every node on the path a child of the root

Find-Set(a) e d c e d c b a b a

Amortized Analysis Show any sequence of m Disjoint Set operations, n of which are Make-Sets, takes O(m log*n) time with the improved tree implementation. Use aggregate method. Assume Union always operates on roots otherwise analysis is only affected by a factor of 3

Charging Scheme Charge 1 unit for each Make-Set Charge 1 unit for each Union Set 1 unit of charge to be large enough to cover the actual cost of these constant-time operations

Charging Scheme Actual cost of Find-Set(x) is proportional to number of nodes on path from x to its root. Assess 1 unit of charge for each node in the path (make unit size big enough). Partition charges into 2 different piles: block charges and path charges

Overview of Analysis For each Find-Set, partition charges into block charges and path charges To calculate all the block charges, bound the number of block charges incurred by each Find-Set To calculate all the path charges, bound the number of path charges incurred by each node (over all the Find-Sets that it participates in)

Blocks Consider all possible ranks of nodes and group ranks into blocks Put rank r into block log*r ranks: 0 1 2 3 4 5 … 16 17 … 65536 65537 … blocks: 0 1 2 3 4 5

Charging Rule for Find-Set Fact: ranks of nodes along path from x to root are strictly increasing. Fact: block values along the path are non-decreasing Rule: root, child of root, and any node whose rank is in a different block than the rank of its parent is assessed a block charge each remaining node is assessed a path charge

Find-Set Charging Rule Figure 1 block charge (root) block b'' > b' 1 block charge 1 path charge … 1 path charge block b' > b … 1 block charge 1 path charge block b 1 block charge … 1 path charge

Total Block Charges Consider any Find-Set(x). Worst case is when every node on the path from x to the root is in a different block. Fact: There are at most log*n different blocks. So total cost per Find-Set is O(log*n) Total cost for all Find-Sets is O(m log*n)

Total Path Charges Consider a node x that is assessed a path charge during a Find-Set. Just before the Find-Set executes: x is not a root x is not a child of a root x is in same block as its parent As a result of the Find-Set executing: x gets a new parent due to the path compression

Total Path Charges x could be assessed another path charge in a subsequent Find-Set execution. However, x is only assessed a path charge if it's in same block as its parent Fact: A node's rank only increases while it is a root. Once it stops being a root, its rank, and thus its block, stay the same. Fact: Every time a node gets a new parent (because of path compression), new parent's rank is larger than old parent's rank

Total Path Charges So x will contribute path charges in multiple Find-Sets as long as it can be moved to a new parent in the same block Worst case is when x has lowest rank in its block and is successively moved to a parent with every higher rank in the block Thus x contributes at most M(b) path charges, where b is x's block and M(b) is the maximum rank in block b

Total Path Charges Fact: There are at most n/M(b) nodes in block b. Thus total path charges contributed by all nodes in block b is M(b)*n/M(b) = n. Since there are log*n different blocks, total path charges is O(n log*n), which is O(m log*n).

Even Better Bound By working even harder, and using a potential function analysis, can show that the worst-case time is O(m(n)), where  is a function that grows even more slowly than log*n: for all practical values of n, (n) is never more than 4.

Effect on Running Time of Kruskal's Algorithm |V| Make-Sets, one per node 2|E| Find-Sets, two per edge |V| - 1 Unions, since spanning tree has |V| - 1 edges in it So sequence of O(E) operations, |V| of which are Make-Sets Time for Disjoint Sets ops is O(E log*V) Dominated by time to sort the edges, which is O(E log E) = O(E log V).