15-211 Fundamental Data Structures and Algorithms Margaret Reid-Miller 21 April 2005 Equivalence and Union-Find.

Announcements
- HW6: make sure to get your games in...
- Reading: Chapter 1
- FCEs
- Final Exam on Thursday May 5, 8:30-11:30 am
- Review session April 28

Chameleon Island

On a tropical island there are three kinds of chameleons perambulating themselves: red, green and blue. If a red and green chameleon meet, they both change color to blue, likewise for red/blue and green/blue. Initially there are 12 red, 13 green and 14 blue chameleons. Can the chameleons turn into a homogeneous population?

Brute Force
We can compute this to death: use a digraph with nodes (r,g,b) and edges
(r,g,b) → (r-1,g-1,b+2)
(r,g,b) → (r-1,g+2,b-1)
(r,g,b) → (r+2,g-1,b-1)
provided that the numbers are non-negative. How many nodes are there? The starting configuration is (12,13,14), so the total number of animals is n = 39.

Reachability The number of nodes is C(39+2,2) = 820. We can simply use DFS or BFS to compute the nodes reachable from (12,13,14) and check if we run into one of (39,0,0), (0,39,0), (0,0,39). It turns out, we don't. OK but rather crude. Is there a more elegant solution? How about invariants?
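The brute-force search described above can be sketched in a few lines. This is only an illustration: the class and method names are made up for this example, and a configuration is stored as the pair (r,g) because b = n - r - g.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class ChameleonSearch {
    /** True if some single-color configuration is reachable from (r, g, b). */
    static boolean canBecomeHomogeneous(int r, int g, int b) {
        int n = r + g + b;
        // seen[x][y] marks the configuration (x, y, n - x - y) as visited.
        boolean[][] seen = new boolean[n + 1][n + 1];
        Deque<int[]> queue = new ArrayDeque<>();
        seen[r][g] = true;
        queue.add(new int[] {r, g});
        // The three meeting rules, written as changes to (r, g, b).
        int[][] moves = { {-1, -1, 2}, {-1, 2, -1}, {2, -1, -1} };
        while (!queue.isEmpty()) {
            int[] cur = queue.remove();
            int cr = cur[0], cg = cur[1], cb = n - cr - cg;
            if (cr == n || cg == n || cb == n) return true;
            for (int[] m : moves) {
                int nr = cr + m[0], ng = cg + m[1], nb = cb + m[2];
                if (nr < 0 || ng < 0 || nb < 0) continue; // counts must stay non-negative
                if (!seen[nr][ng]) {
                    seen[nr][ng] = true;
                    queue.add(new int[] {nr, ng});
                }
            }
        }
        return false;
    }
}
```

Calling canBecomeHomogeneous(12, 13, 14) explores the reachable part of the 820-node graph and returns false, in line with the slide's claim.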

Invariants If we suspect that some configuration cannot occur, we can try to prove this by finding some property P such that: - P holds on the initial configuration, - P is preserved in every single transition of the system, - P does not hold on the specific target configuration. Your favorite method: Induction.

Information Hiding
For the chameleons, the key observation is that modulo 3 the three types of edges are all the same:
(r,g,b) → (r+2,g+2,b+2) mod 3
Note that this quotient operation preserves paths, so it suffices to observe
(0,1,2) ≡ (12,13,14) mod 3
and
(0,1,2) → (2,0,1) → (1,2,0) → (0,1,2)
The three targets (39,0,0), (0,39,0), (0,0,39) are all ≡ (0,0,0) mod 3, which never appears on this cycle. Of course, we lose a lot of information, but this is enough to answer the original question.

Equivalence Relations
There is an important idea hiding here: identify objects that are distinct but share some property.
Definition: A binary relation ~ on a set A is an equivalence relation iff, for all x, y, z in A, ~ is
- reflexive: x ~ x
- symmetric: x ~ y ⇒ y ~ x
- transitive: x ~ y and y ~ z ⇒ x ~ z

Equivalence relations?
- <: not reflexive, not symmetric, transitive
- <=: reflexive, not symmetric, transitive
- e1 = O(e2): reflexive, not symmetric, transitive
- reachable in a directed graph: reflexive, not symmetric, transitive
- ≠: not reflexive, symmetric, not transitive

Examples
Equivalence classes formalize the notion that two things are the “same”.
- =
- congruence modulo m
- polygons of the same area
- people of the same age
- reachable in an undirected graph
- programs with the same input/output behavior
- words with the same meaning

Classes and Quotients
Let ~ be an equivalence relation on the set A.
For x in A the equivalence class of x w.r.t. ~ is [x] = { y | x ~ y } (x and y belong to the same equivalence class iff x ~ y).
The quotient is the set of all equivalence classes: A/~ = { [x] | x in A } ({[0],[1],[2],[3]} is the quotient of the integers by congruence modulo 4).
The index of ~ is the cardinality of A/~ (congruence modulo 4 has index 4).

Partitions
Lemma: If ~ is an equivalence relation on a set A then
- x is in [x] for all x in A
- [x] = [y] iff x ~ y
- if [x] ≠ [y] then [x] ∩ [y] = Ø
In fact, equivalence classes form a partition of A. Partitions and equivalence relations are essentially the same.

Examples
Let A be the set of all cars and ~ be cars “having the same color”. An equivalence class is, for example, all green cars. The quotient can be identified with the set of all car colors. The index is the number of car colors.
Let A be the set of integer pairs (x,y) with y ≠ 0, and let ~ be defined by (x,y) ~ (s,t) iff xt = ys. The equivalence class of (x,y) can be identified with the rational number x/y. (Hence the name quotient?)

Computing with Relations
What data structures can represent equivalence relations?
Brute force: Boolean matrix R such that R[i][j] = true iff x_i ~ x_j.
What operations are interesting?
- Are x and y in the same equivalence class?
- Find the index.
- Find the intersection of two relations.
Is there a better data structure than a matrix?

Kernel Relations
Given any function f : A → B we can define a relation K(f) by x K(f) y iff f(x) = f(y).
E.g., A is the set of all polygons, f is the area of the polygon, and K(f) is having the same area.
(Figure: f maps elements x, y, z of A into B.)

Kernel Relations
Note that K(f) is always an equivalence relation. The equivalence class of x is the set of all elements of A that map to f(x); that is, [x] is the inverse image of f(x).
If R = K(f) we say that f is a (kernel) representation for R.

Everybody is a Kernel
Claim: Every equivalence relation has a kernel representation. In fact, we can choose a function f : A → A such that K(f) = ~. That is, we need x ~ y iff f(x) = f(y).
This is intuitively clear: let f map all x in an equivalence class to some special member of that class, sometimes called the representative of the class.

The Canonical Representation
What is a good choice for the function f?
f(x) = min{ z | z ~ x }
For example, if x ~ y iff x = y mod 3 on {1,2,...,n} we get
x:    1 2 3 4 5 6 7 8 9 10
f(x): 1 2 3 1 2 3 1 2 3 1
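As a rough illustration of the canonical representation, the sketch below computes f for a relation that is itself given as a kernel K(key). The names CanonicalRep, canonical and key are assumptions made for this example; arrays are 1-indexed with slot 0 unused, to match the slide's numbering.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntUnaryOperator;

final class CanonicalRep {
    /** rep[x] = smallest z in 1..n with key(z) == key(x); rep[0] is unused. */
    static int[] canonical(int n, IntUnaryOperator key) {
        int[] rep = new int[n + 1];
        Map<Integer, Integer> first = new HashMap<>(); // key value -> first element seen
        for (int x = 1; x <= n; x++) {
            int k = key.applyAsInt(x);
            Integer earlier = first.putIfAbsent(k, x);
            // Elements are visited in increasing order, so the first one seen is the smallest.
            rep[x] = (earlier == null) ? x : earlier;
        }
        return rep;
    }
}
```

With n = 10 and key x -> x % 3 this reproduces the row f(x) = 1 2 3 1 2 3 1 2 3 1 from the slide.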

Computational Aspects
The last observation allows us to represent an equivalence relation on A = {1,2,...,n} compactly: instead of n^2 bits for a Boolean matrix representation we only need n integers for an array representing f.
Either way, we can check if two elements are equivalent in O(1) time.

Index
x:    1 2 3 4 5 6 7 8 9 10
f(x): 1 1 3 1 5 1 1 5 3 1
Question: How does one compute the index of R from a kernel representation for R?
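One possible answer, as a sketch: the index is the number of distinct values of f. For a canonical representation, where each class is mapped to one of its own members, it is also the number of fixed points f(x) = x. The class and method names below are illustrative, and the arrays are 1-indexed with slot 0 unused.

```java
import java.util.HashSet;
import java.util.Set;

final class KernelIndex {
    /** Index from any kernel representation: count the distinct values of f. */
    static int index(int[] f) {
        Set<Integer> values = new HashSet<>();
        for (int x = 1; x < f.length; x++) values.add(f[x]);
        return values.size();
    }

    /** Index from a canonical representation: count the representatives, i.e. f(x) == x. */
    static int indexCanonical(int[] f) {
        int count = 0;
        for (int x = 1; x < f.length; x++) {
            if (f[x] == x) count++;
        }
        return count;
    }
}
```

For the table above both methods give 3, corresponding to the representatives 1, 3 and 5. The fixed-point count needs no extra memory, but it is only valid when f maps every class to one of its own elements.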

Refinement
Suppose we have two equivalence relations R and S on {1,2,...,n}. How do we compute their intersection? That is, find the relation T = int(R,S) in which x and y are related iff they are related by both R and S:
x int(R,S) y iff x R y and x S y
(Figure: the partitions R, S and int(R,S) of {1,...,8}.)

Intersection Example
Suppose the two equivalence relations R and S are given as n x n matrices. How do we compute T = int(R,S)?
Now suppose both are given by their canonical kernel function. In other words, we want to compute the canonical representation for T = int(R,S).
Example
x: 1 2 3 4 5 6 7 8
R: 1 1 3 1 5 3 3 1
S: 1 1 3 3 5 5 5 1
T: 1 1 3 4 5 6 6 1

Code
    initialize H, an empty hash map;
    for x = 1,...,n do
        if H( (R[x],S[x]) ) is undefined then
            T[x] = H( (R[x],S[x]) ) = x;
        else
            T[x] = H( (R[x],S[x]) );
Expected linear time. Could also replace H by an n × n array (interesting if the initialization cost can be amortized).
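The pseudocode above translates fairly directly into Java. The sketch below is one way to do it, under the assumption that R and S are canonical kernel arrays, 1-indexed with slot 0 unused; the pair (R[x],S[x]) is packed into a single long so it can serve as a hash key. The class and method names are made up for the example.

```java
import java.util.HashMap;
import java.util.Map;

final class Intersection {
    /** Canonical representation of int(R,S); r and s are 1-indexed, slot 0 unused. */
    static int[] intersect(int[] r, int[] s) {
        int n = r.length - 1;
        int[] t = new int[n + 1];
        Map<Long, Integer> h = new HashMap<>();
        for (int x = 1; x <= n; x++) {
            // Encode the pair (r[x], s[x]) as one number; values lie in 1..n.
            long key = (long) r[x] * (n + 1) + s[x];
            Integer rep = h.get(key);
            if (rep == null) {      // first element with this pair: x represents the class
                h.put(key, x);
                t[x] = x;
            } else {
                t[x] = rep;
            }
        }
        return t;
    }
}
```

On the example from the previous slide, R = 1 1 3 1 5 3 3 1 and S = 1 1 3 3 5 5 5 1, this yields T = 1 1 3 4 5 6 6 1.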

Small Machines

Recall: Finite State Machines Recall that a finite state machine is essentially a lookup table with one entry for each symbol/state combination, plus an initial state and some final states.

An Experiment Think of the finite state machine as a black box. Suppose you can perform the following experiment as often as you wish: - reset the machine to some state p, - feed some string to the machine, and - observe whether the resulting state is final. Of course, you are not allowed to open up the machine. Which states could be distinguished from each other by this experiment?

A Black Box
Call p and q (behaviorally) equivalent if they cannot be distinguished.
Claim:
1. We can distinguish final from non-final states.
2. If we can distinguish p and q, and d(p',a) = p and d(q',a) = q, then we can also distinguish p' and q'.

Who Cares? If two states are equivalent, we may as well collapse them into a single state. More precisely, we can replace the state set Q by Q/~. The latter may be much smaller, so we can build potentially smaller machines. Fact: One can show that the smallest possible deterministic finite state machine (for a given language) can be obtained this way.

Example
(Figure: state diagram of a finite state machine with transitions labeled a and b.)

Computing Behavioral Equiv. How do we actually compute the behavioral equivalence relation ~? Refine partitions. Initially only distinguish between F and Q – F. Then refine the partition as follows: Suppose we have an equivalence relation E. Define E' by p E' q iff p E q and for all symbols s: d(p,s) E d(q,s).

Computing Behavioral Equiv.
But that's just an intersection operation: define p E_s q iff d(p,s) E d(q,s). Then E' = int(E, E_a, E_b, ...).
When E' = E for the first time, we have E = ~.
This can be computed in O(k n^2) steps, where n is the number of states and k the number of input symbols.

Example
(Figure: state diagram of a machine with states 1 to 6 and transitions labeled a and b.)
state: 1 2 3 4 5 6
init:  1 1 3 3 1 1
a:     1 1 1 1 1 1
b:     3 3 1 1 1 1
E':    1 1 3 3 5 5
a:     1 1 5 5 5 5
b:     3 3 5 5 5 5
E':    1 1 3 3 5 5
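The refinement loop can be sketched as repeated intersection of canonical arrays. This is a straightforward version, not an optimized minimization algorithm. Assumptions made for the example: states are numbered 0..n-1, delta[p][s] is the successor of p on symbol number s, and the class name is invented here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class BehavioralEquivalence {
    /** Returns e with e[p] = smallest state behaviorally equivalent to p. */
    static int[] classes(int[][] delta, boolean[] isFinal) {
        int n = delta.length;
        int k = (n == 0) ? 0 : delta[0].length;
        int[] e = new int[n];
        // Initial partition: F versus Q - F, each named by its smallest state.
        int firstFinal = -1, firstOther = -1;
        for (int p = 0; p < n; p++) {
            if (isFinal[p]) {
                if (firstFinal < 0) firstFinal = p;
                e[p] = firstFinal;
            } else {
                if (firstOther < 0) firstOther = p;
                e[p] = firstOther;
            }
        }
        while (true) {
            // E' = int(E, E_a, E_b, ...): group states by (e[p], e[d(p,a)], e[d(p,b)], ...).
            Map<List<Integer>, Integer> h = new HashMap<>();
            int[] next = new int[n];
            for (int p = 0; p < n; p++) {
                List<Integer> key = new ArrayList<>(k + 1);
                key.add(e[p]);
                for (int s = 0; s < k; s++) key.add(e[delta[p][s]]);
                Integer rep = h.putIfAbsent(key, p);
                next[p] = (rep == null) ? p : rep;
            }
            if (Arrays.equals(next, e)) return e; // E' = E: fixed point reached
            e = next;
        }
    }
}
```

Each round costs expected O(k n) time with hashing, and there are at most n rounds because every round before the last splits at least one class.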

Dynamic Equivalence Relations

The Party Problem
You arrive at a party. As usual, there are separate groups of people standing around. In each group people talk to each other, but they don't talk to anyone outside of the group. You scan the groups, find someone that you know and join the corresponding group. If someone in another group knows you too, the two groups merge.
How do we figure out the groups given a list of “is-friend-of” relations? The list is revealed step by step; we don't have access to the whole list from the start.

Dynamic E-Relations
So far we have only dealt with static equivalence relations: the whole relation is given from the start and we can represent it by the canonical kernel function. Often that is not the case: all we have is knowledge about some equivalent pairs (x,y) of elements. The corresponding equivalence relation is thus given implicitly.

Recall: Mazes
- Think about a grid of rooms separated by walls.
- Each room can be given a name.
a b c d
e f g h
i j k l
m n o p
Randomly knock out walls until we get a good maze.

Mathematical formulation
- A set of rooms: {a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p}
- Pairs of adjacent rooms that have an open wall between them. For example, (a,b) and (g,k) are pairs.

Mazes as graphs
{(a,b), (b,c), (a,e), (e,i), (i,j), (f,j), (f,g), (g,h), (d,h), (g,k), (m,n), (n,o), (k,o), (o,p), (l,p)}
For a maze to have a unique solution, its graph must be a tree.

Mazes as trees
A spanning tree is a tree that includes all of the nodes.
- Why is it good to have a spanning tree?

Algorithm
Essentially:
- Randomly pick a wall and delete it (add it to the tree) if it won't create a cycle.
- Stop when a spanning tree has been created.
This is Kruskal's Algorithm.

Creating a spanning tree
- When adding a wall to the tree, how do we detect that it won't create a cycle?
- When adding wall (x,y), we want to know if there is already a path from x to y in the tree.

Using the union-find algorithm
We put rooms into an equivalence class if there is a path connecting them. Before adding an edge (x,y) to the tree, make sure that x and y are not in the same equivalence class.
(Figure: partially constructed maze on the 4 x 4 grid of rooms a to p.)
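A minimal sketch of this maze construction, assuming rooms are numbered row by row (room = row * cols + col) and using a deliberately simple forest-based find without any balancing. MazeSketch and generate are names invented for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

final class MazeSketch {
    // Follow parent pointers up to the root, which names the equivalence class.
    private static int find(int[] parent, int x) {
        while (parent[x] != x) x = parent[x];
        return x;
    }

    /** Returns the removed walls, each as a pair {room1, room2} of adjacent rooms. */
    static List<int[]> generate(int rows, int cols, Random rng) {
        List<int[]> walls = new ArrayList<>();
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                int room = r * cols + c;
                if (c + 1 < cols) walls.add(new int[] {room, room + 1});    // wall to the right
                if (r + 1 < rows) walls.add(new int[] {room, room + cols}); // wall below
            }
        }
        Collections.shuffle(walls, rng); // consider the walls in random order
        int[] parent = new int[rows * cols];
        for (int i = 0; i < parent.length; i++) parent[i] = i;
        List<int[]> removed = new ArrayList<>();
        for (int[] w : walls) {
            int a = find(parent, w[0]), b = find(parent, w[1]);
            if (a != b) {       // rooms not yet connected: removing the wall creates no cycle
                parent[a] = b;  // union the two classes
                removed.add(w);
            }
        }
        return removed;
    }
}
```

For a 4 x 4 grid the result always has 15 removed walls, one fewer than the number of rooms, as expected of a spanning tree.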

Making equivalence dynamic
- Dynamic operations on an equivalence relation.
  - Example: adding edges to a graph
  - Example: removing walls in a maze
- Operations
  - UnionFind(n): constructor that creates n one-element sets.
  - find(i): returns the name of the set containing i.
  - union(i,j): joins the sets containing i and j.
- Effects
  - Calls to union can change future find results.
  - Calls to find do not change future find results.

Dynamic equivalence
- Operations
  - find(i): returns the name of the set containing i.
  - union(i,j): joins the sets containing i and j.
start:      {1} {2} {3} {4} {5} {6} {7}
union(2,3): {1} {2,3} {4} {5} {6} {7}
union(3,4): {1} {2,3,4} {5} {6} {7}
union(5,6): {1} {2,3,4} {5,6} {7}
union(6,3): {1} {2,3,4,5,6} {7}

Union Find

Implementing Union-Find
- A key question: how should we represent the equivalence classes?
- Let's consider a naïve approach first, and then a better way…

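Quick Find: array
- Array with set indexes (element i stores the name of its set):
  element: 0 1 2 3 4 5 6 7 8
  name:    1 1 2 1 4 2 4 3 2
  sets: {0,1,3}, {2,5,8}, {4,6}, {7} (4 sets)
- union(1,4) yields:
  element: 0 1 2 3 4 5 6 7 8
  name:    1 1 2 1 1 2 1 3 2
  sets: {0,1,3,4,6}, {2,5,8}, {7} (3 sets)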

Running time for Quick Find
- With this array representation, find(i) runs in O(1) time.
- What about union(i,j)?
- What is the runtime if we want to perform a sequence of N find and union operations?
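To make the questions concrete, here is a small sketch of the array representation. It is one plausible reading of the slide, with the convention (an assumption) that union(i,j) renames j's set to the name of i's set; the class name and the count field are also illustrative.

```java
final class QuickFind {
    private final int[] name; // name[i] = name of the set containing i
    private int count;        // current number of sets

    QuickFind(int n) {
        name = new int[n];
        for (int i = 0; i < n; i++) name[i] = i; // n one-element sets
        count = n;
    }

    /** O(1): a single array lookup. */
    int find(int i) {
        return name[i];
    }

    /** O(n): every element must be inspected to rename j's set. */
    void union(int i, int j) {
        int keep = name[i], drop = name[j];
        if (keep == drop) return;
        for (int x = 0; x < name.length; x++) {
            if (name[x] == drop) name[x] = keep;
        }
        count--;
    }

    int count() {
        return count;
    }
}
```

Since union scans the whole array, a sequence of N operations on N elements can take time proportional to N^2 when most of them are unions.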

Quick Find: linked list
- We can get an improvement if we maintain each equivalence class in a linked list, with each element having a pointer back to its representative.
- How long does it take to perform a sequence of k unions? Best case? Worst case?
- What if we maintain the size of the equivalence classes?

Quick Find: linked list
Claim: Any sequence of m finds and k unions takes at most O(m + k log k) time.
- Each find operation takes O(1) time.
- Consider the sequence of k union operations. At most 2k elements can be updated.
- How many times can element x be updated?
- For every update, the equivalence class containing x at least doubles (if we always update the smaller equivalence class). A class never has more than 2k elements, so x is updated at most log2(2k) times.
- Thus a sequence of k union operations takes O(k log k).
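The "always update the smaller class" rule can be sketched as follows. Note the substitution: this example keeps each class in an ArrayList rather than a hand-built linked list, which does not change the counting argument. The class name and the updates counter are additions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class WeightedQuickFind {
    private final int[] name;                  // name[i] = representative of i's class
    private final List<List<Integer>> members; // members.get(r) = elements of the class named r
    private long updates = 0;                  // total number of name changes so far

    WeightedQuickFind(int n) {
        name = new int[n];
        members = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            name[i] = i;
            List<Integer> single = new ArrayList<>();
            single.add(i);
            members.add(single);
        }
    }

    int find(int i) {
        return name[i];
    }

    void union(int i, int j) {
        int big = name[i], small = name[j];
        if (big == small) return;
        if (members.get(big).size() < members.get(small).size()) {
            int t = big; big = small; small = t; // make sure we relabel the smaller class
        }
        for (int x : members.get(small)) {
            name[x] = big;
            updates++;
        }
        members.get(big).addAll(members.get(small));
        members.get(small).clear();
    }

    long updates() {
        return updates;
    }
}
```

Merging 8 singletons pairwise in three rounds (7 unions) causes 4 + 4 + 4 = 12 updates, well within the O(k log k) bound from the claim.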

Quick Union: forest of trees
- Each equivalence class is a tree: {1}{2}{0,3}{4}{5}
- union(2,1) adds a new subtree to a root: {1,2}{0,3}{4}{5}
- union(0,1) adds a new subtree to a root: {1,2,0,3}{4}{5}

Forest and trees: array representation
- {1,2,0,3}{4}{5}
- find(2) = 1
- find(4) = 4
- Array representation (s[i] is the parent of i; -1 marks a root):
  i:    0  1  2  3  4  5
  s[i]: 3 -1  1  1 -1 -1

Find, v.0
- {1,2,0,3}{4}{5}
- find(0) = 1
  i:    0  1  2  3  4  5
  s[i]: 3 -1  1  1 -1 -1
    public int find(int x) {
        if (s[x] < 0) return x;   // x is a root
        return find(s[x]);        // otherwise continue from the parent
    }

Union, v.-1
- union(0,2) turns {1,2}{0,3}{4}{5} into {1,2,0,3}{4}{5}
  i:      0  1  2  3  4  5
  before: 3 -1  1 -1 -1 -1
  after:  3 -1  1  2 -1 -1
    public void union(int x, int y) {
        s[find(x)] = y;   // hangs x's root under y itself, which need not be a root
    }

Union, v.0
- union(0,2) turns {1,2}{0,3}{4}{5} into {1,2,0,3}{4}{5}
  i:      0  1  2  3  4  5
  before: 3 -1  1 -1 -1 -1
  after:  3 -1  1  1 -1 -1
    public void union(int x, int y) {
        int rx = find(x), ry = find(y);
        if (rx != ry) s[rx] = ry;   // guard: linking a root to itself would make find loop forever
    }

Union v.0 is still O(n)!
- Find must walk the path to the root.
- Unlucky combinations of unions can result in long paths.

Trick 1: union by height
- Union shallow trees into deep trees. Tree depth increases only when the depths are equal.
- Track path length to root (a root stores the negated depth, counted in levels):
  i:    0  1  2  3  4  5
  s[i]: 3 -3  1  1 -1 -1
- Tree depth at most O(log_2 N)
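Union by height might look like the sketch below. It follows the encoding suggested by the array on the slide, where a root stores -(height + 1), so a one-element tree stores -1 and the three-level tree rooted at 1 stores -3; that reading of the slide, and the names used, are assumptions.

```java
import java.util.Arrays;

final class UnionByHeight {
    private final int[] s; // s[i] >= 0: parent of i; s[i] < 0: i is a root storing -(height + 1)

    UnionByHeight(int n) {
        s = new int[n];
        Arrays.fill(s, -1); // n one-element trees of height 0
    }

    int find(int x) {
        while (s[x] >= 0) x = s[x];
        return x;
    }

    void union(int x, int y) {
        int rx = find(x), ry = find(y);
        if (rx == ry) return;
        if (s[rx] < s[ry]) {
            s[ry] = rx;                   // rx is strictly deeper: hang ry under rx
        } else {
            if (s[rx] == s[ry]) s[ry]--;  // equal heights: the result is one level deeper
            s[rx] = ry;                   // hang rx under ry
        }
    }

    /** Height (in edges) of the tree containing x. */
    int height(int x) {
        return -s[find(x)] - 1;
    }
}
```

A tree of height h built this way contains at least 2^h nodes, which is where the logarithmic depth bound comes from.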

Trick 1’: union by size
- Union small trees into big trees. (Tree size always increases.)
- Track subtree size (a root stores the negated size of its tree):
  i:    0  1  2  3  4  5
  s[i]: 3 -4  1  1 -1 -1
- Tree depth at most ???

Union by size
- What does the worst-case tree look like?
- What would the best-case tree look like?
- What could we do to get closer to the best-case trees?

Trick 2: Path compression
- find flattens trees: redirect nodes to point directly to the root.
- Example: find(0)
- Do this whenever traversing a path from node to root.
(Figure: the tree rooted at 1 before and after find(0).)

Path compression
- find flattens trees: redirect nodes to point directly to the root.
- Do this whenever traversing a path from node to root.
    public int find(int x) {
        if (s[x] < 0) return x;       // x is a root
        return s[x] = find(s[x]);     // point x directly at the root on the way back
    }
This implies that union does path compression (through its calls to find).
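The effect of compression is easy to observe on the earlier example forest s = 3 -1 1 1 -1 -1. The sketch below wraps the recursive find in a small class and adds a depth helper purely for inspection; CompressingForest and depth are hypothetical names, not part of the lecture code.

```java
final class CompressingForest {
    private final int[] s; // s[i] >= 0: parent of i; s[i] < 0: i is a root

    CompressingForest(int[] parents) {
        s = parents.clone();
    }

    /** Find with path compression, as on the slide. */
    int find(int x) {
        if (s[x] < 0) return x;
        return s[x] = find(s[x]);
    }

    /** Number of edges from x to its root; does not modify the forest. */
    int depth(int x) {
        int d = 0;
        while (s[x] >= 0) {
            x = s[x];
            d++;
        }
        return d;
    }
}
```

In that forest node 0 starts at depth 2 (0, then 3, then 1). After a single find(0), both 0 and 3 point directly at root 1.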

The Code

All the code
    class UnionFind {
        int[] u;   // u[i] >= 0: parent of i; u[i] < 0: i is a root of a tree with -u[i] nodes

        UnionFind(int n) {
            u = new int[n];
            for (int i = 0; i < n; i++) u[i] = -1;
        }

        int find(int i) {
            int j, root;
            for (j = i; u[j] >= 0; j = u[j]) ;   // first pass: locate the root
            root = j;
            while (u[i] >= 0) {                  // second pass: compress the path
                j = u[i];
                u[i] = root;
                i = j;
            }
            return root;
        }

        void union(int i, int j) {
            i = find(i);
            j = find(j);
            if (i != j) {
                if (u[i] < u[j]) {   // i's tree is larger (more negative size)
                    u[i] += u[j];
                    u[j] = i;
                } else {
                    u[j] += u[i];
                    u[i] = j;
                }
            }
        }
    }

The UnionFind class
    class UnionFind {
        int[] u;

        UnionFind(int n) {
            u = new int[n];
            for (int i = 0; i < n; i++) u[i] = -1;
        }

        int find(int i) { ... }

        void union(int i, int j) { ... }
    }

Trick 2: Iterative find
    int find(int i) {
        int j, root;
        for (j = i; u[j] >= 0; j = u[j]) ;   // first pass: locate the root
        root = j;
        while (u[i] >= 0) {                  // second pass: point every node on the path at the root
            j = u[i];
            u[i] = root;
            i = j;
        }
        return root;
    }

Trick 1’: union by size
    void union(int i, int j) {
        i = find(i);
        j = find(j);
        if (i != j) {
            if (u[i] < u[j]) {   // roots store negated sizes, so i's tree is the larger one
                u[i] += u[j];
                u[j] = i;
            } else {
                u[j] += u[i];
                u[i] = j;
            }
        }
    }

Time bounds
- Variables: M operations, N elements.
- Algorithms
  - Simple forest representation
    Worst: find O(N), mixed operations O(MN). Average: tricky.
  - Union by height; union by size
    Worst: find O(log N), mixed operations O(M log N). Average: mixed operations O(M) [see text].
  - Path compression in find
    Worst: mixed operations “nearly linear” [analysis in 15-451].
