CSE 221: Algorithms and Data Structures Lecture #9 Graphs (with no Axes to Grind) Steve Wolfman 2011W2.

CSE 221: Algorithms and Data Structures Lecture #9 Graphs (with no Axes to Grind)
Steve Wolfman 2011W2

Today’s Outline Topological Sort: Getting to Know Graphs with a Sort
Graph ADT and Graph Representations Graph Terminology (a lot of it!) More Graph Algorithms Shortest Path (Dijkstra’s Algorithm) Minimum Spanning Tree (Kruskal’s Algorithm) Alright, we’ll start with a little treat that has been a long time in the coming. Then, I’ll define what a graph is. We’ll get a taste of an algorithm before we actually hit all the terminology and data structures.

Total Order 1 2 3 4 5 6 7 means A must go before B A B
Now, most of the sorting you did involved a total ordering. Let’s draw an edge in this graph to show each element which needs to go before another element. Given this, what conditions do we have to satisfy to sort these vertices? 5 6 7 means A must go before B A B

Partial Order: Getting Dressed
pants shirt socks coat under roos Not all orderings are total orderings, however. Some of you have taken an interest in my dressing and disrobing process; so, I thought it would make a good example of a partial order. Now, everyone tell me where there should be an edge! shoes belt watch

Topological Sort Given a graph, G = (V, E), output all the vertices in V such that no vertex is output before any other vertex with an edge to it. A topological or topo-sort is just a valid sorting of these vertices when we define an edge as an ordering constraint. There are many other applications of topological sort. The classic is getting a valid ordering of classes when edges represent prerequisites Who knows what a catch-22 is? What happens if I add an edge from shoes to shirt?

Topo-Sort Take One Label each vertex’s in-degree (# of inbound edges)
While there are vertices remaining Pick a vertex with in-degree of zero and output it Reduce the in-degree of all vertices adjacent to it Remove it from the list of vertices That’s a bit less efficient than we can do. Let’s try this. How well does this run? runtime:

Topo-Sort Take Two Label each vertex’s in-degree
Initialize a queue to contain all in-degree zero vertices While there are vertices remaining in the queue Pick a vertex v with in-degree of zero and output it Reduce the in-degree of all vertices adjacent to v Put any of these with new in-degree zero on the queue Remove v from the queue runtime:

Graph ADT and Graph Representations Graph Terminology (a lot of it!) More Graph Algorithms Shortest Path (Dijkstra’s Algorithm) Minimum Spanning Tree (Kruskal’s Algorithm)

Graph ADT Graphs are a formalism useful for representing relationships between things a graph G is represented as G = (V, E) V is a set of vertices: {v1, v2, …, vn} E is a set of edges: {e1, e2, …, em} where each ei connects two vertices (vi1, vi2) operations might include: creation (with a certain number of vertices) inserting/removing edges iterating over vertices adjacent to a specific vertex asking whether an edge exists connecting two vertices Han Leia Luke V = {Han, Leia, Luke} E = {(Luke, Leia), (Han, Leia), (Leia, Han)} I’m not convinced this is really an ADT, but it is certainly an important structure.

Graph Applications Storing things that are graphs by nature Compilers
distance between cities airline flights, travel options relationships between people, things distances between rooms in Clue Compilers callgraph - which functions call which others dependence graphs - which variables are defined and used at which statements Others: mazes, circuits, class hierarchies, horses, networks of computers or highways or… Graphs are used all over the place.

Graph Representations
Han Leia Luke 2-D matrix of vertices (marking edges in the cells) “adjacency matrix” List of vertices each with a list of adjacent vertices “adjacency list” Alright, now you’re all motivated to use graphs. Let’s see how we can store them.

Adjacency Matrix A |V| x |V| array in which an element (u, v) is true if and only if there is an edge from u to v Han Luke Leia Han Han Leia Luke Luke iterate over vertices iterate over edges iterate over vertices adj. to a vertex check whether an edge exists Leia runtime for various operations? space requirements:

Adjacency List A |V|-ary list (array) in which each entry stores a list (linked list) of all adjacent vertices Han Han Leia Luke Luke Leia runtime for various operations? space requirements:

Graph ADT and Graph Representations Graph Terminology (a lot of it!) More Graph Algorithms Shortest Path (Dijkstra’s Algorithm) Minimum Spanning Tree (Kruskal’s Algorithm) Alright, we’ll start with a little treat that has been a long time in the coming. Then, I’ll define what a graph is. We’ll get a taste of an algorithm before we actually hit all the terminology and data structures. Finally, I’ll start out the shortest path problem. I’ll finish that on Friday.

Directed vs. Undirected Graphs
In directed graphs, edges have a specific direction: In undirected graphs, they don’t (edges are two-way): Vertices u and v are adjacent if (u, v)  E Han Luke Leia Han Luke Leia

Directed vs. Undirected Graphs
Adjacency lists and matrices both work fine to represent directed graphs. To represent undirected graphs, either ensure that both orderings of every edge are included in the representation or ensure that the order doesn’t matter (e.g., always use a “canonical” order), which works poorly in adjacency lists.

Weighted Graphs Each edge has an associated weight or cost. Clinton
20 Mukilteo Kingston 30 Edmonds Bainbridge 35 How could we store weights in a matrix? List? Seattle 60 Bremerton How can we store weights in an adjacency matrix? In an adjacency list?

Graph Density A sparse graph has O(|V|) edges
A dense graph has (|V|2) edges Anything in between is either on the sparse side or on the dense side, depending critically on context! Graph density is a measure of the number of edges. Remember that we need to analyze graphs in terms of both |E| and |V|. If we know the density of the graph, we can dispense with that.

Graph Density A sparse graph has O(|V|) edges
Why is the adjacency list likely to be a better representation than the adjacency matrix for sparse graphs? Sparse graphs have too few edges to fit in an adjacency matrix. Much of the matrix will be “wasted” on 0s. The adjacency list will guarantee better performance on a sparse graph. None of these. Graph density is a measure of the number of edges. Remember that we need to analyze graphs in terms of both |E| and |V|. If we know the density of the graph, we can dispense with that.

Connectivity Undirected graphs are connected if there is a path between any two vertices Directed graphs are strongly connected if there is a path from any one vertex to any other Di-graphs are weakly connected if there is a path between any two vertices, ignoring direction A complete graph has an edge between every pair of vertices Can a directed graph which is strongly connected be acyclic? NO. (Except the trivial one node case.) There must be a path from A to B and a path from B to A; so, there must be a cycle. What about a weakly connected directed graph? YES! See the one on the slide. There are also further definitions of these; for example, the concept of biconnectivity showed up in my satisfiability research: Does the graph have two distinct paths between any two vertices.

Isomorphism and Subgraphs
We often care only about the structure of a graph, not the names of its vertices. Then, we can ask: “Are two graphs isomorphic?” Do the graphs have identical structure? Can you “line up” their vertices so that their edges match? “Is one graph a subgraph of the other?” Is one graph isomorphic to a part of the other graph (a subset of its vertices and a subset of the edges connecting those vertices)? Can a directed graph which is strongly connected be acyclic? NO. (Except the trivial one node case.) There must be a path from A to B and a path from B to A; so, there must be a cycle. What about a weakly connected directed graph? YES! See the one on the slide. There are also further definitions of these; for example, the concept of biconnectivity showed up in my satisfiability research: Does the graph have two distinct paths between any two vertices.

Degree The degree of a vertex v∈V is denoted deg(v) and represents the number of edges incident on v. (An edge from v to itself contributes 2 towards the degree.) Handshaking Theorem: If G=(V,E) is an undirected graph, then: Corollary: An undirected graph has an even number of vertices of odd degree.

Degree/Handshake Example
The degree of a vertex v∈V is the number of edges incident on v. Let’s label the degree of every node and calculate the sum…

Degree for Directed Graphs
The in-degree of a vertex v∈V is denoted deg-(v) is the number of edges coming in to v. The out-degree of a vertex v∈V is denoted deg+(v) is the number of edges coming out of v. We let deg(v) = deg+(v) + deg-(v) Then:

Trees as Graphs Every tree is a graph with some restrictions:
the tree is directed there are no cycles (directed or undirected) there is a directed path from the root to every node A B C D E F Each of the red areas breaks one of these constraints. We won’t always require the constraint that the tree be directed. In fact, if it’s not directed, we can just pick a root and hang everything from that node (making the edges directed); so, it’s not a big deal. G H I J

Directed Acyclic Graphs (DAGs)
DAGs are directed graphs with no cycles. main() mult() add() DAGs are a very common representation for dependence graphs. The DAG here shows the non-recursive call-graph from a program. Remember topological sort? Any DAG can be topo-sorted. Any graph that’s not a DAG cannot be topo-sorted. read() access() Trees  DAGs  Graphs We can only topo-sort DAGs!

Single Source, Shortest Path
Given a graph G = (V, E) and a vertex s  V, find the shortest path from s to every vertex in V Many variations: weighted vs. unweighted cyclic vs. acyclic positive weights only vs. negative weights allowed multiple weight types to optimize The problem that Dijkstra’s algorithm addresses is the single source, shortest path problem. Given a graph and a source vertex, find the shortest path from the source to every vertex. We can put a variety of limitations or spins on the problem. We’ll focus on weighted graphs with no negative weights. This is used in all sorts of optimization problems: minimum delay in a network, minimum cost flights for airplane routes, etc.

The Trouble with Negative Weighted Cycles
2 A B 10 -5 1 E 2 You might wonder why we don’t allow negative weights. Here’s one reason, we’ll see another later. The shortest path here is undefined! We can always go once more around the cycle and get a lower cost path. C D What’s the shortest path from A to E? (or to B, C, or D, for that matter)

Unweighted Shortest Path Problem
Assume source vertex is C… B A F H G C D E Now, can we handle unweighted graphs? How might we implement this? Breadth first search! Will it work for weighted graphs? No. It won’t. What about Depth First search? Clearly not that either. What about the third search we studied? Best first search? That won’t quite work, but it might if we counted the path cost in with the edge cost. Let’s come back to that. Distance to: A B C D E F G H

Dijkstra Legendary figure in computer science; was a professor at University of Texas. Supported teaching introductory computer courses without computers (pencil and paper programming) Supposedly wouldn’t (until late in life) read his ; so, his staff had to print out messages and put them in his box. To move to weighted graphs, we appeal to the mighty power of Dijkstra. This is one of those names in computer science you should just know! Like Turing or Knuth. So, here’s the super-brief bio of Dijkstra. Look him up if you’re interested in more.

Dijkstra’s Algorithm for Single Source Shortest Path
Classic algorithm for solving shortest path in weighted graphs without negative weights A greedy algorithm (irrevocably makes decisions without considering future consequences) Intuition: shortest path from source vertex to itself is 0 cost of going to adjacent nodes is at most edge weights cheapest of these must be shortest path to that node update paths for new node and continue picking cheapest path Among the wide variety of things he has done, he created Dijkstra’s algorithm more than 30 years ago. Dijkstra’s algorithm is a greedy algorithm (like Huffman encoding), so it just makes the best local choice at each step. The choice it makes is which shortest path to declare known next. It starts by declaring the start node known to have a shortest path of length 0. Then, it updates neighboring node’s path costs according to the start node’s cost. Then, it just keeps picking the next shortest path and fixing that one until it has all the vertices.

Intuition in Action A C B D F H G E 2 3 1 10 9 4 8 7
So, if C is the start node, we’ll start by fixing C’s path cost at 0. Then, E is next at 8. G and A are tied at 9 next. We’ll arbitrarily choose A. Then G at 9. Then, B at 11. And so forth.

Dijkstra’s Pseudocode (actually, our pseudocode for Dijkstra’s algorithm)
Initialize the cost of each node to  Initialize the cost of the source to 0 While there are unknown nodes left in the graph Select the unknown node with the lowest cost: n Mark n as known For each node a which is adjacent to n a’s cost = min(a’s old cost, n’s cost + cost of (n, a)) Here’s pseudocode for how dijkstra’s actually works. Speak the algorithm. Notice that we need to be able to find a minimum node and also update costs to nodes. We can get the path from this just as we did for mazes!

Dijkstra’s Algorithm in Action
B D F H G E 2 3 1 4 10 8 9 7 OK, let’s do this a bit more carefully.

The Cloud Proof THE KNOWN CLOUD
Next shortest path from inside the known cloud G Better path to the same node THE KNOWN CLOUD P Source Does Dijkstra’s work? For that, let’s look to the cloud proof. Everything inside the cloud is “known” and we have the shortest path to those nodes. Next, we’re going to say the path to G is its shortest path. But, what if there’s some better path that we would have come to later? Well, the path to G is at least as short as the path to P. So, how could the path through P end up better than the path direct to G? But, if the path to G is the next shortest path, the path to P must be at least as long. So, how can the path through P to G be shorter?

Inside the Cloud (Proof)
Everything inside the cloud has the correct shortest path Proof is by induction on the # of nodes in the cloud: initial cloud is just the source with shortest path 0 inductive step: once we prove the shortest path to G is correct, we add it to the cloud To polish this off, note that it’s an inductive proof. Here’s the other problem with negative weights. A negative weight could make it actually better to wander outside the cloud for a while instead of going straight to a new unknown node. Negative weights blow this proof away!

Data Structures for Dijkstra’s Algorithm
|V| times: Select the unknown node with the lowest cost findMin/deleteMin |E| times: a’s cost = min(a’s old cost, …) What data structures do we use to support these little snippets from Dijkstra’s? Priority Queue and a (VERY SIMPLE) dictionary. Initialization just takes O(|V|), so all the runtime is in these pq operations. O(E log V + V log V)… but, if we assume the graph is connected, this becomes: O(E log V) (Dijkstra’s will work fine with a disconnected graph, however.) decreaseKey (i.e., change a key and fix the heap) find by name (dictionary lookup!) runtime:

Fibonacci Heaps Very cool variation on Priority Queues
Amortized O(1) time bound for decreaseKey O(log n) time for deleteMin Dijkstra’s uses |V| deleteMins and |E| decreaseKeys So let’s compare the two. Binary heaps give us O(E log V) Fibonacci gives us O(E log V) So, if E = omega(V), this is asymptotically better. But, Fibonacci heaps have a good deal of overhead and are pointer-based. Therefore, binary heaps may be the right way to go. P.S. In dense graphs, unsorted linked-list queue is as fast as fibonacci heaps! runtime with Fibonacci heaps:

What’s a Good Maze? A bunch of adjacent rooms.
Just enough walls so that there’s one path from any one room to any other room especially from the start to the finish at least a bit random so it’s hard to figure out.

The Maze Construction Problem
Given: collection of rooms: V connections between rooms (initially all closed): E Construct a maze: collection of rooms: V = V designated rooms in, iV, and out, oV collection of connections to knock down: E  E such that one unique path connects every two rooms Here’s a formalism which tries to approach that.

The Middle of the Maze So far, a number of walls have been knocked down while others remain. Now, we consider the wall between A and B. Should we knock it down? if A and B are otherwise connected if A and B are not otherwise connected A B Let’s say we’re in the middle of knocking out walls to make a good maze. How do we decide whether to knock down the current wall?

Maze Construction Algorithm
While edges remain in E Remove a random edge e = (u, v) from E If u and v have not yet been connected add e to E mark u and v as connected

Spanning Tree Spanning tree: a subset of the edges from a connected graph that… touches all vertices in the graph (spans the graph) forms a tree (is connected and contains no cycles) Minimum spanning tree: the spanning tree with the least total edge cost. 4 7 9 Interesting note before we move on: it turns out that Dijkstra’s is as good as the best algorithm we have for single-source, single-destination shortest path! OK, that’s Dijkstra’s. Now, let’s get a quick definition before we move on to Kruskal’s algorithm. Which of these three is the minimum spanning tree? (The one in the middle) 2 1 5

Kruskal’s Algorithm for Minimum Spanning Trees
Yet another greedy algorithm: Initialize all vertices to unconnected While there are still unmarked edges Pick the lowest cost edge e = (u, v) and mark it If u and v are not already connected, add e to the minimum spanning tree and connect u and v Here’s how Kruskal’s works. This should look very familiar. Remember our algorithm for maze creation? Except that the edge order there was random, this is the same as that algorithm! Sound familiar?

Kruskal’s Algorithm in Action (1/5)
2 2 3 B A F H 1 2 1 4 9 10 G C 4 2 8 D Therefore, this example should look a lot like maze creation. We test the cheapest edge first. Since B and C are unconnected, we add the edge to the MST and union B and C. E 7

3 B A F H 1 2 1 4 9 10 G C 4 2 8 D I’ll do some uninteresting edges right away (all those edges until the 3 succeed). Now, we test 3. Are F and H connected? Yes, so we throw out the 3 edge. E 7

2 2 3 B A F H 1 2 1 4 9 10 G C 4 2 8 D What about A and D? Connected: so throw out the 4 edge. E 7

2 2 3 B A F H 1 2 1 4 9 10 G C 4 2 8 D Finally, G and E are unconnected, so we union them. But, that connects the last unconnected vertices in the graph. Therefore, we’re done. E 7

Kruskal’s Algorithm Completed (5/5)
2 2 3 B A F H 1 2 1 4 9 10 G C 4 2 8 D Here’s the minimum spanning tree. Notice that just like with mazes: the tree has no cycles it touches all the vertices there’s one unique path between any two vertices. E 7

Does the algorithm work?
Warning! Proof in Epp (3rd p. 728) is slightly wrong. Wikipedia has a good proof. That’s basis of what I’ll present. It actually comes out naturally from how we’ve taught you to try to prove a program correct. TODO (Steve, 2011/07/07): Patch up the presentation of this proof; examples should be running examples interspersed throughout, not afterthoughts at end. Otherwise, Alan’s treatment is fabulous! (Oh, and right .) May also want to use incorrect proofs as an exercise/alter treatment (only briefly skimmed since Alan sent me original mail about incorrectness).

Kruskal’s Algorithm: Does this work?
Initialize all vertices to their own sets (i.e. unconnected) Initialize all edges as unmarked. While there are still unmarked edges Pick the lowest cost unmarked edge e = (u, v) and mark it. If u and v are in different sets, add e to the minimum spanning tree and union the sets for u and v Do a loop invariant! One student suggested a loop invariant that the edges so far are each MSTs for the vertices they connect. I think this should work, but I haven’t done the proof. How have we learned to try to prove something like this? 53

Kruskal’s Algorithm: What’s a good loop invariant???
Initialize all vertices to their own sets (i.e. unconnected) Initialize all edges as unmarked. While there are still unmarked edges Pick the lowest cost unmarked edge e = (u, v) and mark it. If u and v are in different sets, add e to the minimum spanning tree and union the sets for u and v 54

Loop Invariant for Kruskal’s
(There are lots of technical, detailed loop invariants that would be needed for a totally formal proof, e.g.:) Each set is spanned by edges added to MST you are building. Those edges form a tree. … We will assume most of these without proof, if they are pretty obvious.

What do we know about the partial solution we’re building up at each iteration?

Kruskal’s Algorithm in Action (1.5/5)
2 2 3 B A F H 1 2 1 4 9 10 G C 4 2 8 D The edges are a forest. Haven’t screwed up yet (can extend into an MST). E 7 57

What do we know about the partial solution we’re building up at each iteration? Since we’re being greedy, we never go back and erase edges we add. Therefore, for the algorithm to work, whatever we’ve got so far must be part of some minimum spanning tree. That’s the key to making the proof work!

Loop Invariant Proof for Kruskal’s
Candidate Loop Invariant: Whatever edges we’ve added at the start of each iteration are part of some minimum spanning tree.

Candidate Loop Invariant: Whatever edges we’ve added at the start of each iteration are part of some minimum spanning tree. Base Case: Inductive Step:

Candidate Loop Invariant: Whatever edges we’ve added at the start of each iteration are part of some minimum spanning tree. Base Case: When first arrive at the loop, the set of edges we’ve added is empty, so it’s vacuously true. (We can’t have made any mistakes yet, since we haven’t picked any edges yet!) Inductive Step:

Candidate Loop Invariant: Whatever edges we’ve added at the start of each iteration are part of some minimum spanning tree. Base Case: Done! Inductive Step: Assume that the loop invariant holds at start of loop body. Want to prove that it holds the next time you get to start of loop body (which is also the “bottom of the loop”).

Loop Invariant Proof for Kruskal’s Inductive Step
Candidate Loop Invariant: Whatever edges we’ve added at the start of each iteration are part of some minimum spanning tree. Inductive Step: Assume that the loop invariant holds at start of loop body. Let F be the set of edges we’ve added so far. Loop body has an if statement. Therefore, two cases!

Kruskal’s Algorithm: Initialize all vertices to their own sets (i.e. unconnected) Initialize all edges as unmarked. While there are still unmarked edges Pick the lowest cost unmarked edge e = (u, v) and mark it. If u and v are in different sets, add e to the minimum spanning tree and union the sets for u and v Here’s how Kruskal’s works. This should look very familiar. Remember our algorithm for maze creation? Except that the edge order there was random, this is the same as that algorithm! 64

Candidate Loop Invariant: Whatever edges we’ve added at the start of each iteration are part of some minimum spanning tree. Inductive Step: Assume that the loop invariant holds at start of loop body. Let F be the set of edges we’ve added so far. Loop body has an if statement. Therefore, two cases! Case I: u and v are already in same set. Therefore, the edge is not needed in any spanning tree that includes the edges we have so far. Therefore, we throw out the edge, leave F unchanged, and loop invariant still holds.

Candidate Loop Invariant: Whatever edges we’ve added at the start of each iteration are part of some minimum spanning tree. Inductive Step: Assume that the loop invariant holds at start of loop body. Let F be the set of edges we’ve added so far. Loop body has an if statement. Therefore, two cases! Case I: Done! Case II: u and v are in different sets. We add the edge to F and merge the sets for u and v. This is the tricky case!

Loop Invariant Proof for Kruskal’s Inductive Step: Case II
Assume that the loop invariant holds at start of loop body. Let F be the set of edges we’ve added so far. Because loop invariant holds, there exists some MST T that includes all of F. The algorithm will pick a new edge e to add to F. Two Sub-Cases (of Case II)! If e is in T, we add e to F and loop invariant still holds. If e is not in T,… This is tricky. We build a different MST from T that includes all of F+e …

Loop Invariant Proof for Kruskal’s Inductive Step: Case II-b
Two Sub-Cases (of Case II)! If e is in T, we add e to F and loop invariant still holds. If e is not in T,… This is tricky. We build a different MST from T that includes all of F+e … If we add e to T, then T+e must have a unique cycle C. C must have a different edge f not in F. (Otherwise, adding e would have made a cycle in F.) Therefore, T+e-f is also a spanning tree. If w(f)<w(e), then Kruskal’s would have picked f next, not e. Therefore, w(T+e-f) = W(T). Therefore, T+e-f is an MST that includes all of F+e Loop invariant still holds.

Previous Example (Slightly Modified) to Show Proof Step
2 2 3 B A F H 2 2 1 2 9 10 G C 4 2 8 D Therefore, this example should look a lot like maze creation. We test the cheapest edge first. Since B and C are unconnected, we add the edge to the MST and union B and C. E 7 Before loop, F is the green edges. 69

2 2 3 B A F H 2 2 1 2 9 10 G C 4 2 8 D Therefore, this example should look a lot like maze creation. We test the cheapest edge first. Since B and C are unconnected, we add the edge to the MST and union B and C. E 7 There exists an MST T that extends F (e.g., the fat edges) 70

2 2 3 B A F H 2 2 1 2 9 10 G C 4 2 8 D Therefore, this example should look a lot like maze creation. We test the cheapest edge first. Since B and C are unconnected, we add the edge to the MST and union B and C. E 7 What if we pick e (red edge) that is not part of T? Then T+e has a cycle… 71

2 2 3 B A F H 2 2 1 2 9 10 G C 4 2 8 D Therefore, this example should look a lot like maze creation. We test the cheapest edge first. Since B and C are unconnected, we add the edge to the MST and union B and C. E 7 What if we pick e (red edge) that is not part of T? Then T+e has a cycle, and the cycle includes an edge f not in F (blue edge). 72

2 2 3 B A F H 2 2 1 2 9 10 G C 4 2 8 D Therefore, this example should look a lot like maze creation. We test the cheapest edge first. Since B and C are unconnected, we add the edge to the MST and union B and C. E 7 w(e) must be less than or equal to w(f) Therefore, T+e-F is also an MST, but it includes all of F+e. 73

Data Structures for Kruskal’s Algorithm
|E| times: Pick the lowest cost edge… findMin/deleteMin |E| times: If u and v are not already connected… …connect u and v. What data structures do we need for this? Priority Queue and Disjoint Set Union/Find What about initialization? We need to put all the edges in a priority queue; we can do that fast with a buildHeap call: O(|E|) So, overall, the runtime is O(|E|ack(|E|,|V|) + |E|log|E|) Who knows what’s slower? Inverse ackermann’s or log? LOG! How much slower? TONS! One last little point: |E| is at most |V|2, but log|V|2 = 2 log |V|. So: O(|E|log|V|) find representative union With “disjoint-set” data structure, |E|lg(|E|) runtime.

Learning Goals After this unit, you should be able to:
Describe the properties and possible applications of various kinds of graphs (e.g., simple, complete), and the relationships among vertices, edges, and degrees. Prove basic theorems about simple graphs (e.g. handshaking theorem). Convert between adjacency matrices/lists and their corresponding graphs. Determine whether two graphs are isomorphic. Determine whether a given graph is a subgraph of another. Perform breadth-first and depth-first searches in graphs. Explain why graph traversals are more complicated than tree traversals.

To Do Read Parallelism and Concurrency notes
Read Epp 11.1, and KW

Coming Up Parallelism, Concurrency, and then Counting

Wrong Proofs Skip these if you find them confusing. (Continue with efficiency.) It’s hard to give a “counterexample”, since the algorithm is correct. I will try to show why certain steps in the proof aren’t guaranteed to work as claimed. What goes wrong is that the proofs start from the finished result of Kruskal’s, so it’s hard to specify correctly which edge needs to get swapped.

Old (Wrong) Proof of Correctness
We already know this finds a spanning tree. Proof by contradiction that Kruskal’s finds the minimum: Assume another spanning tree has lower cost than Kruskal’s Pick an edge e1 = (u, v) in that tree that’s not in Kruskal’s Kruskal’s tree connects u’s and v’s sets with another edge e2 But, e2 must have at most the same cost as e1 (or Kruskal’s would have found and used e1 first to connect u’s and v’s sets) So, swap e2 for e1 (at worst keeping the cost the same) Repeat until the tree is identical to Kruskal’s: contradiction! QED: Kruskal’s algorithm finds a minimum spanning tree. We already know this makes a spanning tree because our maze generation algorithm made a spanning tree. But, does this find the minimum spanning tree? Let’s assume it doesn’t. Then, there’s some other better spanning tree. Let’s try and make that tree more like Kruskal’s tree. (otherwise, Kruskal’s would have considered and chosen e1 before ever reaching e2) 79

Counterexample Graph Assume the graph is shaped like this.
Ignore the details of edge weights. (E.g., they might all be equal or something.)

Counterexample Old Proof
v u v Kruskal’s Result Other MST The proof assumes some other MST and picks an edge e1 connecting vertices u and v that’s not in Kruskal’s result.

v u v Kruskal’s Result Other MST In Kruskal’s result, the sets for u and v were connected at some point by some edge e2. Let’s suppose it was the edge shown (since we don’t know when those components were connected). w(e2)<=w(e1) or else Kruskal’s would have picked e1.

v u v Kruskal’s Result Other MST The old wrong proof then says to swap e2 for e1 in the other MST. But we can’t do it, because e2 is already in the other MST! So, the proof is wrong, as it is relying on an illegal step.

Fixing Old Proof e2 e3 e1 Kruskal’s Result Other MST
v u v Kruskal’s Result Other MST To fix the proof, note that adding e1 to Kruskal’s creates a cycle. Some other edge e3 on that cycle must be in Kruskal’s but not the other MST (otherwise, other MST would have had a cycle).

v u v Kruskal’s Result Other MST We already know w(e2)<=w(e1), or Kruskal would have had e1. Now, note that e2 was the edge that merged u and v’s sets. Therefore, w(e3)<=w(e2), because Kruskal added it earlier. So, w(e3)<=w(e2)<=w(e1).

v u v Kruskal’s Result Other MST So, w(e3)<=w(e2)<=w(e1). Therefore, we can swap e3 for e1 in the other MST, making it one edge closer to Kruskal’s, and continue with the old proof. 

Counterexample for Epp’s Proof
Assume the graph is shaped like this. In this case, I’ve got an actual counterexample, with specific weights. Assume all edges have weight 1, except for the marked edges with weight 2. 2 2 2 2 2 2

Counterexample Epp’s Proof
2 2 2 2 2 2 Other MST T1 Kruskal’s Result T Epp’s proof (3rd edition, pp ) also starts with Kruskal’s result (she calls it T) and some other MST, which she calls T1. She tries to show that for any other T1, you can convert it into T by a sequence of swaps that doesn’t change the weight.

2 e 2 2 2 2 2 Other MST T1 Kruskal’s Result T If T is not equal to T1, there exists an edge e in T, but not in T1. This could be the edge shown.

2 e 2 e e’ 2 2 2 2 2 Other MST T1 Kruskal’s Result T Adding e to T1 produces a unique “circuit”. She then says, “Let e’ be an edge of this circuit such that e’ is not in T.” OK, so this could be the labeled edge e’ of the cycle that is not in T.

2 e 2 e 2 2 2 2 2 Other MST T2 Kruskal’s Result T Next, she creates T2 by adding e to T1 and deleting e’. I am showing T2 above. Note, however, that we’ve added an edge with weight 2 and deleted an edge with weight 1! T2 has higher weight (12) than T1 did (11). The proof is wrong!

2 e 2 e e’ 2 2 2 2 2 Other MST T2 Kruskal’s Result T It’s interesting to read the wrong justification given in the proof that w(e)=2 has to be less than w(e’)=1. “…at the stage in Kruskal’s algorithm when e was added to T, e’ was available to be added [since … at that stage its addition could not produce a circuit…]” Oops!

2 e 2 e e’ 2 2 2 2 2 Other MST T2 Kruskal’s Result T I don’t see an easy fix for her proof. It might be possible to show that there must be a suitable edge with sufficiently large weight. The hard part is that you have to reason back to how Kruskal’s algorithm could have done something, after the fact!

2 e 2 e e’ 2 2 2 2 2 Other MST T2 Kruskal’s Result T See how much easier it was to do the proof with loop invariants?! You prove what you need at exactly the point in the algorithm when you are making decisions, so you know exactly what edge e gets added and what edge f gets deleted.

Some Extra Examples, etc.

Bigger (Undirected) Formal Graph Example
G = <V,E> V = vertices = {A,B,C,D,E,F,G,H,I,J,K,L} E = edges = {(A,B),(B,C),(C,D),(D,E),(E,F), (F,G),(G,H),(H,A),(A,J),(A,G), (B,J),(K,F),(C,L),(C,I),(D,I), (D,F),(F,I),(G,K),(J,L),(J,K), (K,L),(L,I)} (A simple graph like this one is undirected, has 0 or 1 edge between each pair of vertices, and no edge from a vertex to itself.)

An edge with endpoints B and C
A vertex A cycle: A to B to J to A A path: A to B to C to D A path is a list of vertices {v1, v2, …, vn} such that (vi, vi+1)  E for all 0  i < n. A cycle is a path that starts and ends at the same vertex.

Example V = {A, B, C, D, E} E = {{A, B}, {A, D}, {C, E}, {D, E}} 98

A Directed Graph V = {A, B, C, D, E}
E = {(A, B), (B, A), (B, E), (D, A), (E, A), (E, D), (E, C)} 99

Weighted Graph 100

Example of a Path 101

Example of a Cycle 102

Disconnected Graph 103

Graph Isomorphism The numbering of the vertices, and their physical arrangement are not important. The following is the same graph as the previous slide. Buttons and string 104

Adjacency Matrix 105

Adjacency Matrix (directed graph)
106

Adjacency List (Directed Graph)
107

Adjacency List Representation
108

Breadth First Search Starting at a source vertex
Systematically explore the edges to “discover” every vertex reachable from s. Produces a “breadth-first tree” Root of s Contains all vertices reachable from s Path from s to v is the shortest path 109

Algorithm Take a start vertex, mark it identified (color it gray), and place it into a queue. While the queue is not empty Take a vertex, u, out of the queue (Begin visiting u) For all vertices v, adjacent to u, If v has not been identified or visited Mark it identified (color it gray) Place it into the queue Add edge u, v to the Breadth First Search Tree We are now done visiting u (color it black) 110

Example 111

Trace of Breadth First Search
113

Breadth-First Search Tree
114

Application of BFS A Breadth First Search finds the shortest path from the start vertex to all other vertices based on number of edges. We can use a Breadth First Search to find the shortest path through a maze. This may be a more efficient solution than that found by a backtracking algorithm. 115

Depth First Search Start at some vertex
A simple path is a path with no cycles. Start at some vertex Follow a simple path discovering new vertices until you cannot find a new vertex. Back-up until you can start finding new vertices. 116

Algorithm for DFS Start at vertex u. Mark it visited.
For each vertex v adjacent to u If v has not been visited Insert u, v into DFS tree Recursively apply this algorithm starting at v Mark u finished. Note: for a Directed graph, a DFS may produce multiple trees – this is called a DFS forest. 117

DFS Example 118

Trace of Depth-First Search
120

Depth-First Search Tree
121

Application of DFS A topological sort is an ordering of the vertices such that if (u, v) is an edge in the graph, then v does not appear before u in the ordering. A Depth First Search can be used to determine a topological sort of a Directed Acyclic Graph (DAG) An application of this would be to determine an ordering of courses that is consistent with the prerequisite requirements. 122

CIS Prerequisites 123

Topological Sort Algorithm
Observation: If there is an edge from u to v, then in a DFS u will have a finish time later than v. Perform a DFS and output the vertices in the reverse order that they’re marked as finished. 124

Example of a Directed Weighted Graph
125

Another Weighted Graph
126

Example 127

Separate Tree and Graph Examples: Cities in Germany

CSE 221: Algorithms and Data Structures Lecture #9 Graphs (with no Axes to Grind) Steve Wolfman 2011W2.

Similar presentations

Presentation on theme: "CSE 221: Algorithms and Data Structures Lecture #9 Graphs (with no Axes to Grind) Steve Wolfman 2011W2."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

CSE 221: Algorithms and Data Structures Lecture #9 Graphs (with no Axes to Grind) Steve Wolfman 2011W2.

Similar presentations

Presentation on theme: "CSE 221: Algorithms and Data Structures Lecture #9 Graphs (with no Axes to Grind) Steve Wolfman 2011W2."— Presentation transcript:

Similar presentations

About project

Feedback