CSE 326: Data Structures Part 8 Graphs


1 CSE 326: Data Structures Part 8 Graphs
Henry Kautz Autumn Quarter 2002

2 Outline Graphs (TO DO: READ WEISS CH 9) Graph Data Structures
Graph Properties, Topological Sort, Graph Traversals (Depth First Search, Breadth First Search, Iterative Deepening Depth First Search), Shortest Path Problem, Dijkstra's Algorithm. Alright, we'll start with a little treat that has been a long time in the coming. Then, I'll define what a graph is. We'll get a taste of an algorithm before we actually hit all the terminology and data structures. Finally, I'll start out the shortest path problem. I'll finish that on Friday.

3 Graph ADT Graphs are a formalism for representing relationships between objects. A graph G is represented as G = (V, E): V is a set of vertices, E is a set of edges. Operations include: iterating over vertices; iterating over edges; iterating over vertices adjacent to a specific vertex; asking whether an edge exists connecting two vertices. Han Leia Luke V = {Han, Leia, Luke} E = {(Luke, Leia), (Han, Leia), (Leia, Han)} I'm not convinced this is really an ADT, but it is certainly an important structure.

4 What Graph is THIS?

5 ReferralWeb (co-authorship in scientific papers)

6 Biological Function Semantic Network

7 Graph Representation 1: Adjacency Matrix
A |V| x |V| array in which element (u, v) is true if and only if there is an edge from u to v. Rows and columns are labeled Han, Leia, Luke. Runtime: iterate over vertices O(|V|); iterate over edges O(|V|²); iterate over the edges adjacent to a vertex O(|V|); edge exists? O(1). Space requirements: Θ(|V|²)

8 Graph Representation 2: Adjacency List
A |V|-ary list (array) in which each entry stores a list (linked list) of all adjacent vertices. Runtime: iterate over vertices O(|V|); iterate over edges O(|V| + |E|); iterate over the edges adjacent to a vertex O(d(v)); edge exists? O(d(v)). Space requirements: O(|V| + |E|)
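As a concrete sketch of the two representations above (Python here, with a dict of lists standing in for the linked lists; the layout is my choice, not from the slides):

```python
# Build both graph representations for the Star Wars example above.
vertices = ["Han", "Leia", "Luke"]
edges = [("Luke", "Leia"), ("Han", "Leia"), ("Leia", "Han")]

# Representation 1: |V| x |V| adjacency matrix of booleans.
index = {v: i for i, v in enumerate(vertices)}
matrix = [[False] * len(vertices) for _ in vertices]
for u, v in edges:
    matrix[index[u]][index[v]] = True

# Representation 2: adjacency list (dict mapping vertex -> list of neighbors).
adj = {v: [] for v in vertices}
for u, v in edges:
    adj[u].append(v)

print(matrix[index["Han"]][index["Leia"]])  # edge-exists test: O(1) in the matrix
print(adj["Luke"])                          # neighbors: O(out-degree) in the list
```

The matrix wins on edge-exists queries; the list wins on space and on iterating a vertex's neighbors, which is why sparse graphs almost always use lists.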

9 Directed vs. Undirected Graphs
In directed graphs, edges have a specific direction. In undirected graphs, they don't (edges are two-way). Vertices u and v are adjacent if (u, v) ∈ E Han Luke Leia Han Luke Leia

10 Graph Density A sparse graph has O(|V|) edges
A dense graph has (|V|2) edges Anything in between is either sparsish or densy depending on the context. Graph density is a measure of the number of edges. Remember that we need to analyze graphs in terms of both |E| and |V|. If we know the density of the graph, we can dispense with that.

11 Weighted Graphs Each edge has an associated weight or cost. Clinton
20 Mukilteo Kingston 30 Edmonds How could we store weights in a matrix? List? Bainbridge 35 Seattle 60 Bremerton There may be more information in the graph as well.

12 Paths and Cycles A path is a list of vertices {v1, v2, …, vn} such that (vi, vi+1) ∈ E for all 1 ≤ i < n. A cycle is a path that begins and ends at the same node. Chicago Seattle Salt Lake City San Francisco Dallas p = {Seattle, Salt Lake City, Chicago, Dallas, San Francisco, Seattle}

13 Path Length and Cost Path length: the number of edges in the path
Path cost: the sum of the costs of each edge 3.5 Chicago Seattle 2 2 Salt Lake City 2 2.5 2.5 2.5 3 San Francisco Dallas length(p) = 5 cost(p) = 11.5

14 Connectivity Undirected graphs are connected if there is a path between any two vertices Directed graphs are strongly connected if there is a path from any one vertex to any other Directed graphs are weakly connected if there is a path between any two vertices, ignoring direction A complete graph has an edge between every pair of vertices Can a directed graph which is strongly connected be acyclic? NO. (Except the trivial one node case.) There must be a path from A to B and a path from B to A; so, there must be a cycle. What about a weakly connected directed graph? YES! See the one on the slide. There are also further definitions of these; for example, the concept of biconnectivity showed up in my satisfiability research: does the graph have two distinct paths between any two vertices?

15 Trees as Graphs Every tree is a graph with some restrictions:
the tree is directed there are no cycles (directed or undirected) there is a directed path from the root to every node A B C D E F Each of the red areas breaks one of these constraints. We won’t always require the constraint that the tree be directed. In fact, if it’s not directed, we can just pick a root and hang everything from that node (making the edges directed); so, it’s not a big deal. G H BAD! I J

16 Directed Acyclic Graphs (DAGs)
DAGs are directed graphs with no cycles. main() mult() if program call graph is a DAG, then all procedure calls can be in-lined add() DAGs are a very common representation for dependence graphs. The DAG here shows the non-recursive call-graph from a program. Remember topological sort? Any DAG can be topo-sorted. Any graph that's not a DAG cannot be topo-sorted. read() access() Trees ⊂ DAGs ⊂ Graphs

17 Application of DAGs: Representing Partial Orders
reserve flight check in airport call taxi pack bags take flight Not all orderings are total orderings, however. Now, everyone tell me where there should be an edge! locate gate taxi to airport

18 Topological Sort Given a graph, G = (V, E), output all the vertices in V such that no vertex is output before any other vertex with an edge to it. reserve flight check in airport call taxi take flight taxi to airport locate gate pack bags

19 Topo-Sort Take One Label each vertex’s in-degree (# of inbound edges)
While there are vertices remaining: pick a vertex with in-degree of zero and output it; reduce the in-degree of all vertices adjacent to it; remove it from the list of vertices. That's a bit less efficient than we can do. Let's try this. How well does this run? runtime: O(|V|²)

20 Topo-Sort Take Two Label each vertex’s in-degree
Initialize a queue (or stack) to contain all in-degree zero vertices. While there are vertices remaining in the queue: remove a vertex v with in-degree of zero and output it; reduce the in-degree of all vertices adjacent to v; put any of these with new in-degree zero on the queue. Why use a queue? runtime: O(|V| + |E|)
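The queue-based "take two" above can be sketched as follows (Python; the dict-of-successor-lists encoding and function name are my choice):

```python
from collections import deque

def topo_sort(adj):
    """Queue-based topological sort ("take two" above).
    adj: dict mapping vertex -> list of successors.
    Returns vertices in a topological order."""
    # Label each vertex's in-degree.
    indegree = {v: 0 for v in adj}
    for v in adj:
        for w in adj[v]:
            indegree[w] += 1
    # Initialize the queue with all in-degree zero vertices.
    queue = deque(v for v in adj if indegree[v] == 0)
    order = []
    while queue:
        v = queue.popleft()          # remove a vertex with in-degree zero
        order.append(v)              # and output it
        for w in adj[v]:
            indegree[w] -= 1         # reduce in-degrees of adjacent vertices
            if indegree[w] == 0:
                queue.append(w)
    return order  # shorter than |V| exactly when the graph has a cycle
```

Each vertex enters the queue once and each edge is examined once, giving the O(|V| + |E|) bound.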

21 Recall: Tree Traversals
b c d e h i j f g k l a b f g k c d h i l j e

22 Depth-First Search Pre/Post/In – order traversals are examples of depth-first search Nodes are visited deeply on the left-most branches before any nodes are visited on the right-most branches Visiting the right branches deeply before the left would still be depth-first! Crucial idea is “go deep first!” Difference in pre/post/in-order is how some computation (e.g. printing) is done at current node relative to the recursive calls In DFS the nodes “being worked on” are kept on a stack

23 Iterative Version DFS Pre-order Traversal
Push root on a Stack Repeat until Stack is empty: Pop a node Process it Push its children on the Stack
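A minimal runnable version of this iterative pre-order traversal (Python; pushing children in reverse so they pop in left-to-right order is my addition, not stated on the slide):

```python
def dfs_preorder(root, children):
    """Iterative pre-order DFS with an explicit stack, per the slide.
    children: dict mapping node -> list of children."""
    stack, visited = [root], []
    while stack:
        node = stack.pop()
        visited.append(node)  # "process it"
        # Push children reversed so the leftmost child is popped first.
        stack.extend(reversed(children.get(node, [])))
    return visited
```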

24 Level-Order Tree Traversal
Consider task of traversing tree level by level from top to bottom (alphabetic order) Is this also DFS? a i d h j b f k l e c g

25 Breadth-First Search No! Level-order traversal is an example of Breadth-First Search BFS characteristics Nodes being worked on maintained in a FIFO Queue, not a stack Iterative style procedures often easier to design than recursive procedures Put root in a Queue Repeat until Queue is empty: Dequeue a node Process it Add its children to queue
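The level-order procedure above, sketched in Python (same tree encoding as the DFS sketch, my assumption):

```python
from collections import deque

def bfs_levelorder(root, children):
    """Level-order traversal using a FIFO queue, as described above."""
    queue, visited = deque([root]), []
    while queue:
        node = queue.popleft()       # dequeue a node
        visited.append(node)         # process it
        queue.extend(children.get(node, []))  # add its children to the queue
    return visited
```

Swapping the deque's `popleft` for `pop` turns this back into a (right-to-left) depth-first traversal, which is exactly the stack-vs-queue distinction the slide makes.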

26 QUEUE a i d h j b f k l e c g a b c d e c d e f g d e f g e f g h i j
h i j k i j k j k l k l l a i d h j b f k l e c g

27 Graph Traversals Depth first search and breadth first search also work for arbitrary (directed or undirected) graphs Must mark visited vertices so you do not go into an infinite loop! Either can be used to determine connectivity: Is there a path between two given vertices? Is the graph (weakly) connected? Important difference: Breadth-first search always finds a shortest path from the start vertex to any other (for unweighted graphs) Depth first search may not!

28 Demos on Web Page DFS BFS

29 Is BFS the Hands Down Winner?
Depth-first search Simple to implement (implicit or explicit stack) Does not always find shortest paths Must be careful to “mark” visited vertices, or you could go into an infinite loop if there is a cycle Breadth-first search Simple to implement (queue) Always finds shortest paths Marking visited nodes can improve efficiency, but even without doing so search is guaranteed to terminate

30 Space Requirements Consider space required by the stack or queue…
Suppose G is known to be at distance d from S Each vertex n has k out-edges There are no (undirected or directed) cycles BFS queue will grow to size k^d Will simultaneously contain all nodes that are at distance d (once last vertex at distance d-1 is expanded) For k=10, d=15, size is 1,000,000,000,000,000

31 DFS Space Requirements
Consider DFS, where we limit the depth of the search to d Force a backtrack at d+1 When visiting a node n at depth d, the stack will contain (at most) k-1 siblings of n parent of n siblings of parent of n grandparent of n siblings of grandparent of n … DFS stack grows at most to size dk For k=10, d=15, size is 150 Compare with BFS 1,000,000,000,000,000

32 Conclusion For very large graphs – DFS is hugely more memory efficient, if we know the distance to the goal vertex! But suppose we don’t know d. What is the (obvious) strategy?

33 Iterative Deepening DFS
IterativeDeepeningDFS(vertex s, g){
  for (i=1; true; i++)
    if DFS(i, s, g) return;
} // Also need to keep track of path found

bool DFS(int limit, vertex s, g){
  if (s==g) return true;
  if (limit-- <= 0) return false;
  for (n in children(s))
    if (DFS(limit, n, g)) return true;
  return false;
}
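A runnable Python rendering of this pseudocode, including the path tracking the comment asks for. The `max_depth` cutoff is my addition so the outer loop terminates when no path exists:

```python
def dfs_limited(limit, s, g, children, path):
    """Depth-limited DFS; on success, appends the path (goal first) to `path`."""
    if s == g:
        path.append(s)
        return True
    if limit <= 0:
        return False  # force a backtrack at the depth limit
    for n in children.get(s, []):
        if dfs_limited(limit - 1, n, g, children, path):
            path.append(s)
            return True
    return False

def iterative_deepening_dfs(s, g, children, max_depth=50):
    """Run depth-limited DFS with limits 0, 1, 2, ... until the goal is found."""
    for limit in range(max_depth + 1):
        path = []
        if dfs_limited(limit, s, g, children, path):
            return list(reversed(path))  # path was built goal-to-start
    return None  # no path within max_depth
```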

34 Analysis of Iterative Deepening
Even without “marking” nodes as visited, iterative-deepening DFS never goes into an infinite loop For very large graphs, memory cost of keeping track of visited vertices may make marking prohibitive Work performed with limit < actual distance to G is wasted – but the wasted work is usually small compared to amount of work done during the last iteration

35 Asymptotic Analysis There are “pathological” graphs for which iterative deepening is bad: for example, a simple chain of n = d vertices from S to G

36 A Better Case Suppose each vertex n has k out-edges, no cycles
Bounded DFS to level i reaches k^i vertices Iterative Deepening DFS(d) = k + k² + … + k^d = O(k^d) (ignore low order terms!)

37 (More) Conclusions To find a shortest path between two nodes in an unweighted graph, use either BFS or Iterated DFS If the graph is large, Iterated DFS typically uses much less memory Later we’ll learn about heuristic search algorithms, which use additional knowledge about the problem domain to reduce the number of vertices visited

38 Single Source, Shortest Path for Weighted Graphs
Given a graph G = (V, E) with edge costs c(e), and a vertex s ∈ V, find the shortest (lowest cost) path from s to every vertex in V Graph may be directed or undirected Graph may or may not contain cycles Weights may be all positive or not What is the problem if graph contains cycles whose total cost is negative? The problem that Dijkstra’s algorithm addresses is the single source, shortest path problem. Given a graph and a source vertex, find the shortest path from the source to every vertex. We can put a variety of limitations or spins on the problem. We’ll focus on weighted graphs with no negative weights. This is used in all sorts of optimization problems: minimum delay in a network, minimum cost flights for airplane routes, etc.

39 The Trouble with Negative Weighted Cycles
2 A B 10 -5 1 E You might wonder why we don’t allow negative weights. Here’s one reason, we’ll see another later. The shortest path here is undefined! We can always go once more around the cycle and get a lower cost path. 2 C D

40 Edsger Wybe Dijkstra (1930-2002)
Invented concepts of structured programming, synchronization, weakest precondition, and "semaphores" for controlling computer processes. The Oxford English Dictionary cites his use of the words "vector" and "stack" in a computing context. Believed programming should be taught without computers 1972 Turing Award “In their capacity as a tool, computers will be but a ripple on the surface of our culture. In their capacity as intellectual challenge, they are without precedent in the cultural history of mankind.” To move to weighted graphs, we appeal to the mighty power of Dijkstra. This is one of those names in computer science you should just know! Like Turing or Knuth. So, here’s the super-brief bio of Dijkstra. Look him up if you’re interested in more.

41 Dijkstra’s Algorithm for Single Source Shortest Path
Classic algorithm for solving shortest path in weighted graphs (with only positive edge weights) Similar to breadth-first search, but uses a priority queue instead of a FIFO queue: Always select (expand) the vertex that has a lowest-cost path to the start vertex a kind of “greedy” algorithm Correctly handles the case where the lowest-cost (shortest) path to a vertex is not the one with fewest edges Among the wide variety of things he has done, he created Dijkstra’s algorithm more than 30 years ago. Dijkstra’s algorithm is a greedy algorithm (like Huffman encoding), so it just makes the best local choice at each step. The choice it makes is which shortest path to declare known next. It starts by declaring the start node known to have a shortest path of length 0. Then, it updates neighboring node’s path costs according to the start node’s cost. Then, it just keeps picking the next shortest path and fixing that one until it has all the vertices.

42 Pseudocode for Dijkstra
Initialize the cost of each vertex to ∞
cost[s] = 0; heap.insert(s);
While (! heap.empty())
  n = heap.deleteMin()
  For (each vertex a which is adjacent to n along edge e)
    if (cost[n] + edge_cost[e] < cost[a]) then
      cost[a] = cost[n] + edge_cost[e]
      previous_on_path_to[a] = n;
      if (a is in the heap) then heap.decreaseKey(a)
      else heap.insert(a)
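A Python sketch of this pseudocode. Python's `heapq` has no `decreaseKey`, so this uses the common lazy-deletion variant (re-insert the vertex and skip stale heap entries) rather than the slide's exact heap operations:

```python
import heapq

def dijkstra(adj, s):
    """Single-source shortest paths with non-negative weights.
    adj: dict mapping vertex -> list of (neighbor, weight) pairs.
    Returns (cost, previous_on_path_to)."""
    cost = {v: float("inf") for v in adj}
    previous = {}
    cost[s] = 0
    heap = [(0, s)]
    while heap:
        c, n = heapq.heappop(heap)
        if c > cost[n]:
            continue  # stale entry: a cheaper path to n was already found
        for a, w in adj[n]:
            if cost[n] + w < cost[a]:
                cost[a] = cost[n] + w
                previous[a] = n
                heapq.heappush(heap, (cost[a], a))  # stand-in for decreaseKey
    return cost, previous
```

Following the `previous` pointers backward from any vertex reconstructs its shortest path, as the next slide notes.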

43 Important Features Once a vertex is removed from the heap, the cost of the shortest path to that node is known While a vertex is still in the heap, another shorter path to it might still be found The shortest path itself from s to any node a can be found by following the pointers stored in previous_on_path_to[a]

44 Dijkstra’s Algorithm in Action
B D F H G E 2 3 1 4 10 8 9 7 OK, let’s do this a bit more carefully.

45 Demo Dijkstra’s

46 Data Structures for Dijkstra’s Algorithm
|V| times: Select the unknown node with the lowest cost findMin/deleteMin O(log |V|) |E| times: a’s cost = min(a’s old cost, …) What data structures do we use to support these little snippets from Dijkstra’s? Priority Queue and a (VERY SIMPLE) dictionary. Initialization just takes O(|V|), so all the runtime is in these pq operations. O(E log V + V log V)… but, if we assume the graph is connected, this becomes: O(E log V) (Dijkstra’s will work fine with a disconnected graph, however.) decreaseKey O(log |V|) runtime: O(|E| log |V|)

47 CSE 326: Data Structures Lecture 8.B Heuristic Graph Search
Henry Kautz Winter Quarter 2002

48 Homework Hint - Problem 4
You can turn in a final version of your answer to problem 4 without penalty on Wednesday.

49 Outline Best First Search A* Search Example: Plan Synthesis
This material is NOT in Weiss, but is important for both the programming project and the final exam!

50 Huge Graphs Consider some really huge graphs…
All cities and towns in the World Atlas All stars in the Galaxy All ways 10 blocks can be stacked Huh???

51 Implicitly Generated Graphs
A huge graph may be implicitly specified by rules for generating it on-the-fly Blocks world: vertex = relative positions of all blocks edge = robot arm stacks one block stack(blue,table) stack(green,blue) stack(blue,red) stack(green,red) stack(green,blue)

52 Blocks World Source = initial state of the blocks
Goal = desired state of the blocks Path source to goal = sequence of actions (program) for robot arm! n blocks ⇒ n^n vertices 10 blocks ⇒ 10 billion vertices!

53 Problem: Branching Factor
Cannot search such huge graphs exhaustively. Suppose we know that goal is only d steps away. Dijkstra’s algorithm is basically breadth-first search (modified to handle arc weights) Breadth-first search (or for weighted graphs, Dijkstra’s algorithm) – If out-degree of each node is 10, potentially visits 10^d vertices 10 step plan = 10 billion vertices visited!

54 An Easier Case Suppose you live in Manhattan; what do you do? S G
52nd St G 51st St 50th St 10th Ave 9th Ave 8th Ave 7th Ave 6th Ave 5th Ave 4th Ave 3rd Ave 2nd Ave

55 Best-First Search The Manhattan distance (Δx + Δy) is an estimate of the distance to the goal a heuristic value Best-First Search Order nodes in priority to minimize estimated distance to the goal h(n) Compare: BFS / Dijkstra Order nodes in priority to minimize distance from the start

56 Best First in Action Suppose you live in Manhattan; what do you do? S
52nd St G 51st St 50th St 10th Ave 9th Ave 8th Ave 7th Ave 6th Ave 5th Ave 4th Ave 3rd Ave 2nd Ave

57 Problem 1: Led Astray Eventually will expand vertex to get back on the right track S G 52nd St 51st St 50th St 10th Ave 9th Ave 8th Ave 7th Ave 6th Ave 5th Ave 4th Ave 3rd Ave 2nd Ave

58 Problem 2: Optimality With Best-First Search, are you guaranteed a shortest path is found when goal is first seen? when goal is removed from priority queue (as with Dijkstra?)

59 Sub-Optimal Solution No! Goal is by definition at distance 0: will be removed from priority queue immediately, even if a shorter path exists! (5 blocks) S 52nd St h=2 h=5 G h=4 51st St h=1 9th Ave 8th Ave 7th Ave 6th Ave 5th Ave 4th Ave

60 Synergy? Dijkstra / Breadth First guaranteed to find optimal solution
Best First often visits far fewer vertices, but may not provide optimal solution Can we get the best of both?

61 A* (“A star”) Order vertices in priority queue to minimize
(distance from start) + (estimated distance to goal) f(n) = g(n) + h(n) f(n) = priority of a node g(n) = true distance from start h(n) = heuristic distance to goal

62 Optimality Suppose the estimated distance (h) is always less than or equal to the true distance to the goal heuristic is a lower bound on true distance Then: when the goal is removed from the priority queue, we are guaranteed to have found a shortest path!
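A compact A* sketch in Python, assuming an admissible h (never overestimating) as the slide requires; the graph encoding and function names are mine:

```python
import heapq

def a_star(adj, h, start, goal):
    """A* search on a weighted graph.
    adj: dict mapping vertex -> list of (neighbor, weight) pairs.
    h: function vertex -> estimated (admissible) distance to goal.
    Returns the shortest path length from start to goal, or None."""
    g = {start: 0}                      # true distance from start
    heap = [(h(start), start)]          # priority = f(n) = g(n) + h(n)
    while heap:
        f, n = heapq.heappop(heap)
        if n == goal:
            return g[n]  # goal popped: shortest path found (optimality slide)
        if f > g[n] + h(n):
            continue  # stale entry
        for a, w in adj.get(n, []):
            if g[n] + w < g.get(a, float("inf")):
                g[a] = g[n] + w
                heapq.heappush(heap, (g[a] + h(a), a))
    return None
```

With h ≡ 0 this degenerates to Dijkstra's algorithm, which is one way to see why an admissible h preserves optimality while pruning the search.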

63 Problem 2 Revisited S G vertex g(n) h(n) f(n) 52nd & 9th 5 52nd St
5 S (5 blocks) 52nd St G 51st St 50th St 9th Ave 8th Ave 7th Ave 6th Ave 5th Ave 4th Ave

64 Problem 2 Revisited S G vertex g(n) h(n) f(n) 52nd & 4th 5 2 7
51st & 9th 1 4 S (5 blocks) 52nd St G 51st St 50th St 9th Ave 8th Ave 7th Ave 6th Ave 5th Ave 4th Ave

65 Problem 2 Revisited S G vertex g(n) h(n) f(n) 52nd & 4th 5 2 7
51st & 8th 3 50th & 9th S (5 blocks) 52nd St G 51st St 50th St 9th Ave 8th Ave 7th Ave 6th Ave 5th Ave 4th Ave

66 Problem 2 Revisited S G vertex g(n) h(n) f(n) 52nd & 4th 5 2 7
51st & 7th 3 50th & 9th 50th & 8th 4 S (5 blocks) 52nd St G 51st St 50th St 9th Ave 8th Ave 7th Ave 6th Ave 5th Ave 4th Ave

67 Problem 2 Revisited S G vertex g(n) h(n) f(n) 52nd & 4th 5 2 7
51st & 6th 4 1 50th & 9th 50th & 8th 3 50th & 7th S (5 blocks) 52nd St G 51st St 50th St 9th Ave 8th Ave 7th Ave 6th Ave 5th Ave 4th Ave

68 Problem 2 Revisited S G vertex g(n) h(n) f(n) 52nd & 4th 5 2 7
51st & 5th 50th & 9th 50th & 8th 3 4 50th & 7th S (5 blocks) 52nd St G 51st St 50th St 9th Ave 8th Ave 7th Ave 6th Ave 5th Ave 4th Ave

69 Problem 2 Revisited DONE! S G vertex g(n) h(n) f(n) 52nd & 4th 5 2 7
50th & 9th 50th & 8th 3 4 50th & 7th S (5 blocks) 52nd St G 51st St 50th St 9th Ave 8th Ave 7th Ave 6th Ave 5th Ave 4th Ave DONE!

70 What Would Dijkstra Have Done?
(5 blocks) 52nd St G 51st St 50th St 49th St 48th St 47th St 9th Ave 8th Ave 7th Ave 6th Ave 5th Ave 4th Ave

71 Proof of A* Optimality A* terminates when G is popped from the heap.
Suppose G is popped but the path found isn’t optimal: priority(G) > optimal path length c Let P be an optimal path from S to G, and let N be the last vertex on that path that has been visited but not yet popped. There must be such an N, otherwise the optimal path would have been found. priority(N) = g(N) + h(N) ≤ c So N should have popped before G can pop. Contradiction. non-optimal path to G S G undiscovered portion of shortest path portion of optimal path found so far N

72 What About Those Blocks?
“Distance to goal” is not always physical distance Blocks world: distance = number of stacks to perform heuristic lower bound = number of blocks out of place # out of place = 2, true distance to goal = 3

73 3-Blocks State Space Graph
ABC h=2 A BC h=1 A CB h=2 B AC h=2 B CA h=1 C AB h=3 C BA h=3 C A B h=3 B A C h=2 C B A h=3 A B C h=0 B C A h=3 A C B h=3 start goal

74 3-Blocks Best First Solution
ABC h=2 A BC h=1 A CB h=2 B AC h=2 B CA h=1 C AB h=3 C BA h=3 C A B h=3 B A C h=2 C B A h=3 A B C h=0 B C A h=3 A C B h=3 start goal

75 expanded, but not in solution
3-Blocks BFS Solution ABC h=2 expanded, but not in solution A BC h=1 A CB h=2 B AC h=2 B CA h=1 C AB h=3 C BA h=3 C A B h=3 B A C h=2 C B A h=3 A B C h=0 B C A h=3 A C B h=3 start goal

76 expanded, but not in solution
3-Blocks A* Solution ABC h=2 expanded, but not in solution A BC h=1 A CB h=2 B AC h=2 B CA h=1 C AB h=3 C BA h=3 C A B h=3 B A C h=2 C B A h=3 A B C h=0 B C A h=3 A C B h=3 start goal

77 Other Real-World Applications
Route finding – computer networks, airline route planning VLSI layout – cell layout and channel routing Production planning – “just in time” optimization Protein sequence alignment Many other “NP-Hard” problems A class of problems for which no exact polynomial time algorithms exist – so heuristic search is the best we can hope for

78 Coming Up Other graph problems Connected components Spanning tree

79 CSE 326: Data Structures Part 8.C Spanning Trees and More
Henry Kautz Autumn Quarter 2002

80 Today Incremental hashing MazeRunner project Longest Path?
Finding Connected Components Application to machine vision Finding Minimum Spanning Trees Yet another use for union/find

81 Incremental Hashing

82 Maze Runner DFS, iterated DFS, BFS, best-first, A*
(20 x 15 ASCII maze grid, with the start marked * and the exit marked X) Maze Runner DFS, iterated DFS, BFS, best-first, A* Crufty old C++ code or fresh clean Java code Win fame and glory by writing a nice real-time maze visualizer

83 Java Note Java lacks enumerated constants…
enum {DOG, CAT, MOUSE} animal; animal a = DOG; Static constants not type-safe… static final int DOG = 1; static final int CAT = 2; static final int BLUE = 1; int favoriteColor = DOG;

84 Amazing Java Trick public final class Animal { private Animal() {}
public static final Animal DOG = new Animal(); public static final Animal CAT = new Animal(); } public final class Color { private Color() {} public static final Color BLUE = new Color(); Animal x = DOG; Animal x = BLUE; // Gives compile-time error!

85 Longest Path Problem Given a graph G=(V,E) and vertices s, t
Find a longest simple path (no repeating vertices) from s to t. Does “reverse Dijkstra” work?

86 Dijkstra
Initialize the cost of each vertex to ∞
cost[s] = 0; heap.insert(s);
While (! heap.empty())
  n = heap.deleteMin()
  For (each vertex a which is adjacent to n along edge e)
    if (cost[n] + edge_cost[e] < cost[a]) then
      cost[a] = cost[n] + edge_cost[e]
      previous_on_path_to[a] = n;
      if (a is in the heap) then heap.decreaseKey(a)
      else heap.insert(a)

87 Reverse Dijkstra
Initialize the cost of each vertex to -∞
cost[s] = 0; heap.insert(s);
While (! heap.empty())
  n = heap.deleteMax()
  For (each vertex a which is adjacent to n along edge e)
    if (cost[n] + edge_cost[e] > cost[a]) then
      cost[a] = cost[n] + edge_cost[e]
      previous_on_path_to[a] = n;
      if (a is in the heap) then heap.increaseKey(a)
      else heap.insert(a)

88 Does it Work? a 6 t 3 5 b 1 s

89 Problem No clear stopping condition!
How many times could a vertex be inserted in the priority queue? Exponential! Not a “good” algorithm! Is there a better one?

90 Counting Connected Components
Initialize the cost of each vertex to ∞
Num_cc = 0
While there are vertices of cost ∞ {
  Pick an arbitrary such vertex S, set its cost to 0
  Find paths from S
  Num_cc ++
}

91 Using DFS Set each vertex to “unvisited” Num_cc = 0
While there are unvisited vertices { Pick an arbitrary such vertex S Perform DFS from S, marking vertices as visited Num_cc ++ } Complexity = O(|V|+|E|)

92 Using Union / Find Put each node in its own equivalence class
Num_cc = 0 For each edge E = <x,y> Union(x,y) Return number of equivalence classes Complexity =

93 Using Union / Find Put each node in its own equivalence class
Num_cc = 0 For each edge E = <x,y> Union(x,y) Return number of equivalence classes Complexity = O(|V|+|E| ack(|E|,|V|))
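A sketch of the union/find approach in Python, with union by size and path compression (which is what the near-linear inverse-Ackermann bound above assumes); the function name and vertex numbering are my choice:

```python
def count_components(n, edges):
    """Count connected components of an undirected graph on vertices 0..n-1."""
    parent = list(range(n))  # each node starts in its own equivalence class
    size = [1] * n

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression (halving)
            x = parent[x]
        return x

    def union(x, y):
        rx, ry = find(x), find(y)
        if rx == ry:
            return
        if size[rx] < size[ry]:            # union by size
            rx, ry = ry, rx
        parent[ry] = rx
        size[rx] += size[ry]

    for x, y in edges:
        union(x, y)
    # Number of equivalence classes = number of roots.
    return sum(1 for v in range(n) if find(v) == v)
```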

94 Machine Vision: Blob Finding

95 Machine Vision: Blob Finding
1 2 3 5 4

96 Blob Finding Matrix can be considered an efficient representation of a graph with a very regular structure Cell = vertex Adjacent cells of same color = edge between vertices Blob finding = finding connected components

97 Tradeoffs Both DFS and Union/Find approaches are (essentially) O(|E|+|V|) = O(|E|) for binary images For each component, DFS (“recursive labeling”) can move all over the image – entire image must be in main memory Better in practice: row-by-row processing localizes accesses to memory typically 1-2 orders of magnitude faster!

98 High-Level Blob-Labeling
Scan through image left/right and top/bottom If a cell is same color as (connected to) cell to right or below, then union them Give the same blob number to cells in each equivalence class

99 Blob-Labeling Algorithm
Put each cell <x,y> in its own equivalence class
For each cell <x,y>
  if color[x,y] == color[x+1,y] then Union( <x,y>, <x+1,y> )
  if color[x,y] == color[x,y+1] then Union( <x,y>, <x,y+1> )
label = 0
For each root <x,y> blobnum[x,y] = ++label;
For each cell <x,y> blobnum[x,y] = blobnum( Find(<x,y>) )

100 Spanning Tree Spanning tree: a subset of the edges from a connected graph that… touches all vertices in the graph (spans the graph) forms a tree (is connected and contains no cycles) Minimum spanning tree: the spanning tree with the least total edge cost. 4 7 Interesting note before we move on: it turns out that Dijkstra’s is as good as the best algorithm we have for single-source, single-destination shortest path! OK, that’s Dijkstra’s. Now, let’s get a quick definition before we move on to Kruskal’s algorithm. Which of these three is the minimum spanning tree? (The one in the middle) 9 2 1 5

101 Applications of Minimal Spanning Trees
Communication networks VLSI design Transportation systems

102 Kruskal’s Algorithm for Minimum Spanning Trees
A greedy algorithm: Initialize all vertices to unconnected While there are still unmarked edges Pick a lowest cost edge e = (u, v) and mark it If u and v are not already connected, add e to the minimum spanning tree and connect u and v Here’s how Kruskal’s works. This should look very familiar. Remember our algorithm for maze creation? Except that the edge order there was random, this is the same as that algorithm! Sound familiar? (Think maze generation.)
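A Python sketch of Kruskal's algorithm; for brevity it sorts the edge list up front rather than using a heap, which does not change the asymptotic behavior, and uses a bare union/find for the connectivity test:

```python
def kruskal(n, edges):
    """Minimum spanning tree by Kruskal's algorithm.
    Vertices are 0..n-1; edges are (cost, u, v) tuples.
    Returns the list of MST edges."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    mst = []
    for cost, u, v in sorted(edges):       # pick lowest-cost edge first
        ru, rv = find(u), find(v)
        if ru != rv:                       # u and v not already connected
            parent[ru] = rv                # connect them
            mst.append((cost, u, v))
    return mst
```

As the lecture notes, this is the maze-generation algorithm with the random edge order replaced by cost order.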

103 Kruskal’s Algorithm in Action (1/5)
2 2 3 B A F H 1 2 1 4 9 10 G C 4 2 8 Therefore, this example should look a lot like maze creation. We test the cheapest edge first. Since B and C are unconnected, we add the edge to the MST and union B and C. D E 7

104 Kruskal’s Algorithm in Action (2/5)
3 B A F H 1 2 1 4 9 10 G C 4 2 8 I’ll do some uninteresting edges right away (all those edges until the 3 succeed). Now, we test 3. Are F and H connected? Yes, so we throw out the 3 edge. D E 7

105 Kruskal’s Algorithm in Action (3/5)
2 2 3 B A F H 1 2 1 4 9 10 G C 4 2 8 What about A and D? Connected: so throw out the 4 edge. D E 7

106 Kruskal’s Algorithm in Action (4/5)
2 2 3 B A F H 1 2 1 4 9 10 G C 4 2 8 Finally, G and E are unconnected, so we union them. But, that connects the last unconnected vertices in the graph. Therefore, we’re done. D E 7

107 Kruskal’s Algorithm Completed (5/5)
2 2 3 B A F H 1 2 1 4 9 10 G C 4 2 8 Here’s the minimum spanning tree. Notice that just like with mazes: the tree has no cycles it touches all the vertices there’s one unique path between any two vertices. D E 7

108 Why Greediness Works Proof by contradiction that Kruskal’s finds a minimum spanning tree: Assume another spanning tree has lower cost than Kruskal’s. Pick an edge e1 = (u, v) in that tree that’s not in Kruskal’s. Consider the point in Kruskal’s algorithm where u’s set and v’s set were about to be connected. Kruskal selected some edge to connect them: call it e2 . But, e2 must have at most the same cost as e1 (otherwise Kruskal would have selected it instead). So, swap e2 for e1 (at worst keeping the cost the same) Repeat until the tree is identical to Kruskal’s, where the cost is the same or lower than the original cost: contradiction! We already know this makes a spanning tree because our maze generation algorithm made a spanning tree. But, does this find the minimum spanning tree? Let’s assume it doesn’t. Then, there’s some other better spanning tree. Let’s try and make that tree more like Kruskal’s tree. (otherwise, Kruskal’s would have considered and chosen e1 before ever reaching e2)

109 Data Structures for Kruskal’s Algorithm
|E| times: Once: Initialize heap of edges… Pick the lowest cost edge… buildHeap findMin/deleteMin |E| times: If u and v are not already connected… …connect u and v. What data structures do we need for this? Priority Queue and Disjoint Set Union/Find What about initialization? We need to put all the edges in a priority queue; we can do that fast with a buildHeap call: O(|E|) So, overall, the runtime is O(|E|ack(|E|,|V|) + |E|log|E|) Who knows what’s slower? Inverse ackermann’s or log? LOG! How much slower? TONS! One last little point: |E| is at most |V|2, but log|V|2 = 2 log |V|. So: O(|E|log|V|) union runtime: |E| + |E| log |E| + |E| ack(|E|,|V|)

110 Data Structures for Kruskal’s Algorithm
|E| times: Once: Initialize heap of edges… Pick the lowest cost edge… buildHeap findMin/deleteMin |E| times: If u and v are not already connected… …connect u and v. What data structures do we need for this? Priority Queue and Disjoint Set Union/Find What about initialization? We need to put all the edges in a priority queue; we can do that fast with a buildHeap call: O(|E|) So, overall, the runtime is O(|E|ack(|E|,|V|) + |E|log|E|) Who knows what’s slower? Inverse ackermann’s or log? LOG! How much slower? TONS! One last little point: |E| is at most |V|2, but log|V|2 = 2 log |V|. So: O(|E|log|V|) union runtime: |E| + |E| log |E| + |E| ack(|E|,|V|) = O(|E|log|E|)

111 Prim’s Algorithm Can also find Minimum Spanning Trees using a variation of Dijkstra’s algorithm: Pick an initial node Until graph is connected: Choose edge (u,v) which is of minimum cost among edges where u is in tree but v is not Add (u,v) to the tree Same “greedy” proof, same asymptotic complexity

112 Coming Up Application: Sentence Disambiguation
All-pairs Shortest Paths NP-Complete Problems Advanced topics Quad trees Randomized algorithms

113 Sentence Disambiguation
A person types a message on their cell phone keypad. Each button can stand for three different letters (e.g. “1” is a, b, or c), but the person does not explicitly indicate which letter is meant. (Words are separated by blanks – the “0” key.) Problem: How can the system determine what sentence was typed? My Nokia cell phone does this! How can this problem be cast as a shortest-path problem?

114

115 Sentence Disambiguation as Shortest Path
Idea: Possible words are vertices Directed edge between adjacent possible words Weight on edge from W1 to W2 is probability that W2 appears adjacent to W1 Probabilities over what?! Some large archive (corpus) of text “Word bi-gram” model Find the most probable path through the graph

116 W11 W12 W13 W21 W23 W22 W31 W11 W33 W41 W43

117 Technical Concerns Isn’t “most probable” actually longest (most heavily weighted) path?! Shouldn’t we be multiplying probabilities, not adding them?!

118 Logs to the Rescue Make weight on edge from W1 to W2 be
- log P(W2 | W1) Logs of probabilities are always negative numbers, so take negative logs The lower the probability, the larger the negative log! So this is shortest path Adding logs is the same as multiplying the underlying quantities

119 To Think About This really works in practice – 99% accuracy!
Cell phone memory is limited – how can we use as little storage as possible? How can the system customize itself to a user?

120 Question (|V||E|log|V|) (|V|3)
Which graph algorithm is asymptotically better: (|V||E|log|V|) (|V|3)

121 All Pairs Shortest Path
Suppose you want to compute the length of the shortest paths between all pairs of vertices in a graph… Run Dijkstra’s algorithm (with priority queue) repeatedly, starting with each node in the graph. Complexity in terms of V when graph is dense: O(|V|³ log |V|)

122 Dynamic Programming Approach

123 Floyd-Warshall Algorithm
// C – adjacency matrix representation of graph
// C[i][j] = weighted edge i->j or ∞ if none
// D – computed distances
FW(int n, int C [][], int D [][]){
  for (i = 0; i < n; i++){
    for (j = 0; j < n; j++)
      D[i][j] = C[i][j];
    D[i][i] = 0.0;
  }
  for (k = 0; k < n; k++)
    for (i = 0; i < n; i++)
      for (j = 0; j < n; j++)
        if (D[i][k] + D[k][j] < D[i][j])
          D[i][j] = D[i][k] + D[k][j];
}
Run time = O(n³) How could we compute the paths?
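A runnable Python sketch of the Floyd-Warshall routine above (same adjacency-matrix convention: ∞ means no edge):

```python
INF = float("inf")

def floyd_warshall(C):
    """All-pairs shortest path distances by dynamic programming.
    C: n x n matrix where C[i][j] is the edge weight i->j, or INF if none.
    Returns the n x n distance matrix D."""
    n = len(C)
    D = [row[:] for row in C]          # D starts as a copy of C
    for i in range(n):
        D[i][i] = 0.0                  # distance from a vertex to itself
    for k in range(n):                 # allow k as an intermediate vertex
        for i in range(n):
            for j in range(n):
                if D[i][k] + D[k][j] < D[i][j]:
                    D[i][j] = D[i][k] + D[k][j]
    return D  # O(n^3) time, answering the "run time" question above
```

To recover the paths themselves, the standard trick is a parallel matrix recording, for each (i, j), the intermediate vertex k that last improved D[i][j].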

