CMPSCI 187: Introduction to Programming with Data Structures


1 CMPSCI 187 1 Computer Science 187: Introduction to Programming with Data Structures. Lecture 25: Graphs - Part 2. Announcements

2 CMPSCI 187 2 Directed Graphs: digraphs
- Reachability: where can we get to?
- Connectivity: who is connected to whom?
- Transitive closure
- Floyd-Warshall algorithm, to compute the transitive closure

3 CMPSCI 187 3 Digraphs
- Each edge goes in one direction.
- Edge (a,b) goes from a to b, but not from b to a.
- The indegree of a vertex in a digraph is the number of incoming edges incident on the vertex.
- The outdegree of a vertex in a digraph is the number of outgoing edges.
[Figure: map of the Brown University campus marking Tamassia's office. Vertex A: indegree = 2, outdegree = 1. Vertex C: indegree = 2, outdegree = 2.]
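As a small illustration of the indegree and outdegree definitions, here is a minimal Python sketch. The function name `degrees` and the sample edge list are assumptions made for this example; they are not taken from the lecture.

```python
from collections import defaultdict

def degrees(edges):
    """Return (indegree, outdegree) dicts for a digraph given as (a, b) pairs."""
    indeg = defaultdict(int)
    outdeg = defaultdict(int)
    for a, b in edges:
        outdeg[a] += 1   # edge (a, b) leaves a
        indeg[b] += 1    # ... and enters b
        # Touch both endpoints so every vertex appears in both maps.
        indeg[a] += 0
        outdeg[b] += 0
    return dict(indeg), dict(outdeg)

# Hypothetical digraph: a -> b, c -> b, b -> d
edges = [("a", "b"), ("c", "b"), ("b", "d")]
indeg, outdeg = degrees(edges)
```

Because edges are directed, (a, b) counts toward the outdegree of a and the indegree of b only.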

4 CMPSCI 187 4 Another Application: Scheduling
- Scheduling: edge (a,b) means task a must be completed before task b can be started.

5 CMPSCI 187 5 Directed Acyclic Graphs: dags
- A dag is a directed graph with no directed cycles.
[Figure: two example digraphs, one a DAG and one not a DAG.]
Trees ⊂ DAGs ⊂ Graphs

6 CMPSCI 187 6 Simple Paths and Cycles
- A simple path repeats no vertices (except that the first can be the last):
  - p = {Seattle, Salt Lake City, San Francisco, Dallas}
  - p = {Seattle, Salt Lake City, Dallas, San Francisco, Seattle}
- A cycle is a path that starts and ends at the same vertex:
  - p = {Seattle, Salt Lake City, Dallas, San Francisco, Seattle}
  - p = {Chicago, Dallas, Salt Lake City, Chicago, Dallas, Chicago}
- A simple cycle is a cycle that repeats no vertices except that the first vertex is also the last (in undirected graphs, no edge can be repeated):
  - p = {Seattle, Salt Lake City, Dallas, San Francisco, Seattle}
[Figure: graph on the vertices Seattle, San Francisco, Chicago, Salt Lake City, and Dallas.]

7 CMPSCI 187 7 Depth First Search
- Same algorithm as for undirected graphs.
- Even on a connected digraph, DFS may yield several disconnected DFS trees (i.e., a DFS forest).

8 CMPSCI 187 8 Reachability
- DFS tree rooted at v: the vertices reachable from v via directed paths.
- Interesting problems dealing with reachability in a digraph G:
  - Given vertices u and v, determine whether u reaches v.
  - Find all vertices of G that are reachable from a given vertex s.
  - Determine whether G is strongly connected.
  - Determine whether G is acyclic.
  - Compute the transitive closure G* of G.

9 CMPSCI 187 9 Reachability Examples
[Figures: four copies of a flight digraph on the vertices BOS, JFK, MIA, DFW, ORD, SFO, and LAX.]
- A directed path from BOS to LAX is shown in red.
- A directed cycle (ORD, MIA, DFW, LAX, ORD) is shown in red; its vertices induce a strongly connected subgraph.
- The subgraph of vertices and edges reachable from ORD is shown in red.
- Removing the dashed red lines results in a directed acyclic graph.

10 CMPSCI 187 10 Strongly Connected Digraph
- Each vertex can reach all other vertices.
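One way to test the definition above is sketched below in Python. It is an assumption of this example, not the lecture's stated method: it picks one start vertex and runs a DFS on G and a second DFS on G with every edge reversed. If both searches reach every vertex, each vertex can reach all others. The names `reachable` and `is_strongly_connected` are hypothetical.

```python
def reachable(adj, s):
    """Set of vertices reachable from s (including s), by iterative DFS."""
    seen = {s}
    stack = [s]
    while stack:
        u = stack.pop()
        for w in adj.get(u, ()):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen

def is_strongly_connected(adj):
    """adj maps each vertex to a list of its out-neighbors."""
    vertices = set(adj)
    for targets in adj.values():
        vertices.update(targets)
    if not vertices:
        return True
    # Build the reversed digraph: edge (u, w) becomes (w, u).
    rev = {v: [] for v in vertices}
    for u, targets in adj.items():
        for w in targets:
            rev[w].append(u)
    start = next(iter(vertices))
    # start reaches everyone, and everyone reaches start.
    return reachable(adj, start) == vertices and reachable(rev, start) == vertices
```

For example, the directed triangle `{"a": ["b"], "b": ["c"], "c": ["a"]}` passes the test, while the single edge `{"a": ["b"], "b": []}` does not, because b cannot reach a.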

11 CMPSCI 187 11 Strongly Connected Components { a, c, g } { f, d, e, b }

12 CMPSCI 187 12 Transitive Closure
- Digraph G* is obtained from G using the rule:
  - If there is a directed path in G from a to b, then add the edge (a,b) to G*.
- G* is called the transitive closure of G.
[Figure: a digraph G and its transitive closure G*, with the added edges marked.]

13 CMPSCI 187 13 Definitions
- An undirected graph is connected if there is a path between any two vertices.
- A directed graph is strongly connected if there is a directed path from any one vertex to any other.
- A digraph is weakly connected if there is a path between any two vertices, ignoring direction.
- A complete graph has an edge between every pair of vertices.

14 CMPSCI 187 14 DFS and BFS for Digraphs
- The algorithms are very similar to their undirected counterparts.
  - They only traverse edges according to their respective directions.
- Searches can be used to answer reachability questions.
  - DFS on G starting at vertex s visits all the vertices of G that are reachable from s.
  - The DFS tree contains directed paths from s to every vertex reachable from s.
- If G is a digraph with n vertices and m edges, then DFS runs in O(n+m) time.
- The following problems can be solved by an algorithm that traverses G n times using DFS, in O(n(n+m)) time:
  - Computing, for each vertex v of G, the subgraph reachable from v
  - Testing whether G is strongly connected
  - Computing the transitive closure G* of G
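The "n DFS traversals" idea for the transitive closure can be sketched as follows. This is a minimal Python illustration under assumptions: the graph is a dict of adjacency lists, and the names `reachable_from` and `transitive_closure` are hypothetical. Each search starts from the out-neighbors of v, so v itself is included only when it lies on a directed cycle.

```python
def reachable_from(adj, s):
    """Vertices reachable from s by a directed path of at least one edge."""
    seen = set()
    stack = list(adj.get(s, ()))
    while stack:
        u = stack.pop()
        if u not in seen:
            seen.add(u)
            stack.extend(adj.get(u, ()))
    return seen

def transitive_closure(adj):
    """Map each vertex v to the set of b such that (v, b) is an edge of G*."""
    vertices = set(adj)
    for targets in adj.values():
        vertices.update(targets)
    # One DFS per vertex: O(n (n + m)) overall.
    return {v: reachable_from(adj, v) for v in vertices}

closure = transitive_closure({"a": ["b"], "b": ["c"], "c": []})
```

Here the closure adds the edge (a, c) because a reaches c through b.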

15 CMPSCI 187 15 Topological Sort
Given a directed graph G = (V, E), output all the vertices in V such that no vertex is output before any other vertex with an edge to it.
[Figure: example task graph with the vertices check in airport, call taxi, taxi to airport, reserve flight, pack bags, take flight, and locate gate.]

16 CMPSCI 187 16 Topological Sort: General Idea
Label each vertex's in-degree (number of inbound edges)
While there are vertices remaining
    Pick a vertex with in-degree of zero and output it
    Reduce the in-degree of all vertices adjacent to it
    Remove it from the list of vertices

17 CMPSCI 187 17 Topological Sort Refinement
Label each vertex's in-degree
Initialize a queue to contain all in-degree zero vertices
While there are vertices remaining in the queue
    Pick a vertex v with in-degree of zero and output it
    Reduce the in-degree of all vertices adjacent to v
    Put any of these with new in-degree zero on the queue
    Remove v from the queue
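The queue-based refinement can be sketched in Python as below. The function name `topological_sort` is hypothetical, and the edges of the travel example are assumed for illustration, since the figure's edges did not survive in the transcript. The cycle check at the end is also an addition: if some vertices never reach in-degree zero, the graph is not a DAG.

```python
from collections import deque

def topological_sort(adj):
    """adj maps each vertex to the list of vertices it has an edge to."""
    # Label each vertex's in-degree.
    indeg = {v: 0 for v in adj}
    for u in adj:
        for w in adj[u]:
            indeg[w] = indeg.get(w, 0) + 1
    # Queue holds every vertex whose in-degree is currently zero.
    queue = deque(v for v in indeg if indeg[v] == 0)
    order = []
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in adj.get(v, ()):
            indeg[w] -= 1
            if indeg[w] == 0:
                queue.append(w)
    if len(order) != len(indeg):
        raise ValueError("graph has a directed cycle")
    return order

# Assumed edges for the travel tasks named on the earlier slide.
tasks = {
    "reserve flight": ["check in airport"],
    "call taxi": ["taxi to airport"],
    "pack bags": ["taxi to airport"],
    "taxi to airport": ["check in airport"],
    "check in airport": ["locate gate"],
    "locate gate": ["take flight"],
    "take flight": [],
}
order = topological_sort(tasks)
```

Each vertex enters the queue once and each edge is examined once, so the sketch runs in O(n+m) time.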

18 CMPSCI 187 18 Weighted Graphs
- Weighted graphs
  - Weights on the edges of a graph represent distances, costs, etc.
  - [Figure: an example of an undirected weighted graph.]
- Shortest paths

19 CMPSCI 187 19 Single Source Shortest Path
Given a graph G = (V, E) and a vertex s ∈ V, find the shortest path from s to every vertex in V.
Many variations:
  - directed vs. undirected
  - weighted vs. unweighted
  - cyclic vs. acyclic
  - positive weights only vs. negative weights allowed
  - multiple weight types to optimize

20 CMPSCI 187 20 The Trouble with Negative Weighted Cycles
[Figure: weighted digraph on the vertices A, B, C, D, and E with edge weights 2, 10, 1, -5, and 2.]
What's the shortest path from A to E? (or to B, C, or D, for that matter)

21 CMPSCI 187 21 Shortest Path
- BFS finds paths with the minimum number of edges from the start vertex.
- Hence, BFS finds shortest paths assuming that each edge has the same weight.
- In many applications, e.g., transportation networks, the edges of a graph have different weights.
- How can we find paths of minimum total weight?
- Example: Boston to Los Angeles. [Figure not preserved.]

22 CMPSCI 187 22 Dijkstra's Algorithm
- Dijkstra's algorithm finds shortest paths from a start vertex v to all the other vertices in a graph with
  - undirected edges (works for directed edges with minor modifications)
  - nonnegative edge weights
- The algorithm computes for each vertex u the distance of u from the start vertex v, that is, the weight of a shortest path between v and u.
- The algorithm keeps track of the set of vertices for which the distance has been computed, called the cloud C.

23 CMPSCI 187 23 Dijkstra's Algorithm, cont.
- Every vertex has a label D associated with it. For any vertex u, we can refer to its D label as D[u].
- D[u] stores an approximation of the distance between v and u. The algorithm will update a D[u] value when it finds a shorter path from v to u.
- When a vertex u is added to the cloud, its label D[u] is equal to the actual (final) distance between the starting vertex v and vertex u.
- Initially, we set
  - D[v] = 0 (the distance from v to itself is 0)
  - D[u] = ∞ for u ≠ v (these will change)

24 CMPSCI 187 24 Expanding the Cloud
- Repeat until all vertices have been put in the cloud:
  - Let u be a vertex not in the cloud that has the smallest label D[u]. (On the first iteration, naturally the starting vertex will be chosen.)
  - We add u to the cloud C.
  - We update the labels of the adjacent vertices of u as follows:
      for each vertex z adjacent to u do
        if z is not in the cloud C then
          if D[u] + weight(u,z) < D[z] then
            D[z] = D[u] + weight(u,z)
- The above step is called a relaxation of edge (u,z).
[Figure: v was put in the cloud first, followed by two successive choices of u; the labels 85 and 90 appear in the drawing.]

25 CMPSCI 187 25 PseudoCode
- We use a priority queue Q to store the vertices not in the cloud, where D[v] is the key of a vertex v in Q.
Algorithm ShortestPath(G, v):
  Input: A weighted graph G and a distinguished vertex v of G.
  Output: A label D[u], for each vertex u of G, such that D[u] is the length of a shortest path from v to u in G.
  initialize D[v] ← 0 and D[u] ← +∞ for each vertex u ≠ v
  let Q be a priority queue that contains all of the vertices of G using the D labels as keys.
  while Q ≠ ∅ do
    {pull u into the cloud C}
    u ← Q.removeMinElement()
    for each vertex z adjacent to u such that z is in Q do
      {perform the relaxation operation on edge (u, z)}
      if D[u] + w((u, z)) < D[z] then
        D[z] ← D[u] + w((u, z))
        change the key value of z in Q to D[z]
  return the label D[u] of each vertex u.
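A runnable Python sketch of the pseudocode follows, with one stated substitution: Python's `heapq` has no "change the key" operation, so instead of updating a key in place the sketch pushes a new entry and skips stale entries when they are popped. The function name `shortest_paths` and the dict-of-dicts graph layout are assumptions of this example. The edge weights are back-computed from the Parent/Distance tables of the BWI example in this lecture; the lecture's figure may contain more edges than are listed here.

```python
import heapq
import math

def shortest_paths(graph, source):
    """graph[u][z] is the nonnegative weight of edge (u, z)."""
    dist = {v: math.inf for v in graph}    # the D labels
    parent = {v: None for v in graph}
    dist[source] = 0
    heap = [(0, source)]
    cloud = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in cloud:
            continue                        # stale entry, u already finalized
        cloud.add(u)                        # pull u into the cloud
        for z, w in graph[u].items():
            if z not in cloud and d + w < dist[z]:
                dist[z] = d + w             # relaxation of edge (u, z)
                parent[z] = u
                heapq.heappush(heap, (dist[z], z))
    return dist, parent

# Undirected edges derived from the lecture's example tables.
edges = [
    ("BWI", "JFK", 184), ("BWI", "MIA", 946), ("BWI", "ORD", 621),
    ("JFK", "BOS", 187), ("JFK", "PVD", 144), ("JFK", "DFW", 1391),
    ("BOS", "SFO", 2704), ("ORD", "DFW", 802), ("ORD", "SFO", 1846),
    ("MIA", "LAX", 2342), ("DFW", "LAX", 1235),
]
graph = {}
for a, b, w in edges:
    graph.setdefault(a, {})[b] = w
    graph.setdefault(b, {})[a] = w

dist, parent = shortest_paths(graph, "BWI")
```

With these edges, the labels for DFW, SFO, and LAX are lowered as ORD and DFW enter the cloud, matching the "was 1575" and "was 3075" updates in the worked example.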

26 CMPSCI 187 26 Example
Vertex  Parent  Distance
BOS     -       ∞
BWI     -       0
DFW     -       ∞
JFK     BWI     184
LAX     -       ∞
MIA     BWI     946
ORD     BWI     621
PVD     -       ∞
SFO     -       ∞
Relaxation step: for each vertex z adjacent to u that is not in the cloud C, if D[u] + weight(u,z) < D[z] then D[z] = D[u] + weight(u,z).
Pull JFK into the cloud and continue.

27 CMPSCI 187 27 JFK is the nearest
Vertex  Parent  Distance
BOS     JFK     371
BWI     -       0
DFW     JFK     1575
JFK     BWI     184
LAX     -       ∞
MIA     BWI     946
ORD     BWI     621
PVD     JFK     328
SFO     -       ∞
Relaxation step: for each vertex z adjacent to u that is not in the cloud C, if D[u] + weight(u,z) < D[z] then D[z] = D[u] + weight(u,z).
Slide annotations: "Was ∞", "Was 621", "Was ∞", "Was 946".
Pull PVD into the cloud and continue.

28 CMPSCI 187 28 Followed by PVD
Vertex  Parent  Distance
BOS     JFK     371
BWI     -       0
DFW     JFK     1575
JFK     BWI     184
LAX     -       ∞
MIA     BWI     946
ORD     BWI     621
PVD     JFK     328
SFO     -       ∞
Relaxation step: for each vertex z adjacent to u that is not in the cloud C, if D[u] + weight(u,z) < D[z] then D[z] = D[u] + weight(u,z).
Pull BOS into the cloud and continue.

29 CMPSCI 187 29 Boston is just a bit further
Vertex  Parent  Distance
BOS     JFK     371
BWI     -       0
DFW     JFK     1575
JFK     BWI     184
LAX     -       ∞
MIA     BWI     946
ORD     BWI     621
PVD     JFK     328
SFO     BOS     3075
Relaxation step: for each vertex z adjacent to u that is not in the cloud C, if D[u] + weight(u,z) < D[z] then D[z] = D[u] + weight(u,z).
SFO was ∞ before this turn.
Pull ORD into the cloud and continue.

30 CMPSCI 187 30 Chicago is next
Vertex  Parent  Distance
BOS     JFK     371
BWI     -       0
DFW     ORD     1423
JFK     BWI     184
LAX     -       ∞
MIA     BWI     946
ORD     BWI     621
PVD     JFK     328
SFO     ORD     2467
DFW (was 1575) and SFO (was 3075) were both adjusted this turn.
Pull MIA into the cloud and continue.

31 CMPSCI 187 31 Now Miami
Vertex  Parent  Distance
BOS     JFK     371
BWI     -       0
DFW     ORD     1423
JFK     BWI     184
LAX     MIA     3288
MIA     BWI     946
ORD     BWI     621
PVD     JFK     328
SFO     ORD     2467
Relaxation step: for each vertex z adjacent to u that is not in the cloud C, if D[u] + weight(u,z) < D[z] then D[z] = D[u] + weight(u,z).
Pull DFW into the cloud and continue.

32 CMPSCI 187 32 Now Dallas-Fort Worth
Vertex  Parent  Distance
BOS     JFK     371
BWI     -       0
DFW     ORD     1423
JFK     BWI     184
LAX     DFW     2658
MIA     BWI     946
ORD     BWI     621
PVD     JFK     328
SFO     ORD     2467
LAX was adjusted this turn.
Pull SFO into the cloud and continue.

33 CMPSCI 187 33 Now San Francisco
Vertex  Parent  Distance
BOS     JFK     371
BWI     -       0
DFW     ORD     1423
JFK     BWI     184
LAX     DFW     2658
MIA     BWI     946
ORD     BWI     621
PVD     JFK     328
SFO     ORD     2467

34 CMPSCI 187 34 Finally, LA
Vertex  Parent  Distance
BOS     JFK     371
BWI     -       0
DFW     ORD     1423
JFK     BWI     184
LAX     DFW     2658
MIA     BWI     946
ORD     BWI     621
PVD     JFK     328
SFO     ORD     2467

35 CMPSCI 187 35 Running Time
- Let's assume that we represent G with an adjacency list. We can then step through all the vertices adjacent to u in time proportional to their number (i.e., O(deg u), where deg u is the number of vertices adjacent to u).
- The priority queue Q - we have a choice:
  - A heap: implementing Q with a heap allows for efficient extraction of the vertex with the smallest D label (O(log n)). Key updates can also be performed in O(log n) time. The total run time is O((n+m) log n), where n is the number of vertices in G and m is the number of edges. In terms of n, the worst-case time is O(n² log n).
  - An unsorted sequence: O(n) when we extract minimum elements, but fast key updates (O(1)). There are only n-1 extractions and m relaxations. The running time is O(n² + m).
- In terms of worst-case time, the heap is better for sparse graphs (m well below n²/log n) and the sequence is better for dense graphs (m close to n²).

36 CMPSCI 187 36 Running Time, cont.
- The average case is a slightly different story. Consider this:
- If priority queue Q is implemented with a heap, the bottleneck step is updating the key of a vertex in Q. In the worst case, we would need to perform an update for every edge in the graph.
- For most graphs, though, this would not happen. Using the random neighbor-order assumption, we can observe that for each vertex, its neighbor vertices will be pulled into the cloud in essentially random order. So there are only O(log n) expected updates to the key of a vertex.
- Under this assumption, the run time of the heap implementation is O(n log n + m), which is always O(n²). The heap implementation is thus preferable for all but degenerate cases.

37 CMPSCI 187 37 Observations
- In our example, the weight is the geographical distance. However, the weight could just as easily represent the cost or time to fly the given route.
- We can easily modify Dijkstra's algorithm for different needs, for instance:
  - If we just want to know the shortest path from vertex v to a single vertex u, we can stop the algorithm as soon as u is pulled into the cloud.
  - Or, we could have the algorithm output a tree T rooted at v such that the path in T from v to a vertex u is a shortest path from v to u.

38 CMPSCI 187 38 Weighted Graphs
(weight of subgraph G') = (sum of weights of edges of G')
weight(G') = Σ_{e ∈ G'} weight(e)
Example: weight(G') = 800 + 400 + 1200 = 2400

39 CMPSCI 187 39 Minimum Spanning Tree
- A spanning tree of minimum total weight
- E.g., connect all the computers in a building with the least amount of cable
- Examples: [figures not preserved]
- Minimum spanning trees are not generally unique.

40 CMPSCI 187 40 Applications of MSTs
- Communication networks
- VLSI design
- Transportation systems
- Good approximation to some NP-hard problems (take CS250)

41 CMPSCI 187 41 Minimum Spanning Tree Property
- Let (V',V") be a partition of the vertices of G.
- Let e = (v', v") be an edge of minimum weight across the partition, i.e., v' ∈ V' and v" ∈ V".
- There is an MST containing edge e.

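42 CMPSCI 187 42 Proof of Property
- If an MST does not contain a minimum weight edge e across the partition, adding e to it creates a cycle, and that cycle crosses the partition at some other edge f with weight(f) ≥ weight(e). Exchanging f for e gives a spanning tree that is better than or equal to the original, so there is an MST containing e.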

43 CMPSCI 187 43 Prim-Jarnik Algorithm for the MST
- Grows the MST T one vertex at a time
- A cloud covers the portion of T already computed
- Labels D[u] and E[u] are associated with each vertex u
  - E[u] is the best (lowest weight) edge connecting u to T
  - D[u] (distance to the cloud) is the weight of E[u]

44 CMPSCI 187 44 Differences between Prim's and Dijkstra's
- For any vertex u, D[u] represents the weight of the current best edge for joining u to the rest of the tree (as opposed to the total sum of edge weights on a path from the start vertex to u).
- Use a priority queue Q whose keys are D labels, and whose elements are vertex-edge pairs.
- Any vertex v can be the starting vertex.
- We still initialize all the D[u] values to infinity, but we also initialize E[u] (the edge associated with u) to null.
- Return the minimum spanning tree T.
We can reuse code from Dijkstra's algorithm, and we only have to change a few things. Let's look at the pseudocode.

45 CMPSCI 187 45 Prim-Jarnik Pseudo Code
Algorithm PrimJarnik(G):
  Input: A weighted graph G.
  Output: A minimum spanning tree T for G.
  pick any vertex v of G
  {grow the tree starting with vertex v}
  T ← {v}
  D[v] ← 0
  E[v] ← ∅
  for each vertex u ≠ v do
    D[u] ← +∞
  let Q be a priority queue that contains vertices, using the D labels as keys
  while Q ≠ ∅ do
    {pull u into the cloud C}
    u ← Q.removeMinElement()
    add vertex u and edge E[u] to T
    for each vertex z adjacent to u do
      if z is in Q
        {perform the relaxation operation on edge (u, z)}
        if weight(u, z) < D[z] then
          D[z] ← weight(u, z)
          E[z] ← (u, z)
          change the key of z in Q to D[z]
  return tree T
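The pseudocode can be approximated in Python as sketched below. This is a variant rather than a literal transcription: because `heapq` cannot change keys, the heap stores candidate edges (weight, u, z) instead of D and E labels, and entries whose endpoint is already in the tree are skipped. The name `prim_jarnik` and the graph layout are assumptions. The edge weights are the ones that appear in the Neighbor/D[u] tables of the STL example in this lecture; the lecture's figure may contain additional edges.

```python
import heapq

def prim_jarnik(graph, start):
    """graph[u][z] is the weight of undirected edge (u, z). Returns tree edges."""
    in_tree = {start}
    tree_edges = []
    # Candidate edges leaving the cloud, keyed by weight.
    heap = [(w, start, z) for z, w in graph[start].items()]
    heapq.heapify(heap)
    while heap and len(in_tree) < len(graph):
        w, u, z = heapq.heappop(heap)
        if z in in_tree:
            continue                       # stale candidate
        in_tree.add(z)                     # pull z into the cloud
        tree_edges.append((u, z, w))
        for nxt, w2 in graph[z].items():
            if nxt not in in_tree:
                heapq.heappush(heap, (w2, z, nxt))
    return tree_edges

edges = [
    ("STL", "DFW", 400), ("STL", "LAX", 1800), ("STL", "LGA", 1200),
    ("STL", "MSN", 800), ("DFW", "LAX", 1500), ("DFW", "MIA", 1000),
    ("MSN", "LGA", 1000), ("MSN", "SEA", 1500), ("LGA", "PVD", 200),
    ("SEA", "SFO", 800), ("SFO", "LAX", 400),
]
graph = {}
for a, b, w in edges:
    graph.setdefault(a, {})[b] = w
    graph.setdefault(b, {})[a] = w

tree = prim_jarnik(graph, "STL")
total = sum(w for _, _, w in tree)
```

Ties between equal-weight candidates may be broken differently than in the worked example, so the sketch can return a different tree of the same total weight. This is consistent with the earlier remark that minimum spanning trees are not generally unique.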

46 CMPSCI 187 46 Example
Start at v = STL.
Vertex  Neighbor  D[u]
DFW     STL       400
LAX     STL       1800
LGA     STL       1200
MIA     -         -
MSN     STL       800
PVD     -         -
SEA     -         -
SFO     -         -
STL     -         -

47 CMPSCI 187 47 Closest is DFW
D[u] is updated when DFW is added to the cloud.
Vertex  Neighbor  D[u]
DFW     STL       400
LAX     DFW       1500
LGA     STL       1200
MIA     DFW       1000
MSN     STL       800
PVD     -         -
SEA     -         -
SFO     -         -
STL     -         -

48 CMPSCI 187 48 Now Minneapolis-St. Paul
Vertex  Neighbor  D[u]
DFW     STL       400
LAX     DFW       1500
LGA     MSN       1000
MIA     DFW       1000
MSN     STL       800
PVD     -         -
SEA     MSN       1500
SFO     -         -
STL     -         -

49 CMPSCI 187 49 Now LaGuardia
Vertex  Neighbor  D[u]
DFW     STL       400
LAX     DFW       1500
LGA     MSN       1000
MIA     DFW       1000
MSN     STL       800
PVD     LGA       200
SEA     MSN       1500
SFO     -         -
STL     -         -

50 CMPSCI 187 50 Now Providence
Vertex  Neighbor  D[u]
DFW     STL       400
LAX     DFW       1500
LGA     MSN       1000
MIA     DFW       1000
MSN     STL       800
PVD     LGA       200
SEA     MSN       1500
SFO     -         -
STL     -         -
[Figure: the growing tree, with PVD and the weight 200 labeled.]

51 CMPSCI 187 51 Now Miami
Vertex  Neighbor  D[u]
DFW     STL       400
LAX     DFW       1500
LGA     MSN       1000
MIA     DFW       1000
MSN     STL       800
PVD     LGA       200
SEA     MSN       1500
SFO     -         -
STL     -         -
[Figure: the growing tree, with PVD, MIA, and the weights 200, 1500, and 1000 labeled.]

52 CMPSCI 187 52 Now Seattle
Vertex  Neighbor  D[u]
DFW     STL       400
LAX     DFW       1500
LGA     MSN       1000
MIA     DFW       1000
MSN     STL       800
PVD     LGA       200
SEA     MSN       1500
SFO     SEA       800
STL     -         -
[Figure: the growing tree, with PVD, MIA, SEA, SFO, and the weights 200, 1500, 800, 1500, and 1000 labeled.]

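53 CMPSCI 187 53 Now San Francisco
Vertex  Neighbor  D[u]
DFW     STL       400
LAX     SFO       400
LGA     MSN       1000
MIA     DFW       1000
MSN     STL       800
PVD     LGA       200
SEA     MSN       1500
SFO     SEA       800
STL     -         -
[Figure: the growing tree, with PVD, MIA, SEA, SFO, and the weights 200, 1500, 800, 1500, 400, 1500, and 1000 labeled.]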

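54 CMPSCI 187 54 Finally, Los Angeles
Vertex  Neighbor  D[u]
DFW     STL       400
LAX     SFO       400
LGA     MSN       1000
MIA     DFW       1000
MSN     STL       800
PVD     LGA       200
SEA     MSN       1500
SFO     SEA       800
STL     -         -
[Figure: the growing tree, with PVD, MIA, SEA, SFO, LAX, and the weights 200, 1500, 800, 1500, 400, 1500, and 1000 labeled.]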

55 CMPSCI 187 55 Final Minimal Spanning Tree
[Figure: the final tree on the vertices STL, DFW, MSN, SEA, LGA, MIA, PVD, SFO, and LAX, with edge weights 800, 400, 800, 400, 1500, 1000, 200, and 1000.]

56 CMPSCI 187 56 Running Time
- Complexity: O((n+m) log n)
  - where n = number of vertices, m = number of edges,
  - and Q is implemented with a heap.

57 CMPSCI 187 57 Searching HUGE Graphs
- Consider some really huge graphs:
  - All cities and towns in the World Atlas
  - All stars in the Galaxy
  - All ways 10 blocks can be stacked (Huh???)

58 CMPSCI 187 58 Implicitly Generated Graphs
- A huge graph may be implicitly specified by rules for generating it on the fly.
- Blocks world:
  - vertex = relative positions of all blocks
  - edge = a move in which the robot arm stacks one block
[Figure: block configurations linked by the actions stack(blue,red), stack(green,red), and stack(green,blue).]

59 CMPSCI 187 59 Robotics Blocks World
- Source = initial state of the blocks
- Goal = desired state of the blocks
- Path from source to goal = sequence of actions (program) for the robot arm!
- n blocks ⇒ n^n states
- 10 blocks ⇒ 10^10 = 10 billion states (Uh-oh!)
[Figure: block configurations linked by the actions stack(blue,red) and stack(green,blue).]

60 CMPSCI 187 60 Problem: Branching Factor or Out-degree of each vertex
- We cannot search such huge graphs exhaustively. Suppose we know that the goal is only d steps away.
- Dijkstra's algorithm is basically breadth-first search (taking into account the edge weights).
- If the out-degree of each node is 10, it potentially visits 10^d vertices.
  - 10-step plan = 10 billion vertices!

61 CMPSCI 187 61 A Simpler Example
- Suppose you live in Manhattan; what do you do?
[Figure: Manhattan street grid, 50th St to 52nd St by 2nd Ave to 10th Ave, with start S and goal G.]

62 CMPSCI 187 62 Best-First Search
- The Manhattan distance (Δx + Δy) is an estimate of the distance to the goal
  - a heuristic value
- Best-First Search
  - Order nodes in priority to minimize estimated distance to the goal
- Compare: Dijkstra
  - Order nodes in priority to minimize distance from the start
[Figure: Manhattan street grid, 50th St to 52nd St by 2nd Ave to 10th Ave, with start S and goal G; Δx = 6, Δy = 1, distance ≈ 7; the best-first action is marked.]

63 CMPSCI 187 63 Problem 1: Led Astray
- Led astray: the search eventually will expand a vertex that gets it back on the right track.
[Figure: Manhattan street grid, 50th St to 52nd St by 2nd Ave to 10th Ave, with start S and goal G.]

64 CMPSCI 187 64 Problem 2: Optimality
- With Best-First Search, are you guaranteed that a shortest path is found
  - when the goal is first seen?
  - when the goal is removed from the priority queue?
- No! The goal is by definition at distance 0: it will be removed from the priority queue immediately when it is seen, even if a shorter path exists!
[Figure: street grid, 51st St and 52nd St by 4th Ave to 9th Ave, with start S and goal G; one route is labeled "(5 blocks)".]
Best-First Search typically results in a sub-optimal solution!

65 CMPSCI 187 65 Dijkstra vs. Best-First
- Dijkstra / Breadth First is guaranteed to find an optimal solution.
- Best First often visits far fewer vertices, but may not provide an optimal solution.
  - Can we get the best of both?

66 CMPSCI 187 66 The A* Algorithm
- Order vertices in the priority queue to minimize (distance from start) + (estimated distance to goal):
  f(n) = g(n) + h(n)
- Where:
  f(n) = priority of a node
  g(n) = true distance from start
  h(n) = heuristic distance to goal
- Suppose the estimated distance (h) is ≤ the true distance to the goal
  - (the heuristic is a lower bound)
- Then: when the goal is removed from the priority queue, we are guaranteed to have found a shortest path!
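The ordering f(n) = g(n) + h(n) can be sketched in Python on a unit-cost street grid with the Manhattan distance as h. Everything named here is an assumption for illustration: the function `a_star`, the `passable` callback, and the small walled grid. On a unit grid the Manhattan distance never overestimates the remaining distance, so it satisfies the lower-bound condition.

```python
import heapq

def a_star(start, goal, passable):
    """Length of a shortest 4-connected path from start to goal, or None."""
    def h(p):
        # Manhattan distance: a lower bound on the remaining path length.
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])

    g = {start: 0}                      # best known distance from start
    heap = [(h(start), start)]          # priority f = g + h
    closed = set()
    while heap:
        f, node = heapq.heappop(heap)
        if node == goal:
            return g[node]              # goal removed from queue: optimal
        if node in closed:
            continue                    # stale entry
        closed.add(node)
        x, y = node
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if not passable(nxt):
                continue
            cand = g[node] + 1
            if cand < g.get(nxt, float("inf")):
                g[nxt] = cand
                heapq.heappush(heap, (cand + h(nxt), nxt))
    return None

# Hypothetical 5x5 grid with a wall that forces a detour.
walls = {(1, 0), (1, 1), (1, 2)}

def passable(p):
    return 0 <= p[0] < 5 and 0 <= p[1] < 5 and p not in walls

steps = a_star((0, 0), (2, 0), passable)
```

The `passable` callback must describe a finite region. Otherwise a search for an unreachable goal would never terminate.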

67 CMPSCI 187 67 Problem 2 Revisited
[Figure: street grid, 50th St to 52nd St by 4th Ave to 9th Ave, with start S, goal G, and a route labeled "(5 blocks)". Two vertices are annotated "Priority = 1+4 = 5" and "Priority = 5+2 = 6"; another annotation reads "Dijkstra would have visited these guys!"]

68 CMPSCI 187 68 A Little History
- A* was invented by Nils Nilsson and colleagues in 1968
  - or maybe some guy in Operations Research?
- Cornerstone of artificial intelligence
  - still a hot (OK, lukewarm) research topic!
  - iterative deepening A*, automatically generating heuristic functions, ...
- Method of choice for searching large (even infinite) graphs when a good heuristic function can be found
- Proofs of optimality exist

69 CMPSCI 187 69 Remember the Blocks?
- "Distance to goal" is not always physical distance.
- Blocks world:
  - distance = number of stacks to perform
  - heuristic lower bound = number of blocks out of place
[Figure: a blocks configuration with steps numbered 1, 2, 3. Number out of place = 2, true distance to goal = 3.]

70 CMPSCI 187 70 Other Examples
- Simplifying integrals
  - vertex = formula
  - goal = closed form formula without integrals
  - arcs = mathematical transformations
  - heuristic = number of integrals remaining in formula
- Problem: given chopped up DNA, reassemble it
  - vertex = set of pieces
  - arc = stick two pieces together
  - goal = only one piece left
  - heuristic = number of pieces remaining - 1
- Lots more!

71 CMPSCI 187 71 Machine Vision: Blob Finding
- Find and label the connected components in an image.
[Figure: an image whose connected components are labeled 1 through 5.]

72 CMPSCI 187 72 Blob Finding
- A matrix can be considered an efficient representation of a graph with a very regular structure.
- Cell = vertex
- Adjacent cells of same color = edge between vertices
- Blob finding = finding connected components

73 CMPSCI 187 73 Tradeoffs
- The DFS approach is (essentially) O(E+V) = O(V) for binary images.
  - Why?
- For each component, DFS ("recursive labeling") can move all over the image, so the entire image must be in main memory.
- Better in practice: row-by-row processing
  - localizes accesses to memory
- Algorithm:
  - Scan through the image left/right and top/bottom.
  - If a cell is the same color as (connected to) the cell to the right or below, then union them into an equivalence class (set-theoretic approach).
  - Give the same blob number to cells in each equivalence class.

74 CMPSCI 187 74 Blob Labeling Algorithm
Put each cell <x,y> in its own equivalence class
For each cell <x,y>
    if color[x,y] == color[x+1,y] then Union(<x,y>, <x+1,y>)
    if color[x,y] == color[x,y+1] then Union(<x,y>, <x,y+1>)
label = 0
For each root <x,y>
    blobnum[x,y] = ++label
For each cell <x,y>
    blobnum[x,y] = blobnum(Find(<x,y>))
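The row-by-row labeling can be sketched in Python with a small union-find structure. The name `label_blobs`, the list-of-rows image layout, and the path-halving shortcut inside `find` are assumptions of this sketch; the lecture does not specify how Union and Find are implemented. The image is assumed to be non-empty and rectangular.

```python
def label_blobs(image):
    """Return a grid of blob numbers; same-color 4-neighbors share a number."""
    rows, cols = len(image), len(image[0])
    # Put each cell in its own equivalence class.
    parent = {(r, c): (r, c) for r in range(rows) for c in range(cols)}

    def find(cell):
        while parent[cell] != cell:
            parent[cell] = parent[parent[cell]]   # path halving
            cell = parent[cell]
        return cell

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Scan once, merging each cell with its right and lower neighbor.
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols and image[r][c] == image[r][c + 1]:
                union((r, c), (r, c + 1))
            if r + 1 < rows and image[r][c] == image[r + 1][c]:
                union((r, c), (r + 1, c))

    # Number the roots in scan order, then copy each root's number to its cells.
    labels = {}
    result = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            root = find((r, c))
            if root not in labels:
                labels[root] = len(labels) + 1
            result[r][c] = labels[root]
    return result

image = [
    [1, 1, 0],
    [0, 1, 0],
    [0, 0, 0],
]
blobs = label_blobs(image)
```

The scan itself touches the image in row order. Only the `parent` table is accessed non-locally, which is the memory-locality argument made on the Tradeoffs slide.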
