# Cs420dl lecture 7 Graph traversals wim bohm cs csu.

## Presentation on theme: "Cs420dl lecture 7 Graph traversals wim bohm cs csu."— Presentation transcript:

cs420dl lecture 7 Graph traversals wim bohm cs csu

Graph definitions Graph G = (V, E) V: set of nodes or vertices, E: set of edges (pairs of nodes). In an undirected graph, edges are unordered pairs of nodes. In a directed graph edges are ordered pairs of nodes. Path: sequence of nodes (v 0..v n ) s.t.  i: (v i,v i+1 ) is an edge. Path length: number of edges in the path. Simple path: all nodes distinct. Cycle: path with first and last node equal. Acyclic graph: graph without cycles. DAG: directed acyclic graph.

more defs Two nodes are adjacent if there is an edge between them. In a complete graph all nodes in the graph are adjacent. An undirected graph is connected if for all nodes v i and v j there is a path from v i to v j. G’(V’, E’) is a sub-graph of G(V,E) if V’  V and E’  E The sub-graph of G induced by V’ has all the edges (u,v)  E such that u  V’ and v  V’. A directed graph can be partitioned in strongly connected components: maximal sub-graphs C where for every u and v in C there is a path from u to v and there is a path from v to u. In a weighted graph the edges have a weight (cost, length,..) associated with them.

Graph Representations Collections of node structures with pointers to other nodes. Good representation when nodes have limited degree. Adjacency matrix: A ij = 1 if (v i,v j ) is an edge, 0 otherwise. Weighted graph: A ij = weight(v i,v j ) or  if no edge. Good representation for dense graphs (many edges) Adjacency lists: n lists, one for each node, containing (representations of) the adjacent nodes. Good representation for sparse graphs. Adjacency list representations save the storage of zeroes that adjacency matrices have, but adjacency matrices have simpler and faster random access mechanisms.

Traversing a Binary Tree Pre order – visit the node – go left – go right In order – go left – visit the node – go right Post order – go left – go right – visit the node Level order / breadth first – for d = 0 to height visit nodes at level d A B D G C E H F I

Traversal Examples A B D G C E H F I Pre order A B D G H C E F I In order G D H B A E C F I Post order G H D B E I F C A Level order A B C D E F G H I

Traversal Implementation recursive implementation of preorder – The steps: visit node preorder(left child) preorder(right child) – What changes need to be made for in-order, post-order? How would you implement level order?

Traversal Implementation recursive implementation of preorder – The steps: visit node preorder(left child) preorder(right child) – What changes need to be made for in-order, post-order? How would you implement level order? – Queue

Graph Traversal What makes it different from tree traversals?

Graph Traversal What makes it different from tree traversals: – graph has no root – you can visit the same node more than once – you can get in a cycle What to do about it?

Graph Traversal

BFS: Breadth First Search Like level traversal in trees, BFS(G,s) explores the edges of G and locates every node v reachable from s in a level order using a queue.

BFS: Breadth First Search Like level traversal in trees, BFS(G,s) explores the edges of G and locates every node v reachable from s in a level order using a queue. BFS also computes the distance: number of edges from s to all these nodes, and the shortest path (minimal #edges) from s to v.

BFS: Breadth First Search Like level traversal in trees, BFS(G,s) explores the edges of G and locates every node v reachable from s in a level order using a queue. BFS also computes the distance: number of edges from s to all these nodes, and the shortest path (minimal #edges) from s to v. BFS expands a frontier of discovered but not yet visited nodes. Nodes are colored white, grey or black. They start out undiscovered or white.

BFS(G,s) forall v in V-s {c[v]=white; d[v]= ,p[v]=nil} c[s]=grey; d[s]=0; p[s]=nil; Q=empty; enque(Q,s); while (Q != empty) { u = deque(Q); forall v in adj(u) if (c[v]==white) { c[v]=grey; d[v]=d[u]+1;p[v]=u; enque(Q,v) } c[u]=black; }

Complexity BFS Each node is painted white once, and is enqueued and dequeued at most once.

Complexity BFS Each node is painted white once, and is enqueued and dequeued at most once. Why?

Complexity BFS Each node is painted white once, and is enqueued and dequeued at most once. Enque and deque take constant time. The adjacency list of each node is scanned only once: when it is dequeued.

Complexity BFS Each node is painted white once, and is enqueued and dequeued at most once. Enque and deque take constant time. The adjacency list of each node is scanned only once, when it is dequeued. Therefore time complexity for BFS is O(|V|+|E|).

Shortest Path lengths The shortest path length from s to v is the minimum number of edges on any path from s to v and is  if there is no path. It can be shown (Cormen et. al.) that BFS computes the shortest path in terms of number of edges from s to any v.

Breadth First Tree After BFS has run, the p links in the graph form a predecessor sub-graph of G. We can create a breadth first tree from s to all its reachable nodes by inverting the p links. Using p we can print the vertices on the path from s to v.

Print Path printPath(G,s,v){ if (v==s) print s else if (p(v)==nil) print “no path from s to v” else {printPath(G,s,p(v)); print v} }

DFS: Depth First Search In depth first search edges are explored out of the most recently discovered node until a node is reached that has no more edges to explore; then the search backtracks to an earlier node.

DFS: Depth First Search In depth first search edges are explored out of the most recently discovered node until a node is reached that has no more edges to explore; then the search backtracks to an earlier node. Often in depth first search, if any undiscovered nodes exist after exhaustively searching from one root node v, a next root node is selected until all nodes have been visited. Then we have a forest of disjoint depth first trees.

DFS: Depth First Search In depth first search edges are explored out of the most recently discovered node until a node is reached that has no more edges to explore; then the search backtracks to an earlier node. Often in depth first search, if any undiscovered nodes exist after exhaustively searching from one root node v, a next root node is selected until all nodes have been visited. Then we have a forest of disjoint depth first trees. Is the forest unique?

DFS node coloring

Two time stamps d and f in DFS First time stamp d[v] for node v when it is discovered and colored grey. Second time stamp f[v] when all its adjacent nodes have been visited and it is left and colored black. Timestamps range from 1 to 2|V| forall u: d[u] { "@context": "http://schema.org", "@type": "ImageObject", "contentUrl": "http://images.slideplayer.com/14/4319229/slides/slide_27.jpg", "name": "Two time stamps d and f in DFS First time stamp d[v] for node v when it is discovered and colored grey.", "description": "Second time stamp f[v] when all its adjacent nodes have been visited and it is left and colored black. Timestamps range from 1 to 2|V| forall u: d[u]

DFS(G) forall u in V {c[u]=white; p[u]=nil} time=0 forall u in V if (c[u]==white) DFS-visit(u) DFS-visit(u) c[u]=grey; time++; d[u]=time forall v in Adj(u) if (c[v]==white) {p[v]=u; DFS-visit(v)} c[u]=black; time++; f[u]=time

Complexity DFS creates the roots of the depth first trees and the DFS-visits create the rests. There are many different visiting orders and hence possible timestamps and depth first trees. (Many different forests) DFS takes O(V) The DFS-visits take O(V+E) as they run through all adjacency lists once.

d and f parenthesis structure The d and f timestamps form a parenthesis structure If we represent d[u] as “(u” and f[u] as “u)”,then the history of discoveries forms a well formed (properly nested) expression

AB S D C U T parenthesis structure Make time stamps Draw a DFS forest and its parenthesization (is that a word?)

A 3,6 B S 8,9 D 4,5 C U T 2,7 1,10 11, 14 11, 14 12, 13 12, 13 parenthesis structure possible time stamps

A 3,6 B S 8,9 D 4,5 C U T 2,7 1,10 11, 14 11, 14 12, 13 12, 13 parenthesis structure DFS forest S T / \ / A D U / B / C

A 3,6 B S 8,9 D 4,5 C U T 2,7 1,10 11, 14 11, 14 12, 13 12, 13 parenthesis structure DFS forest S T / \ / A D U / B / C parenthesization (S(A(B(C C)B)A)(D D)S) (T(U U)T)

Time stamp intervals (S(A(B(C C)B)A)(D D)S) (T(U U)T) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 Because of the depth first tree / parenthesis structure, intervals d[u], f[u] and d[v],f[v] either are entirely disjoint (T and S, B and D) or one interval is entirely contained in the other (B in A, A in S, U in T) S T / \ / A D U / B / C

Time stamp intervals (S(A(B(C C)B)A)(D D)S) (T( U U)T) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 node y is a descendant of node x in the DFS forest if and only if d[x] < d[y] < f[y] < f[x]. S T / \ / A D U / B / C

White path theorem When x is discovered at time d[x] none of the descendants of x in the DFS tree have been discovered yet, and therefore there is a path of undiscovered (white) nodes from x to descendant y. A 3,6 B S 8,9 D 4,5 C U T 2,7 1,10 11, 14 11, 14 12, 13 12, 13 S T / \ / A D U / B / C parenthesization (S(A(B(C C)B)A)(D D)S) (T(U U)T)

DAGs describe precedence Examples: – prerequisites for a set of courses – dependencies between programs – job schedules Edge from a to b indicates a should come before b

DAGs Describing Precedence Batman images are from the book “Introduction to bioinformatics algorithms”

No edges point backwards

Topological Sorting of DAGs Topological sort: listing of nodes such that if (a,b) is an edge, a appears before b in the list Is a topological sort unique?

Topological sort example Give all possible DFS time stamps: S S U U V V

Topological sort example S S U U V V S S U U V V S S U U V V S S U U V V S S U U V V 1,6 2,5 3,4 4,5 2,3 1,6 1,4 2,3 5,6 1,2 3,4 5,6 1,2 3,6 4,5 For all these time orderings F v < F U < F S

Topological Sort A DAG (directed acyclic graph) can be topologically sorted, ie linearly ordered so that predecessors come before successors, using depth first search. Cormen et. al. show that for any edge (u,v) in a DAG finishing timestamp f(v) < f(u). Check previous example. So when v is finished, put it in front of the list.

Conclusion: Topological sort Run DFS As each node is finished insert it in front of a linked list of nodes

Strongly Connected Components A directed graph can be partitioned in strongly connected components: maximal sub-graphs C where for every u and v in C there is a path from u to v and there is a path from v to u. In a strongly connected component (SCC) all nodes can reach each other.

AB S C T SCCs ?

AB S C T SCCs: {A} {B} {C} {S,T}

SCC of G T Given graph G=(V,E) define graph G T =(V,E T ), where E T is the set of reversed edges of E. G and G T have exactly the same SCCs ! The path that went from x to y now goes from y to x, and vice versa.

AB S C T SCCs ?

AB S C T {A,B,C} {S,T} Can we use DFS to find the SCCs?

A 4,7 B S C T 3,8 1,10 5,6 2,9 DFS finds either one DFS tree when it starts at T: (T(S(A(B(C C)B)A)S)T)

A 2,5 B S C T 1,6 8,9 3,4 7,10 or two, eg when it starts at A: (A(B(C C)B)A) and (T(S S)T) here we were lucky and found the SCCs

AB S C T Here is G T

A 7,8 B S C T 1,10 3,4 6,9 2,5 DFS of G T also gives us either one depth first tree, eg when starting at A: (A(S(T T)S)(C(B B)C)A)

A 7,8 B S C T 5,10 1,4 6,9 2,3 or two, eg when starting at T: (T(S S)T) and (A(C(B B)C)A)

SCC Algorithm SCC(G) 1. Call DFS(G) to compute finishing times f[u] 2. Call DFS(G T ) but in the DFS loop consider the nodes in decreasing f[u] order 3. Output each DFS tree from 2 as a strongly connected component.

SCC Algorithm SCC(G) 1. Call DFS(G) to compute finishing times f[u] 2. Call DFS(G T ) but in the DFS loop consider the nodes in decreasing f[u] order 3. Output each DFS tree from 2 as a strongly connected component. try it on the example

A 4,7 B S C T 3,8 1,10 5,6 2,9 AB S C T GTGT G T

A 4,7 B S C T 3,8 1,10 5,6 2,9 AB S C T (T(SS)T) (A(B(CC)B)A) G GTGT T

A 2,5 B S C T 1,6 8,9 3,4 7,10 G AB S C T GTGT

A 2,5 B S C T 1,6 8,9 3,4 7,10 G AB S C T GTGT (S(TT)S)) (A(B(CC)B)A)

Proof of SCC algorithm The component graph G C =(V C,E C ) has as nodes the strongly connected components, and an edge from component X to Y if there is and edge from a node in X to a node in Y. Our example graph G above has a component graph with two nodes and one edge (S,T)  (A,B,C)

Component graph is a DAG The component graph is a DAG because if it would have a cycle the components in the cycle would form one big component, which creates a contradiction.

Timestamp of a component The timestamp d[U] of component U is the earliest discovery time of any node in U and timestamp f[U] is the latest finish time of any node in U.

Earlier timestamp observations node y is a descendant of node x in the DFS forest if and only if d[x] < d[y] < f[y] < f[x]. When x is discovered at time d[x] none of the descendants of x in the DFS tree have been discovered yet, and therefore there is a path of undiscovered white nodes from x to descendant y.

Lemma Let X and Y be two strongly connected components and there is an edge (X,Y) in the component graph. Then f(X) > f(Y). Check it for our example.

Proof of Lemma case 1: d[X] f(Y). Proof case 1: x is first node discovered in X, d[x]=d[X] and all nodes in X and Y are white and there is an edge from X to Y so there is a white path from x to any node in Y. Therefore f[X]=f[x]>f[Y]

Proof of Lemma case 2: d[X]>d[Y]. Lemma: Let X and Y be two strongly connected components and there is an edge (X,Y) in the component graph. Then f(X) > f(Y). Proof case 2: There cannot be an edge from any node in Y to any node in X, otherwise the two components would be one. Therefore all nodes in Y get discovered and finished before any nodes in X, hence f[X]>f[Y].

Corollary If there is an edge from strongly connected component Y to X in G T, then there is an edge from X to Y in G and thus f[X]>f[Y]. Also there is no edge from X to Y in G T. In our example: in G T (A,B,C)  (S,T) therefore f((S,T)) > f((A,B,C)) in G

Inductive Proof for SCC BASE: In step 2 of the SCC algorithm, DFS visits the strongly connected components C in G T in reverse f[C] order. By the corollary above there is no edge from C to any other component and hence all nodes in C will be visited first and will make up the first DFS tree of C.

our example: (S,T) first component AB S C T GTGT A 2,5 B S C T 1,6 8,9 3,4 7,10 DFS1 G A 4,7 B S C T 3,8 1,10 5,6 2,9 DFS2 G T

Inductive Proof for SCC STEP: Then a next component C next is picked with the latest finishing time f[C next ]. Again there is no edge from C next to any of the remaining components and the nodes in C next will make up a next tree in the DFS. Etc.