CS 341: Algorithms
Trevor Brown, trevor.brown@uwaterloo.ca, DC 2338, Office hour M 3-4pm

This time
Strong connectedness: algorithm using DFS
Strongly connected components: Sharir’s algorithm (using DFS)

Strong connectedness: testing existence of all-to-all paths

Strong connectedness
In a directed graph, v is reachable from w if there is a path from w to v. We denote this path w →→ v. (Compare: we use w → v to denote an edge from w to v.)
A graph G is strongly connected iff every node is reachable from every other node.
More formally: ∀ w, v: w →→ v.
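The definition can be checked directly, if inefficiently, by testing reachability from every node. The following is a minimal sketch, assuming the graph is stored as a dict mapping each node to a list of its out-neighbours (every node must appear as a key). The function names and the two example graphs are hypothetical, not taken from the slides. It runs one traversal per node, so it takes O(n(n+m)) time.

```python
def reachable_from(adj, start):
    """Return the set of nodes reachable from start (iterative DFS)."""
    seen = {start}
    stack = [start]
    while stack:
        w = stack.pop()
        for v in adj[w]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def strongly_connected_by_definition(adj):
    """Check the definition directly: every node reaches every node."""
    nodes = set(adj)
    return all(reachable_from(adj, w) == nodes for w in adj)


# Hypothetical examples: one big cycle, and the same graph with c's out-edge removed.
cycle = {"a": ["b"], "b": ["c"], "c": ["d"], "d": ["a"]}
broken = {"a": ["b"], "b": ["c"], "c": [], "d": ["a"]}
```

This brute-force version is only a baseline for comparison with a faster DFS-based test.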

Strong connectedness
Is this graph strongly connected? How about this one?
[Figures: two directed graphs on nodes a, b, c, d.]
One graph: yes. One big cycle.
The other: no. No path from c to other nodes.

Strong connectedness
How about this graph? How about this one?
[Figures: two directed graphs on nodes a, b, c, d, e, f, g.]
One graph: yes. Multiple intersecting cycles.
The other: no. Two cycles with only a one-directional path between them.

Applications of checking strong connectedness
You gain some symmetry from knowing a graph is strongly connected.
For example, you can start a graph traversal at any node, and know the traversal will reach every node.
Without strong connectedness, if you want to run a graph traversal that reaches every node in a single pass, you would have to do additional processing to determine an appropriate starting node.

Applications of checking strong connectedness
Useful as a sanity check!
Suppose you want to run an algorithm that requires strong connectedness, and you believe your input graph is strongly connected.
Validate your input by testing whether this is true!
Subtle, difficult-to-detect bugs often result if such an algorithm is run only on one component of a graph.
[More concrete applications once we generalize and talk about strongly connected components…]

Strong connectedness
Lemma: a graph is strongly connected iff for every node s, all nodes are reachable from s, and s is reachable from all nodes.
Prove both directions:
(⇒) Suppose for all u, v we have u →→ v. Fix any s. Node s is reachable from all nodes, and vice versa.
(⇐) Suppose s is reachable from all nodes and vice versa. For any u, v, we have u →→ s →→ v, and v →→ s →→ u.
Note that the (⇐) argument uses only a single node s, so it suffices to test one arbitrary s.

Strong connectedness
How to use DFS to determine whether every node is reachable from a given node s? DFS from s and see if every node turns black.
How to use DFS to determine whether s is reachable from every node? What if we first reverse the direction of every edge? Then s →→ v in this new graph iff v →→ s in the original graph. So: DFS from s in the reversed graph.

IsStronglyConnected(G = {V, E}) where V = {v_1, v_2, …, v_n}
    (colour, d, f) := DFSVisit(v_1, G)
    for i := 1..n
        if colour[v_i] ≠ black then return false
    Construct graph H by reversing all edges in G    (How?)
    (colour, d, f) := DFSVisit(v_1, H)
    for i := 1..n
        if colour[v_i] ≠ black then return false
    return true

Reversing edges: adjacency matrix
[Figure: a directed graph on nodes a to g and its 7×7 adjacency matrix $M_E$ (rows = source, columns = target), next to the graph with all edges reversed and its matrix, the transpose $M_E^T$.]
Can do matrix transpose, or can just swap variables for source & target in your code!
Complexity? Building the transpose explicitly takes O(n^2) time; swapping the roles of source and target needs no extra work.
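Both options for an adjacency matrix can be sketched as follows, assuming the matrix is a square list of lists of 0/1 entries indexed as m[source][target]. The function names and the 3-node example are hypothetical.

```python
def reverse_edges_matrix(m):
    """Return the transpose of a square 0/1 adjacency matrix (O(n^2) time)."""
    n = len(m)
    return [[m[t][s] for t in range(n)] for s in range(n)]


def has_edge_reversed(m, source, target):
    """Query the reversed graph without building it: swap the two indices."""
    return m[target][source] == 1


# Hypothetical 3-node example with edges 0 -> 1 and 1 -> 2.
m = [
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
]
m_t = reverse_edges_matrix(m)
```

The second function reflects the "swap variables" remark: no new matrix is allocated, at the cost of remembering which orientation each piece of code expects.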

Reversing edges: adjacency Lists
[Figure: adjacency lists of a directed graph on nodes a to g (each source node lists its targets), and the adjacency lists after reversing all edges.]
Complexity? O(n+m): one pass over all nodes and all edges.
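For adjacency lists, reversal can be sketched as a single pass over all edges. This assumes the graph is a dict mapping each node to a list of its targets, with every node present as a key. The function name and example graph are hypothetical.

```python
def reverse_edges_lists(adj):
    """Return adjacency lists of the graph with every edge reversed (O(n + m))."""
    rev = {v: [] for v in adj}      # one empty list per node: O(n)
    for w, targets in adj.items():  # each edge w -> v is visited once: O(m)
        for v in targets:
            rev[v].append(w)
    return rev


# Hypothetical example with edges a -> b, a -> c, b -> c.
adj = {"a": ["b", "c"], "b": ["c"], "c": []}
rev = reverse_edges_lists(adj)
```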

Complexity for adjacency lists?
IsStronglyConnected performs two DFSVisit calls, the colour checks over all n nodes, and one edge reversal. Each of these takes O(n+m) time or less, so the total is O(n+m).
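A minimal Python sketch of IsStronglyConnected for adjacency lists follows. It assumes a dict mapping each node to a list of its targets (every node present as a key), uses an iterative DFS in place of the recursive DFSVisit, and tracks only colours (not d or f). All names and the example graphs are hypothetical.

```python
def dfs_colours(adj, start):
    """Colour every node reachable from start black; the rest stay white."""
    colour = {v: "white" for v in adj}
    colour[start] = "grey"
    stack = [(start, iter(adj[start]))]
    while stack:
        w, neighbours = stack[-1]
        for v in neighbours:
            if colour[v] == "white":
                colour[v] = "grey"
                stack.append((v, iter(adj[v])))
                break
        else:
            # All out-edges of w explored: w is finished.
            colour[w] = "black"
            stack.pop()
    return colour


def is_strongly_connected(adj):
    """Two DFS passes from one arbitrary node: on G, then on G reversed."""
    if not adj:
        return True
    s = next(iter(adj))
    if any(c != "black" for c in dfs_colours(adj, s).values()):
        return False  # some node is not reachable from s
    rev = {v: [] for v in adj}
    for w in adj:
        for v in adj[w]:
            rev[v].append(w)
    # In the reversed graph, s reaches v iff v reaches s in G.
    return all(c == "black" for c in dfs_colours(rev, s).values())


# Hypothetical examples: one big cycle, and two 2-cycles joined one way only.
one_cycle = {"a": ["b"], "b": ["c"], "c": ["d"], "d": ["a"]}
two_cycles = {"a": ["b"], "b": ["a", "c"], "c": ["d"], "d": ["c"]}
```

In the second example the first pass from a colours every node black, and only the pass on the reversed graph exposes the failure.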

Example execution 1
DFSVisit(a) in G (a is arbitrary).
[Figure: DFS progress on a directed graph with nodes a to g.]
Every node is black. Next step!

Example execution 1 (continued)
Construct graph H.
DFSVisit(a) in H.
[Figure: DFS progress on H.]
Every node is black. So G is strongly connected!

Example execution 2
DFSVisit(a) in G (a is arbitrary).
[Figure: DFS progress on a directed graph with nodes a to g.]
Every node is black. Next step!
Construct graph H.
DFSVisit(a) in H.
[Figure: DFS progress on H.]
Some nodes are not black: there is no path from those nodes to a in G.
So G is not strongly connected!
Could the result change if we started at a different node? The answer cannot change, because strong connectedness is a property of the whole graph; only which of the two DFS passes leaves a non-black node may differ.

Strongly Connected Components (SCC)
Graphs that are not strongly connected can be divided into components that are each strongly connected.

Strongly connected components
This graph could be divided into two graphs that are each strongly connected.
[Figure: a directed graph on nodes a to i, split into two parts.]
These are called strongly connected components (SCCs).

It could also be divided into three graphs…
But we want our SCCs to be maximal (as large as possible).
[Figure: the same graph on nodes a to i, with one part labelled "Maximal, so SCC" and two parts labelled "Not maximal, so not SCC".]

So, the goal is to find these (maximal) SCCs:
[Figure: the graph on nodes a to i with its maximal SCCs outlined.]

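Formal definitions
A strongly connected component of G is a maximal set of nodes C such that every node in C is reachable from every other node in C.
Note: a strongly connected component can contain just a single node.
Example: a node with no out-edges.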

Component graph
Consider this graph.
[Figure: a directed graph on nodes a to l.]
These are its SCCs: {a,b,c,d}, {e,f,g}, {h,i}, {j,k,l}.
The following is its component graph. It has one node for each SCC, and an edge between two nodes iff there is an edge between the corresponding SCCs.
[Figure: component graph with nodes "a,b,c,d", "f,e,g", "h,i", "j,k,l".]
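Given an SCC label for every node, the component graph can be built in one pass over the edges. The sketch below assumes adjacency lists as a dict and a dict `component` mapping each node to its SCC label. The function name, the labels, and the small example graph are hypothetical.

```python
def component_graph(adj, component):
    """Build the component graph: one node per SCC label, and an edge
    between two labels iff some edge of G joins the corresponding SCCs."""
    comp_adj = {c: set() for c in component.values()}
    for w, targets in adj.items():
        for v in targets:
            cw, cv = component[w], component[v]
            if cw != cv:  # edges inside one SCC are dropped
                comp_adj[cw].add(cv)
    return comp_adj


# Hypothetical example: SCCs {a, b} and {c, d}, joined by the edge b -> c.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["d"], "d": ["c"]}
component = {"a": 0, "b": 0, "c": 1, "d": 1}
cg = component_graph(adj, component)
```

Using sets for the out-neighbours keeps parallel edges between two SCCs from appearing more than once.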

Applications of SCCs and component graphs
Finding all cyclic dependencies in code
Can find a single cycle with an earlier DFS-based algorithm.
But it is nicer to find all cycles at once, so you don’t have to fix one to expose another.

Finding SCCs in a social network graph
Yields information about communities that have formed.
Social networks can study the evolution of those communities.

Explicit model checking in formal verification
In model checking, we have a state machine, which represents the model of our software or hardware, and we try to prove that temporal logic formulas hold over it.
Idea: for a set of program states, prove bad things cannot happen in those states.
Example: CTL formula EG(p) can be verified by finding SCCs & checking paths to them.

Studying connectome graphs
These are graphs used for modelling nervous systems.
In such graphs vertices correspond to cells and edges correspond to physical cell contacts or su
SCCs seem to have an important role in brain connectome graphs.
For example, a 2016 paper found that a fly brain connectome had one large SCC with 785 nodes (neurons), and other SCCs had only 1-2 nodes.
Similar structure in rat and cat brain samples. Motivates studying why!

Data filtering before running other algorithms
Consider Google Maps: nodes = intersections, edges = roads.
Don’t want to run a path finding algorithm on the entire global graph!
First restrict execution to a rectangle.
Then throw away everything except the strongly connected component that contains the source & target.
[Figure: crop & find largest SCC.]

Solving 2-satisfiability
Conjunctive normal form Boolean formulae with constraints on pairs of variables, e.g.,
$f(x_1, x_2, x_3) = (x_1 \lor x_2) \land (\lnot x_2 \lor x_3) \land (x_3 \lor \lnot x_1)$
Problem: is there an assignment of $(x_1, x_2, x_3)$ that makes this formula true?
2-satisfiability can solve many problems!
Arranging text labels (or other objects) in a diagram to avoid overlap, if each has two possible positions.
Routing wires that can only bend once in a VLSI integrated circuit design.
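For the three-variable example formula, the question can be settled by simply trying all eight assignments. The sketch below does exactly that; it is an exhaustive search for illustration only, not the SCC-based method, and it would not scale to many variables. The names `f` and `satisfying` are hypothetical.

```python
from itertools import product


def f(x1, x2, x3):
    """The example formula: (x1 or x2) and (not x2 or x3) and (x3 or not x1)."""
    return (x1 or x2) and ((not x2) or x3) and (x3 or (not x1))


# Try every assignment of (x1, x2, x3) and keep those that satisfy f.
satisfying = [xs for xs in product([False, True], repeat=3) if f(*xs)]
```

So the example formula is satisfiable, for instance with all three variables set to true.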

So the problem is important…
How do we solve it?

Brainstorming an algorithm
What if we run DFS, then reverse all edges, then run DFS (like checking whether an entire graph is strongly connected)?
DFSVisit(a), DFSVisit(h), DFSVisit(j)
[Figure: three snapshots of DFS on the graph with nodes a to l, showing finish times: g 8, f 9, e 10, c 11, b 12, d 13, a 14, i 17, h 18, k 22, l 23, j 24.]

What if we run DFS, then reverse all edges, then run DFS?
In G: DFSVisit(a), DFSVisit(h), DFSVisit(j).
[Figure: DFS on the graph with nodes a to l, showing finish times.]
Reverse edges, then: DFSVisit(a), DFSVisit(e).
[Figure: DFS on the reversed graph.]
We did visit everything we saw in our DFSVisit(h) above! But we also saw more nodes: j, k, l.
Not clear how to identify remaining SCCs…
What if we perform our DFSVisit calls in a different order?

What if we run DFS, then reverse all edges, then run DFS?
In G: DFSVisit(a), DFSVisit(h), DFSVisit(j).
[Figure: DFS on the graph with nodes a to l, showing finish times.]
Problem: from e (in the reversed graph), we can reach other SCCs.
Idea: perform same DFSVisit calls, but in reverse order.
Reverse edges, then: DFSVisit(j), DFSVisit(h), DFSVisit(a).
[Figure: DFS on the reversed graph.]
Identified three SCCs correctly, but clearly need more DFSVisit calls!

What if we run DFS, then reverse all edges, then run DFS?
In G: DFSVisit(a), DFSVisit(h), DFSVisit(j).
[Figure: DFS on the graph with nodes a to l, showing finish times.]
Idea: call DFSVisit on all nodes in order of largest to smallest finishing time.
Reverse edges, then: DFSVisit(j), DFSVisit(h), DFSVisit(a), DFSVisit(e).
[Figure: DFS on the reversed graph.]
Identified all SCCs correctly! Why?

Reasoning about why this works
We DFSVisit nodes by largest to smallest finishing time.
Recall this would be a topological order in a DAG (by the DFS-based topological sort algorithm we saw earlier).
But there may be cycles in each component!
We prove the component graph is a DAG.
And we prove our DFS visits the component graph in a topological order.
So, the idea is: we perform DFS on SCCs one-by-one, in a topological order, which guarantees that any other SCCs reachable from the current DFS must already be finished/black (so won’t be visited).
So, we only visit nodes in the current SCC!
So, can identify precisely what is in this SCC.

Fact: Component graph is a DAG
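Proof by contradiction.
Suppose there is a cycle $v_1, v_2, \ldots, v_k$ in the component graph.
Consider any pair of nodes u, v in any of the corresponding components.
Suppose u and v are located in different components $C_u \ne C_v$.
Then one can navigate from u to v by moving from $C_u$ to $C_v$ along the cycle in the component graph (moving freely within each component). By the same argument, one can navigate from v back to u by continuing around the cycle.
But then $C_u$ and $C_v$ are part of the same SCC, a contradiction!
[Figure: component graph with nodes "a,b,c,d", "j,k,l", "h,i", "f,e,g"; u lies in one component and v in another.]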

Proving decreasing finish times induce a topological order on the component graph
Definition: For a strongly connected component C, let $d[C] = \min\{d[v] : v \in C\}$ and $f[C] = \max\{f[v] : v \in C\}$.
Lemma: if $C_i$, $C_j$ are SCCs and there is an edge $C_i \to C_j$, then $f[C_i] > f[C_j]$.
Proof. Case 1 ($d[C_i] < d[C_j]$): Let u be the earliest discovered node in $C_i$. All nodes in $C_i \cup C_j$ are white-reachable from u, so they are descendants of u in the DFS forest and finish before u. So $f[C_i] = f[u] > f[C_j]$.
[Figure: components $C_i$ and $C_j$ with an edge from $C_i$ to $C_j$; u is the earliest discovered node in $C_i$.]

Proving decreasing finish times induce a topological order on the component graph
Lemma: if $C_i$, $C_j$ are SCCs and there is an edge $C_i \to C_j$, then $f[C_i] > f[C_j]$.
Proof. Case 2 ($d[C_i] > d[C_j]$): Since the component graph is a DAG and contains the edge $C_i \to C_j$, there is no path from $C_j$ to $C_i$ (it would close a cycle). Thus, no nodes in $C_i$ are reachable from $C_j$. So we discover $C_j$ and finish $C_j$ without discovering $C_i$. Therefore $d[C_j] < f[C_j] < d[C_i] < f[C_i]$. QED
[Figure: components $C_i$ and $C_j$ with an edge from $C_i$ to $C_j$.]
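The lemma can be observed on a small example by computing discovery and finish times for two different DFS start orders, one per case of the proof. This is a sketch under assumptions: adjacency lists as a dict, an iterative DFS with a single global clock, and a hypothetical four-node graph with SCCs C_i = {a, b} and C_j = {c, d} joined by the edge b -> c. All names are illustrative.

```python
def dfs_times(adj, order):
    """Run DFS, starting new trees from nodes in the given order.
    Returns (d, f): discovery and finish times on one shared clock."""
    d, f = {}, {}
    time = 0
    for root in order:
        if root in d:
            continue
        time += 1
        d[root] = time
        stack = [(root, iter(adj[root]))]
        while stack:
            w, neighbours = stack[-1]
            for v in neighbours:
                if v not in d:
                    time += 1
                    d[v] = time
                    stack.append((v, iter(adj[v])))
                    break
            else:
                time += 1
                f[w] = time
                stack.pop()
    return d, f


def d_of(component, d):
    return min(d[v] for v in component)


def f_of(component, f):
    return max(f[v] for v in component)


# Hypothetical graph: C_i = {a, b}, C_j = {c, d}, with the edge b -> c.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["d"], "d": ["c"]}
C_i, C_j = {"a", "b"}, {"c", "d"}

d1, f1 = dfs_times(adj, "abcd")  # Case 1: C_i is discovered first
d2, f2 = dfs_times(adj, "cdab")  # Case 2: C_j is discovered first
```

In both runs the largest finish time in C_i exceeds the largest finish time in C_j, as the lemma predicts.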

Using the lemma to build an algorithm
Lemma: if $C_i$, $C_j$ are SCCs and there is an edge $C_i \to C_j$, then $f[C_i] > f[C_j]$.
Algorithm:
    $(v_{i_1}, v_{i_2}, \ldots, v_{i_n})$ := DFS_topsort(G)
    H := construct by reversing each edge in G
    return DFS_SCC(H, $(v_{i_1}, v_{i_2}, \ldots, v_{i_n})$)
DFS_topsort: the topological sort algorithm we saw earlier. Returns nodes ordered by finish time from largest to smallest.
DFS_SCC: calls DFSVisit on nodes in this topological order, and gives each node an SCC number in a component[] array, which is then returned.
This is called Sharir’s algorithm (sometimes Kosaraju’s algorithm).
[Figure: the paper that first introduced it.]

Pseudocode: DFS(H, $v_{i_1}, v_{i_2}, \ldots, v_{i_n}$)
Complexity? O(n+m)
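One possible Python rendering of the two-pass algorithm is sketched below. It assumes adjacency lists as a dict with every node present as a key, and replaces the recursive DFSVisit with iterative traversals. The function names are hypothetical, and the 12-node example graph only mimics the four SCCs from the slides: its individual edges are an assumption.

```python
def dfs_finish_order(adj):
    """Return all nodes in order of increasing finish time (iterative DFS)."""
    seen, order = set(), []
    for root in adj:
        if root in seen:
            continue
        seen.add(root)
        stack = [(root, iter(adj[root]))]
        while stack:
            w, neighbours = stack[-1]
            for v in neighbours:
                if v not in seen:
                    seen.add(v)
                    stack.append((v, iter(adj[v])))
                    break
            else:
                order.append(w)  # w is finished
                stack.pop()
    return order


def strongly_connected_components(adj):
    """Label each node with an SCC number using two DFS passes, O(n + m)."""
    # Pass 1: nodes of G by decreasing finish time.
    by_finish = reversed(dfs_finish_order(adj))
    # Build H by reversing every edge of G.
    rev = {v: [] for v in adj}
    for w in adj:
        for v in adj[w]:
            rev[v].append(w)
    # Pass 2: traverse H in that order; each new start discovers one SCC.
    component = {}
    count = 0
    for root in by_finish:
        if root in component:
            continue
        component[root] = count
        stack = [root]
        while stack:
            w = stack.pop()
            for v in rev[w]:
                if v not in component:
                    component[v] = count
                    stack.append(v)
        count += 1
    return component


# Hypothetical graph with SCCs {a,b,c,d}, {e,f,g}, {h,i}, {j,k,l}.
adj = {
    "a": ["b"], "b": ["c"], "c": ["d"], "d": ["a", "e"],
    "e": ["f"], "f": ["g"], "g": ["e"],
    "h": ["i"], "i": ["h", "e"],
    "j": ["k", "h"], "k": ["l"], "l": ["j"],
}
component = strongly_connected_components(adj)
```

On this example the second pass starts new traversals at j, h, a and e, in that order, one per SCC.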
