# Graphs and Finding your way in the wilderness

## Presentation on theme: "Graphs and Finding your way in the wilderness"— Presentation transcript:

Graphs and Finding your way in the wilderness
Chapter 14 in DS&PS Chapter 9 in DS&AA

General Problems What is the shortest path from A to B?
What is the shortest path from A to all nodes? What is the shortest/cheapest path between any two nodes?. Search for Goal node, i.e. a node with specific properties, like a win in chess. What is shortest tour? (visit all vertices) no known polynomial algorithm in number of edges What is longest path from A to B

Some Applications Route Finding
metacrawler, on net, claims to find shortest path between two points Game-Playing great increase in chess/checkers end-game play occurred when recognized as graph search, not tree search Critical Path Analysis multiperson/task job schedule analysis answers what are the key tasks that can’t slip Travel arrangement cheapest cost to meet constraints

Definitions Graph: set of edges E and vertices V.
Edge is a pair of vertices (v,w) edge may be directed or undirected edge may have a cost v and w are said to be adjacent Digraph or directed graph: directed edges Path: sequence of vertices v1,…vn where each <vi,vi+1> is an edge. Cycle: path where v1=vn Tour: cycle that contains every vertex DAG: directed acyclic graph graph with no cycles Tree algorithms usually work with DAGs just fine

Matrix A = new Boolean(|V|,|V|). O(|V|^2) memory costs: acceptable only if dense If <vi,vj> is an edge, set A[i][j] = true, else false Special matrix multiple operator: col[j] = row[i][1]&col[1][j] or row[i][2]&col[2][j]….. In if entry [i][j] is true, what does that mean? There is some k so that vi->vk and vk->vj or a length 2 path from vi to vk. Similarly, A^k indicates where any two vertices are connected by a length k path. Cost: O(k*n^3).

Matrix Review If A has n rows and k columns and
B has k rows and m columns then A*B = C Where C has n rows and m columns and (standard) C[i][j] = A[i][1]*B[1][j]+….+A[i][k]*B[k][j] or i-j entry of C is dot product of row i of A and column J of B. Total time cost is O(n*k*m). Example: matrix of size 3-3 represents a linear transformation from R^3 into R^3. Points in R^3 represented as a 3 by 1 vector (column) Essentially matrices stretch/shrink or rotate points. Determinant defines amount of stretch/shrink. Theorem: Locally every differentiable function can be approximated by a matrix (linear transformation).

Cost Matrix Representation
Now A[i][j] = cost of edge from vi to vj. If no edge, either set cost to Infinity or add Boolean attribute to indicate no edge. New multiplication operation = min { row[i][1]+col[1][j], row[i][2]+col[2][j],… row[i][n] +col[n][j] } Now A^2 contains minimum cost path of length 2 between any 2 vertices. A^n has complexity O(n^4). Not good. If add A[i][i]= 0, then A^k records minimum cost path of length = k. (how to change to allow all paths <= k)

Here each vertex is the head of linked list which stores all the adjacent vertices. The cost to a node, if appropriate is also stored. The linked lists are usually stored in an array, since you probably know how many vertices there are. Memory cost = O(|E|) so if |E| << |V|^2, use list representation. Graph is sparse if |E| is O(|V|). To simplify the discussion, we will assume that each vertices has a method *sons* which returns all adjacent vertices. Now, will redo same problems with list representation.

General Graph Search Algorithm Looking for a node with a property
Set Store equal to the some node while ( Store is non-empty) do choose a node n in Store if n is solution, stop Decisions: else add SOME sons of n to store What should store be? How do we choose a node? what does add mean? How do pick which sons to store. Cycles are a problem

Problem: Is Graph connected? (matrix rep)
Note: n by n boolean matrix where n is number of vertices. Set A[i][i] to true Set A[i][j] to true if there is an edge between i and j. Let B= A^2, using boolean arithmetic Note B[i][j] is true iff there is a k such that B[i][k] is true and B[k][j] is true, i.e if there is a 2-path from i to j. A^k represents whether a k-path exists between any vertices. Let C = boolean sum of A^i where i= 1…n-1. (why?) Graph connected if C is all ones. Time complexity: O(N^4)! How about directed graphs? Basically the same algorithm. For directed graphs, strongly connected means directed path between any two vertices.

Is Undirected Graph G Connected? (adjacency list representation)
Suppose G is an undirected graph with N vertices. Let S be any node Do a (depth/breadth) first search of G, counting the number of nodes. Be careful not to double count. If number of nodes does not equal N, disconnected. Searching a Graph is like searching a tree, except that nodes may be revisited. Need to keep track of revisits, else infinite loop.

Depth First Search Pseudo-Code
Store = Stack Choose = pop Initial node: any node Add = push only new sons (unvisited ones) keep a boolean field visited, initialized to false. When a node is “popped”, mark it as visitied. Graph connected if all nodes visited. Properties Memory cost: number of nodes Guarantee to find a solution, if one exists (not shortest solution) How could we guarantee that? Time: number of nodes (exponential for k-ary trees)

Breadth First Search G is a undirected Graph
As before each node has a boolean visited field. Initial node is arbitrary Store = Queue Choose = dequeue and mark as visited Add = enqueue only those sons that have not been visited Properties: Time: Number of nodes to solution Space: Number of nodes Guaranteed to find shortest solution

Is Directed Graph Acyclic? (array represntation)
Let A[i][j] be true if there is a directed edge from i to j. Similar to previous case, if B= A^2 with boolean multiplication, then B[i][j] is true iff there is a directed 2-path from i to j. Algorithm: For i = 1 to n-1 (why?) Compute A^i. If some diagonal element is true, exit with true end for Exit with false.

Is Directed Graph Acyclic? (adjacency list rep)
With care, breadth first search works. Define the indegree of a node v as the number of edges of the form (u,v), with u arbitrary. Define the outdegree of a node v as the number of edges of the form (v, u) with u arbitrary. A node with indegree 0 is like the root of a tree. A node with outdegree 0 is a terminal node.

Algorithm Idea (has numerous variations/implementations) Store = Queue Compute indegree of all nodes Enqueue all nodes of indegree 0 While Queue is not empty Dequeue node n and lower indegrees of nodes of form (n,v) Enqueue any node whose indegree is 0. If any node still has positive indegree, then cyclic. Why does algorithm terminate? Properties: Time & Space: Number of nodes find node closest to “roots”.

Best-First Search Goal: find least cost solution
Here edges have a cost (positive) Store = priority queue Add = enqueue(), which puts in right order Choose = dequeue(), chooses element of least cost Properties: Find cheapest solution Time and Memory: exponential in… depth of tree.

Best First Pseudo-Code
Set distance from S to S to 0 Priority Queue PQ <- S while (PQ is not empty) vertex <- PQ.deque() sons <- vertex.sons() for each son in sons PQ.enqueue(son, cost to son)

Topological Sort Given: a directed acyclic graph
Produce: a linear ordering of the vertices such that if a path exist from v1 to v2, then v1 is before v2. If v1 is before v2, is there a path from v1 to v2? NO Note: there may be multiple correct topological sorts Algorithm Idea: any vertex with indegree 0 can be first Output and delete that vertex update indegree’s of its sons Repeat until empty So we need to compute and keep track of indegree’s

Algorithm Implementation
HashTable of (vertex, indegree, sons) Queue of vertices Step 1: read each edge (v,w) and add 1 to indegree of w linear Step 2: Add all vertices with indegree 0 to queue Q. Step 3: Process Q by: dequeue vertex update indegrees of its sons (constant by hashing) enqueue any son whose indegree become 0. Time complexity: linear Space: linear Proof: Does everything get enqueued?

UnWeighted Single-Source Shortest path algorihtm
Input: Directed graph and start node S Output: the minimum cost, in terms of number of edges traversed, from start node to all other nodes. Idea: do a level order search (breadth-first search) We’ll use a hashtable to mark elements as “seen”, i.e. we’ll track vertices that we’ve visited use hashtable to hold this information by “marking” vertices that have been visited We’ll use a queue to store the vertices to be “opened” to open a node means to consider its sons Each vertex will have a field for distance to start node.

Shortest Path Algorithm
Set distance from S to S to 0 queue <- S while (queue is not empty) vertex <- queue.dequeue() mark vertex as visited (enter in hashtable) record cost to vertex sons <- vertex.sons() newSons <- sons that are unmarked fill in distance measure to newSons queue.enqueue(newSons) Essentially, breadth-first search

Discussion Will this terminate? Will we ever revisit a node?
Computational cost? O(|E|) What are the memory requirements? Suppose we don’t count graph (virtual graphs) Can we bound queue? Only O(|E|) and if m-ary tree, this is exponential. Did we need the hashtable? This avoids a linear search of the constructed graph remove a factor of O(|G|).

Positive-Weighted Single-source Cheapest Path
Suppose we have positive costs associated with every edge in a directed graph. Problem: Find the shortest path(total cost) from given vertex S to every vertex. Solution: Dijstra’s algorithm BFS idea still works, with slight modifications Replace Queue by Priority queue. As before, replace newSons by betterSons. As before, replace add entry to update entry (which may be add) Note: may reopen an old son( if return with better path) Dense graphs: O(|V|^2), sparse graphs: O(|E|log|V|)

Weighted Shortest Path Pseudo-Code
Set distance from S to S to 0 (on node) Priority Queue PQ <- S while (PQ is not empty) vertex <- PQ.dequeue() … remove min mark vertex as visited (enter in hashtable) record cost to vertex sons <- vertex.sons() goodSons <- new sons OR old sons with better costs estimates queue.enqueue(goodSons)… enqueue puts in proper order.

Graphs with negative edge costs
Dijsktra doesn’t work (since we may have cycles which lower the cost) Input: Directed graph with arbitrary edge costs and vertex v. Output: Minimum cost from S to every vertex OR graph has a negative cost cycle. Note: If no negative cost cycles, then a vertex can be visited (expanded) at most |V| times. Algorithm: Add counter to each vertex so each time it is visited with lower cost, counter goes up. If counter exceeds |V|, then graph has negative cost cycle and we exit. Otherwise queue will be empty.

Weighted Single-Source shortest-path problems for Acyclic graphs
Easy since no cycles Edge costs may be positive or negative Best-first search works Node may be reentrant Reentrant node require may required updating cost. Or apply topological sorting algorithm. (text) 2 4 7 3 6 1

Algorithm Display Idea: Iterative use of breadth first search
addition of edges to effect other choices See Diagrams provided Analysis: (requires augmented path be cheapest) Runs in linear time

Minimum Spanning Tree Given: an undirected connected graph with edge costs Output: a subtree of graph such that contains all vertices sum of costs of edges is minimum If costs not given, assume 1. What then? Note all spanning trees have same number of edges Application: Is undirected Graph with n vertices connected? IFF minimal spanning tree has n-1 edges.

Prim’s Algorithm Let G be given as (V,E) where V has n vertices
Let T = empty Algorithm Idea: grow cheapest tree Choose a random v to start and add to T Repeat (until T has n vertices) select edge of minimum length that does not form a cycle and that attaches to current tree (how to check?) add edge to T The proof is more difficult than the code. Complexity depends on G and code O(V^2) for dense graphs O(E*log(V)) for sparse graphs (use binary heap)

Kruskal’s Algorithm Given graph G = (V,E)
Sort edges on the basis of cost. Add least cost edge to Forest, as long as no cycle is formed. Cost of cycle checking is? If implement as adjacency list, O(E^2) If implement as hash table O(1) Proof more difficult. Time complexity: O(E log E)

Finding the least cost between pairs of points
Idea: Dynamic programming Let c[i][j] be the edge cost between vi and vj. Define C[i][j] as minimum cost for going from vi to vj. Finding the subproblems Suppose P is the path from vi to vj which realizes the minimum cost and vk is an intermediary node. Then the subpaths from i to k and from k to j must be optimal, otherwise P would not be optimal. Now define D[i][k][j] as the minimum cost for going from vi to vj using any of v1,v2..,vk as an intermediary. Define D[i][j] as the minimum cost for going from i to j. D[i][j] = min over k of D[i][k][j] and c[i][j].

All-Pairs (Floyd’s)Pseudo-Code
Initialization: D[i][0][j] =cost(i,j) for all vertices i, j … O(|V|^2) D[i][k+1][j] = min(D[i][k][j], D[i][k][k+1]+D[k+1][k][j]) This last statement is true since any path from the shortest path from vi to vj using {v1,…vk+1} either doesn’t use vk+1, or the path divides into a path from vi to vk+1 and one from vk+1 to vj. The cost of this is O(|V|^3) - i.e. single loop over all vertices with |V|^2 per loop.

NP vs P Multiple ways to define
Define new computational model (Imaginary) add to programming language choose S1, S2,….Sn; where Si are statements Semantics: algorithm always chooses best Si to execute. This is the Non-Deterministic model If problem can be solve in polynomial time with non-deterministic it is in the class NP. If problem can be solved in polynomial time on standard computer (deterministic) then in class P. Unsolved (and possibly unsolvable) does NP = P?

NP-Completeness A problem is in NP or NP-hard if it can be solved in polynomial time on a non-deterministic machine. A problem p* is NP complete if any problem in NP can be polynomial reduced to p*. A problem P1 can be polynomial reduced to P2 if P1 can be solved in polynomially time assuming that P2 can be solved in polynomially time. Alternatively, if P1 can be transformed into P2 and solutions of P2 mapped back to P1 and all the transformations take polynomial time. This is a way of forming a taxonomy of difficulty of various problems

NP-Complete problems Boolean Satisfiability Traveling Salesmen
Bin Packing: given packages of size a[1]…a[n] and bins of size k, what is the fewest numbers of bins needed to store all the packages. Scheduling: Given tasks whose time take t[1]…t[n] and k processors, what is minimum completion time? Graph: Given a graph find the clique of maximum size. A clique is a completely connected subgraph. Subset-sum: Given a finite set S of n numbers and a target number t, does some subset of S sum to t. Vertex Cover: A vertex cover is a subset of vertices which hits every edge. The problem is to find a cover with the fewest number of vertices.