 # CS 336 March 19, 2012 Tandy Warnow.



Basic Graph Terminology
Nodes, vertices, edges, degrees, paths, cycles, connected components, adjacency, isolated vertices, trees, forests. Directed graphs: indegree, outdegree, trees.

Chromatic number and vertex colorings, Eulerian cycles and Eulerian paths, Hamiltonian paths, matchings, dominating sets, vertex covers.

Paths, Connected Components, etc.
A path is a sequence of vertices v_1, v_2, …, v_n such that v_i is adjacent to v_{i+1} for i = 1, 2, …, n-1. A simple path is one that does not have repeated vertices. A graph is connected if every pair of vertices in the graph is connected by some path. A connected component is a maximal subset of the vertices that is connected.
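The definitions above can be checked mechanically. The following is a minimal sketch, assuming the graph is stored as a dictionary mapping each vertex to the set of its neighbors; the function names and the example graph are illustrative and not part of the lecture.

```python
from collections import deque

def connected_components(adj):
    """Return the connected components of an undirected graph.

    adj maps each vertex to the set of its neighbors (an assumed representation).
    """
    seen = set()
    components = []
    for start in adj:
        if start in seen:
            continue
        # Breadth-first search collects every vertex reachable from start.
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    component.add(w)
                    queue.append(w)
        components.append(component)
    return components

def is_connected(adj):
    """A graph is connected when it has at most one component."""
    return len(connected_components(adj)) <= 1

# Illustrative graph: a path 1-2-3, an edge 4-5, and an isolated vertex 6.
example = {1: {2}, 2: {1, 3}, 3: {2}, 4: {5}, 5: {4}, 6: set()}
print(connected_components(example))
print(is_connected(example))
```

Each component is maximal because the search only stops once no unvisited neighbor remains.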

Cycles
A cycle in a graph is a path that starts and ends at the same vertex. A simple cycle is a cycle that does not have any repeated vertices (other than the start and end vertex). A graph is acyclic if it has no simple cycles.

Trees Two types: rooted and unrooted
Unrooted (simplest): acyclic connected graph. Rooted: take an unrooted tree, pick one node to be the root, and direct all edges away from the root. Voila!

Theorems about trees
Let T be a connected acyclic graph (i.e., a tree) with n vertices (n>0). Then: (1) T has at least one leaf (node with degree 0 or 1); (2) T has n-1 edges; (3) every edge in T is a cut-edge; (4) T can be 2-colored.

Theorem: Every tree has at least one leaf
Theorem: For any tree T with at least one vertex, T has at least one leaf (node with degree 0 or 1). Proof: If n=1, then T is a single vertex, which has degree 0 and so is a leaf. Otherwise n>1. Let P = v_1, v_2, …, v_k be a longest simple path in T. Since T is connected and n>1, k is at least 2, so v_k has the neighbor v_{k-1}. If v_k has degree 1, we are done. Otherwise, v_k has at least two neighbors, and so some neighbor w other than v_{k-1}. If w is in P, then we have a simple cycle in T, contradicting that T is a tree. If w is not in P, then we can extend P and get a longer path, contradicting that P is a longest simple path in T. Hence, v_k has degree 1, and we are done.

Theorem: Any tree with n>0 nodes has n-1 edges
Proof: by induction on n. Base case: n=1 (trivial: a single vertex has 0 edges). Inductive hypothesis: for some positive n, any tree on n nodes has exactly n-1 edges. Let T be a tree on n+1 nodes. We want to show T has exactly n edges.

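Proof (cont’d)
Let v be a node in T with degree 1. Such a node exists: T has a leaf, and since T is connected with more than one node, that leaf cannot have degree 0. Remove v (and its incident edge) from T. The result T’ is still connected and acyclic, so it is a tree with n nodes, and hence has n-1 edges (by the inductive hypothesis). T’ contains one fewer edge and one fewer vertex (node) than T, and so T has n edges.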
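The inductive step can be mirrored in code by stripping one leaf at a time and counting the edges removed along the way. This is only a sketch: the dictionary-of-sets representation, the function name, and the sample tree are assumptions, and the function expects its input to really be a tree.

```python
def count_edges_by_leaf_removal(adj):
    """Strip degree-1 vertices from a tree one at a time.

    Returns the number of edges removed before a single vertex remains.
    Assumes adj (vertex -> set of neighbors) describes a tree; on a graph
    with no degree-1 vertex the search for a leaf raises StopIteration.
    """
    # Work on a copy so the caller's graph is untouched.
    adj = {v: set(nbrs) for v, nbrs in adj.items()}
    removed = 0
    while len(adj) > 1:
        # A tree with more than one vertex always has a vertex of degree 1.
        leaf = next(v for v in adj if len(adj[v]) == 1)
        (neighbor,) = adj[leaf]
        adj[neighbor].discard(leaf)
        del adj[leaf]
        removed += 1  # exactly one edge disappears with each leaf
    return removed

# Illustrative tree on 5 vertices.
tree = {
    "a": {"b", "c"},
    "b": {"a"},
    "c": {"a", "d", "e"},
    "d": {"c"},
    "e": {"c"},
}
print(count_edges_by_leaf_removal(tree), len(tree) - 1)
```

Each pass removes one vertex and one edge, so n-1 passes leave a single vertex with no edges, matching the count in the theorem.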

Theorem: every edge in a tree is a cut-edge
Proof (by contradiction). Suppose T is a tree, e=(v,w) is an edge in T that is not a cut-edge. Then G=T-{e} (but keeping v and w) is connected. Hence there is a simple path P from v to w in G. Since e is not in G, P does not include edge e. Therefore, we can form a simple cycle C by adding edge e to P. Since every edge in C is in T, this means that T is not acyclic, contradicting the assumption that T is a tree (connected acyclic graph).
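A quick way to see the theorem in action is to delete each edge in turn and test whether its endpoints are still connected. The sketch below assumes a dictionary-of-sets graph; `reachable` and `is_cut_edge` are illustrative names.

```python
def reachable(adj, start, skip_edge=None):
    """Vertices reachable from start, ignoring skip_edge if one is given."""
    seen = {start}
    stack = [start]
    while stack:
        v = stack.pop()
        for w in adj[v]:
            # Pretend the skipped edge has been deleted (in either direction).
            if skip_edge is not None and {v, w} == set(skip_edge):
                continue
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen

def is_cut_edge(adj, edge):
    """An edge (v, w) is a cut-edge if deleting it disconnects v from w."""
    v, w = edge
    return w not in reachable(adj, v, skip_edge=edge)

# Illustrative inputs: a small tree and a triangle (which contains a cycle).
tree = {"a": {"b", "c"}, "b": {"a"}, "c": {"a", "d"}, "d": {"c"}}
triangle = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}}

tree_edges = [("a", "b"), ("a", "c"), ("c", "d")]
print(all(is_cut_edge(tree, e) for e in tree_edges))
print(any(is_cut_edge(triangle, e) for e in [(1, 2), (2, 3), (1, 3)]))
```

In the triangle, every edge lies on a cycle, so removing it leaves an alternative path and no edge is a cut-edge.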

Vertex Coloring
A (proper) vertex coloring of a graph is a function c: V -> {1,2,…,k} such that no two adjacent vertices are mapped to the same color. The chromatic number of a graph is the minimum number of colors needed to properly color the graph. How many colors does a tree need?

2-coloring a tree
Theorem: every connected acyclic graph (i.e., tree) can be 2-colored. Proof: by induction on the number of vertices.

Proof that every tree can be 2-colored
The base case is n=1. Clearly every tree on 1 vertex can be 2-colored. The inductive hypothesis is that for some positive integer n, any tree on n vertices can be 2-colored. Let G be a tree with n+1 vertices. We want to show that G can be 2-colored.

Proof (cont’d)
Let v be a node in G that has degree 1, and let w be its unique neighbor in G. Consider the graph G’ formed by deleting v (and its incident edge but not w) from G. G’ is also connected and acyclic (why?) and has n vertices, so it is a tree. Therefore, by the inductive hypothesis, G’ can be 2-colored. We extend the coloring from G’ to G by letting c(v)=1 if c(w)=2, and c(v)=2 if c(w)=1. Note that this coloring is proper for G. Hence G can be 2-colored.
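A 2-coloring of a tree can also be computed directly. The sketch below does not follow the leaf-removal induction; it swaps in a breadth-first search that gives each newly reached vertex the color opposite to the one it was reached from. The representation (vertex to set of neighbors) and the function names are assumptions made for illustration.

```python
from collections import deque

def two_color_tree(adj):
    """Color a tree with colors 1 and 2 so that neighbors differ.

    adj maps each vertex to the set of its neighbors and is assumed
    to describe a tree (connected and acyclic).
    """
    root = next(iter(adj))
    color = {root: 1}
    queue = deque([root])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in color:
                # Give w the color that v does not have.
                color[w] = 2 if color[v] == 1 else 1
                queue.append(w)
    return color

def is_proper(adj, color):
    """Check that no edge joins two vertices of the same color."""
    return all(color[v] != color[w] for v in adj for w in adj[v])

# Illustrative tree on 5 vertices.
tree = {
    "a": {"b", "c"},
    "b": {"a"},
    "c": {"a", "d", "e"},
    "d": {"c"},
    "e": {"c"},
}
coloring = two_color_tree(tree)
print(is_proper(tree, coloring))
```

The search works on a tree because each vertex is reached through exactly one edge, so the only constraint on a new vertex comes from the neighbor it was reached from, just as c(v) depends only on c(w) in the proof.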

Structural Induction This was a proof by structural induction.
Proofs by structural induction can be applied more generally!

A rooted tree in which every node has 0 or 2 children is called a “binary tree”. Theorem: every binary tree with n nodes has (n-1)/2 internal nodes (defined to be nodes with more than 0 children). Proof: by strong induction on n. Base case: n=1. Such a tree has no internal nodes, and (1-1)/2 = 0, so the claim holds.

Proof, cont’d. Strong inductive hypothesis: for some n>0, and for all positive integers k up to n, all rooted binary trees with k nodes have (k-1)/2 internal nodes. Let T have n+1 nodes, and let the children of the root be A and B. (We know the root has two children, since if it had no children, T would have 1 node, contradicting n+1 > 1.) We want to show Int(T) = n/2.

TA, the subtree of T rooted at A, is a binary tree; let nA be the number of nodes in TA. TB, the subtree of T rooted at B, is a binary tree; let nB be the number of nodes in TB. Let Int(T) be the number of internal nodes of T, and let Int(TA) and Int(TB) be similarly defined.

Since nA + nB + 1 = n+1, nA and nB are both at most n, and by the inductive hypothesis Int(TA) = (nA-1)/2 and Int(TB) = (nB-1)/2. The internal nodes of T are the root together with the internal nodes of TA and TB. Therefore Int(T) = (nA-1)/2 + (nB-1)/2 + 1.

We have established that Int(T) = (nA-1)/2 + (nB-1)/2 + 1. Simplifying this, we get Int(T) = (nA + nB - 2)/2 + 1 = (nA + nB)/2. Let nT be the number of nodes in T, so nT = nA + nB + 1. Therefore, Int(T) = (nT - 1)/2. Recall that nT = n+1. Therefore, Int(T) = n/2. Q.E.D.
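The recursion in the proof translates directly into code. In the sketch below a binary tree is encoded, purely for illustration, as a nested tuple: a leaf is the empty tuple `()` and an internal node is a pair `(left, right)`. The function names are hypothetical.

```python
def count_nodes(tree):
    """Number of nodes: a leaf counts 1, an internal node 1 plus both subtrees."""
    if tree == ():
        return 1
    left, right = tree
    return 1 + count_nodes(left) + count_nodes(right)

def count_internal(tree):
    """Number of internal nodes, following Int(T) = Int(TA) + Int(TB) + 1."""
    if tree == ():
        return 0
    left, right = tree
    return 1 + count_internal(left) + count_internal(right)

leaf = ()
# Illustrative tree: the root's left child has two leaf children, and its
# right child has a leaf and another internal node as children.
example = ((leaf, leaf), (leaf, (leaf, leaf)))

n = count_nodes(example)
print(n, count_internal(example), (n - 1) // 2)
```

Here the tree has 9 nodes and 4 internal nodes, which agrees with (9-1)/2 = 4.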

Genome Assembly
Given a DNA sequence, sequencing technology yields a collection of k-mers (substrings of length k) drawn from the sequence. From these k-mers, your objective is to reconstruct the sequence.

Genome Assembly
Let X be a very long DNA sequence. Consider all k-mers in X, with k big enough that no k-mer appears two or more times. Goal: reconstruct X from its set of k-mers.

Genome Assembly, attempt #1
Approach 1: Make a node for each k-mer, and put a directed edge from v to w if the (k-1)-suffix of v is the (k-1)-prefix of w. Create the graph for the following string, using k=5: ACATAGGATTCAC.
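Approach 1 can be sketched as follows for the example string. The helper names `kmers` and `overlap_graph` are illustrative, and the graph is stored as a dictionary from each k-mer to the list of k-mers it points to.

```python
def kmers(sequence, k):
    """All length-k substrings of sequence, in order of occurrence."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]

def overlap_graph(kmer_list):
    """Directed edge v -> w when the (k-1)-suffix of v is the (k-1)-prefix of w."""
    return {
        v: [w for w in kmer_list if v[1:] == w[:-1]]
        for v in kmer_list
    }

sequence = "ACATAGGATTCAC"
nodes = kmers(sequence, 5)
graph = overlap_graph(nodes)

for v in nodes:
    print(v, "->", graph[v])
```

For this string the 5-mers are all distinct, giving 9 nodes, and each node points only to the k-mer that follows it in the sequence, so the graph has 8 edges.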

Genome Assembly, attempt #1 (cont’d)
Every such graph has a Hamiltonian Path, as long as no k-mer appears more than once: listing the k-mers in the order in which they occur in the sequence gives one.

Hamiltonian Path
A Hamiltonian Path in a graph is a path that visits every node exactly once.

Genome Assembly Attempt #1
Create the graph for the following string, using k=5: ACATAGGATTCAC. Does the graph have a Hamiltonian Path? Is it unique? Can you reconstruct the sequence from the path?

Hamiltonian Path
Determining whether a graph has a Hamiltonian Path is NP-complete, so no polynomial-time algorithm for it is known. This approach to Genome Assembly is therefore computationally intensive, and infeasible for long sequences.
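For a tiny input such as the example string, a Hamiltonian path can still be found by brute force. The sketch below uses plain backtracking, which may take exponential time in the worst case; the function names and the way the sequence is spelled out from the path are illustrative assumptions.

```python
def hamiltonian_path(adj):
    """Return one Hamiltonian path as a list of vertices, or None.

    adj maps each vertex to a list of its out-neighbors. Backtracking
    search: fine for a handful of vertices, hopeless for genome-scale input.
    """
    n = len(adj)

    def extend(path, visited):
        if len(path) == n:
            return path
        for w in adj[path[-1]]:
            if w not in visited:
                result = extend(path + [w], visited | {w})
                if result is not None:
                    return result
        return None

    for start in adj:
        result = extend([start], {start})
        if result is not None:
            return result
    return None

def spell(path):
    """First k-mer, then the last letter of each following k-mer."""
    return path[0] + "".join(kmer[-1] for kmer in path[1:])

sequence = "ACATAGGATTCAC"
k = 5
# Sorting discards the original order, so the path has to be rediscovered.
kmer_set = sorted({sequence[i:i + k] for i in range(len(sequence) - k + 1)})
graph = {v: [w for w in kmer_set if v[1:] == w[:-1]] for v in kmer_set}

path = hamiltonian_path(graph)
print(spell(path))
```

On this input the search recovers the original string, because the overlap graph is a single directed path.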

Eulerian Cycles
An Eulerian cycle is a cycle that goes through every edge exactly once. It is easy to see that if a graph has an Eulerian cycle, then every node has even degree. The converse is also true for connected graphs, but a bit harder to prove. For directed graphs, the cycle must follow the direction of the edges (also called “arcs”). In this case, a connected graph has an Eulerian cycle if and only if the indegree is equal to the outdegree for every node.

Eulerian Paths
An Eulerian path is a path that goes through every edge exactly once. It is easy to see that if a graph has an Eulerian path, then all but at most 2 nodes have even degree (the possible exceptions are the two endpoints of the path). The converse is also true for connected graphs, but a bit harder to prove. For directed graphs, the path must follow the direction of the edges (also called “arcs”). In this case, a connected graph has an Eulerian path that is not a cycle if and only if indegree(v)=outdegree(v) for all but 2 nodes (x and y), where indegree(x)=outdegree(x)+1 (x is where the path ends), and indegree(y)=outdegree(y)-1 (y is where the path starts).
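The degree conditions for directed graphs are easy to test. The sketch below takes a list of directed edges and reports whether the degrees allow an Eulerian cycle, an Eulerian path, or neither. It deliberately checks only the degree conditions and assumes connectivity is verified separately; the function name and return values are illustrative.

```python
from collections import Counter

def eulerian_degree_check(edges):
    """Classify a directed graph by its degree conditions alone.

    edges is a list of (tail, head) pairs. Returns "cycle", "path", or
    "none". Connectivity is NOT checked here (an assumption of this sketch).
    """
    indeg = Counter()
    outdeg = Counter()
    for v, w in edges:
        outdeg[v] += 1
        indeg[w] += 1
    nodes = set(indeg) | set(outdeg)
    # Count how many nodes have each value of outdegree - indegree.
    diffs = Counter(outdeg[v] - indeg[v] for v in nodes)
    unbalanced = {d: c for d, c in diffs.items() if d != 0}
    if not unbalanced:
        return "cycle"      # indegree equals outdegree everywhere
    if unbalanced == {1: 1, -1: 1}:
        return "path"       # one start node (+1) and one end node (-1)
    return "none"

print(eulerian_degree_check([("a", "b"), ("b", "c"), ("c", "a")]))
print(eulerian_degree_check([("a", "b"), ("b", "c")]))
print(eulerian_degree_check([("a", "b"), ("a", "c")]))
```

The node with one extra outgoing edge is where an Eulerian path must start, and the node with one extra incoming edge is where it must end.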

de Bruijn Graph
Input: the set of k-mers for the DNA sequence. Output: the de Bruijn graph. Vertices: the (k-1)-mers occurring in the input k-mers. Directed edges: v->w if the (k-2)-suffix of v is the (k-2)-prefix of w, and the k-mer formed by starting with v and ending with w is one of the k-mers in the input.
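Since every input k-mer contributes exactly one edge, from its (k-1)-prefix to its (k-1)-suffix, the graph can be built in one pass over the k-mers. A minimal sketch, with `de_bruijn_graph` as an illustrative name and a dictionary of adjacency lists as the assumed representation:

```python
def de_bruijn_graph(kmer_set):
    """Build the de Bruijn graph of a set of k-mers.

    Vertices are (k-1)-mers; each k-mer adds the edge prefix -> suffix.
    """
    graph = {}
    for kmer in sorted(kmer_set):
        prefix, suffix = kmer[:-1], kmer[1:]
        graph.setdefault(prefix, []).append(suffix)
        # Make sure a vertex with no outgoing edge still appears as a key.
        graph.setdefault(suffix, [])
    return graph

sequence = "ACATAGGATTCAC"
k = 5
kmer_set = {sequence[i:i + k] for i in range(len(sequence) - k + 1)}
graph = de_bruijn_graph(kmer_set)

for v, ws in graph.items():
    print(v, "->", ws)
```

For the example string this gives 10 vertices (the 4-mers) and 9 edges (one per 5-mer). The prefix and suffix of a k-mer automatically overlap in k-2 letters, so the overlap condition in the definition holds for every edge added this way.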

de Bruijn Graph
If the k-mer set comes from a sequence and no k-mer appears more than once in the sequence, then the de Bruijn graph has an Eulerian path!

Using de Bruijn Graphs
Given: set of k-mers from a DNA sequence. Algorithm: (1) construct the de Bruijn graph; (2) find an Eulerian path in the graph; (3) read off the sequence the path defines, which has the same set of k-mers as the original.
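The whole pipeline fits in a few lines. The lecture does not say how to find the Eulerian path; the sketch below assumes Hierholzer's algorithm, a standard linear-time method, and assumes the input graph really has an Eulerian path. All function names are illustrative.

```python
def de_bruijn_graph(kmer_set):
    """Vertices are (k-1)-mers; each k-mer adds the edge prefix -> suffix."""
    graph = {}
    for kmer in sorted(kmer_set):
        graph.setdefault(kmer[:-1], []).append(kmer[1:])
        graph.setdefault(kmer[1:], [])
    return graph

def eulerian_path(graph):
    """Hierholzer's algorithm on a directed graph assumed to have an Eulerian path."""
    # Copy the adjacency lists because the search consumes edges.
    remaining = {v: list(ws) for v, ws in graph.items()}
    indeg = {v: 0 for v in graph}
    for ws in graph.values():
        for w in ws:
            indeg[w] += 1
    # Start where outdegree exceeds indegree by one, else at any vertex.
    start = next(
        (v for v in graph if len(graph[v]) - indeg[v] == 1),
        next(iter(graph)),
    )
    stack, path = [start], []
    while stack:
        v = stack[-1]
        if remaining[v]:
            stack.append(remaining[v].pop())  # follow an unused edge
        else:
            path.append(stack.pop())          # dead end: emit the vertex
    return path[::-1]

def spell(path):
    """First (k-1)-mer, then the last letter of each following vertex."""
    return path[0] + "".join(node[-1] for node in path[1:])

def assemble(kmer_set):
    return spell(eulerian_path(de_bruijn_graph(kmer_set)))

sequence = "ACATAGGATTCAC"
k = 5
kmer_set = {sequence[i:i + k] for i in range(len(sequence) - k + 1)}
print(assemble(kmer_set))
```

On the example string the de Bruijn graph is a single directed path, so the Eulerian path is unique and spells the original sequence. When the graph has more than one Eulerian path, the output shares the k-mer set of the original but need not equal it.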

de Bruijn Graph
Create the de Bruijn graph for the following string, using k=5: ACATAGGATTCAC. Find the Eulerian path. Is the Eulerian path unique? Reconstruct the sequence from this path.