Lecture 7
Topics: Dynamic Programming
Reference: Introduction to Algorithms by Cormen et al., Chapter 15: Dynamic Programming
Data Structure and Algorithm

Longest Common Subsequence (LCS)
Biologists need to measure the similarity between DNA sequences to determine how closely related one organism is to another. They do this by treating DNA as strings over the letters A, C, G, T and comparing the strings. Formally, they look at common subsequences of the strings.
Example: X = ABCBDAB, Y = BDCABA
Common subsequences include ABA, BCA, BCB, BBA, BCBA, BDAB, etc.
The longest common subsequences (LCS) have length 4; BCBA and BDAB are two of them.
How can we find an LCS efficiently?

Longest Common Subsequence (Brute-Force Approach)
If |X| = m and |Y| = n, then X has 2^m subsequences, and each must be checked against Y (n comparisons).
So the running time of the brute-force algorithm is O(n 2^m), which makes it impractical for long sequences.
The LCS problem has optimal substructure: solutions of subproblems are parts of the final solution.
Subproblems: "find the LCS of pairs of prefixes of X and Y".

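The brute-force idea above can be sketched in Python as follows. This is only an illustration under assumptions: the function names `is_subsequence` and `lcs_brute_force` are hypothetical and not from the lecture. It enumerates subsequences of X from longest to shortest and returns the first one that is also a subsequence of Y, so in the worst case it examines all 2^m candidates.

```python
from itertools import combinations


def is_subsequence(s, t):
    """Return True if s is a subsequence of t (single left-to-right scan of t)."""
    it = iter(t)
    # "ch in it" advances the iterator until ch is found, so order is respected.
    return all(ch in it for ch in s)


def lcs_brute_force(x, y):
    """Try subsequences of x, longest first; return the first one found in y."""
    for k in range(len(x), -1, -1):
        # Each index tuple selects one subsequence of x of length k.
        for idx in combinations(range(len(x)), k):
            candidate = "".join(x[i] for i in idx)
            if is_subsequence(candidate, y):
                return candidate
    return ""  # not reached: the empty subsequence always matches


if __name__ == "__main__":
    print(len(lcs_brute_force("ABCBDAB", "BDCABA")))
```

Because the number of candidates doubles with every extra symbol of X, this sketch is only usable for very short strings.
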
LCS: Optimal Substructure
LCS: Setup for Dynamic Programming
First we find the length of an LCS; along the way we leave "clues" for recovering the subsequence itself.
Define X_i and Y_j to be the prefixes of X and Y of length i and j respectively.
Define c[i,j] to be the length of an LCS of X_i and Y_j.
Then the length of an LCS of X and Y is c[m,n].

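LCS Recurrence
  c[i,j] = 0                          if i = 0 or j = 0
  c[i,j] = c[i-1,j-1] + 1             if i, j > 0 and x[i] = y[j]
  c[i,j] = max(c[i,j-1], c[i-1,j])    if i, j > 0 and x[i] != y[j]
Notice the issues: evaluated by plain recursion, this recurrence takes exponential time. We don't know ahead of time which argument of max is larger, so both must be explored. The subproblems overlap: to find c[i,j] we need both c[i,j-1] and c[i-1,j], and these two share most of their own subproblems.
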
LCS Algorithm
First we find the length of an LCS. Later we modify the algorithm to find the LCS itself.
Recall that X_i and Y_j are the prefixes of X and Y of length i and j respectively, and that c[i,j] is the length of an LCS of X_i and Y_j.
Then the length of an LCS of X and Y is c[m,n].

LCS Recursive Solution
We start with i = j = 0 (empty prefixes of X and Y).
Since X_0 and Y_0 are empty strings, their LCS is empty (i.e. c[0,0] = 0).
The LCS of the empty string and any other string is empty, so for every i and j: c[0,j] = c[i,0] = 0.

LCS Recursive Solution
When we calculate c[i,j], we consider two cases.
First case: x[i] = y[j]. One more symbol of X and Y matches, so the length of an LCS of X_i and Y_j equals the length of an LCS of the shorter prefixes X_{i-1} and Y_{j-1}, plus 1.

LCS Recursive Solution
Second case: x[i] != y[j]. As the symbols don't match, the solution is not improved, and the length of LCS(X_i, Y_j) is the same as before, i.e. the maximum of LCS(X_i, Y_{j-1}) and LCS(X_{i-1}, Y_j).

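The base case and the two recursive cases can be written directly as a recursive function. The sketch below is one possible rendering, not the lecture's own code: the name `lcs_length_memo` is hypothetical, and `functools.lru_cache` is used as an assumed way to store each c(i, j) so that it is computed only once. Python strings are 0-indexed, so the slide's x[i] corresponds to `x[i - 1]` here.

```python
from functools import lru_cache


def lcs_length_memo(x, y):
    """Length of an LCS of x and y, following the recursive cases top-down."""

    @lru_cache(maxsize=None)
    def c(i, j):
        # Base case: an empty prefix has an empty LCS with anything.
        if i == 0 or j == 0:
            return 0
        # First case: the last symbols of the two prefixes match.
        if x[i - 1] == y[j - 1]:
            return c(i - 1, j - 1) + 1
        # Second case: no match, keep the better of the two shorter problems.
        return max(c(i, j - 1), c(i - 1, j))

    return c(len(x), len(y))


if __name__ == "__main__":
    print(lcs_length_memo("ABCB", "BDCAB"))
```

For long inputs the recursion depth can exceed Python's default limit, which is one reason the table is usually filled iteratively instead.
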
LCS Example
We'll see how the LCS algorithm works on the following example:
X = ABCB
Y = BDCAB
What is the longest common subsequence of X and Y?
LCS(X, Y) = BCB

LCS Example (0)
X = ABCB; m = |X| = 4
Y = BDCAB; n = |Y| = 5
Allocate array c[0..4, 0..5], i.e. (m+1) x (n+1) = 5 x 6 entries.

         j    0   1   2   3   4   5
    i        Yj   B   D   C   A   B
    0   Xi
    1   A
    2   B
    3   C
    4   B

LCS Example (1)
for i = 1 to m: c[i,0] = 0

LCS Example (2)
for j = 0 to n: c[0,j] = 0

In the steps below, b[i,j] records which neighbour c[i,j] was derived from: ↖ (diagonal, match), ↑ (cell above), ← (cell to the left).

LCS Example (3): case i=1 and j=1. A != B, but c[0,1] >= c[1,0], so c[1,1] = c[0,1] = 0 and b[1,1] = ↑
LCS Example (4): case i=1 and j=2. A != D, but c[0,2] >= c[1,1], so c[1,2] = c[0,2] = 0 and b[1,2] = ↑
LCS Example (5): case i=1 and j=3. A != C, but c[0,3] >= c[1,2], so c[1,3] = c[0,3] = 0 and b[1,3] = ↑
LCS Example (6): case i=1 and j=4. A = A, so c[1,4] = c[0,3] + 1 = 1 and b[1,4] = ↖
LCS Example (7): case i=1 and j=5. A != B, and this time c[0,5] < c[1,4], so c[1,5] = c[1,4] = 1 and b[1,5] = ←
LCS Example (8): case i=2 and j=1. B = B, so c[2,1] = c[1,0] + 1 = 1 and b[2,1] = ↖
LCS Example (9): case i=2 and j=2. B != D and c[1,2] < c[2,1], so c[2,2] = c[2,1] = 1 and b[2,2] = ←
LCS Example (10): case i=2 and j=3. B != C and c[1,3] < c[2,2], so c[2,3] = c[2,2] = 1 and b[2,3] = ←
LCS Example (11): case i=2 and j=4. B != A and c[1,4] = c[2,3], so c[2,4] = c[1,4] = 1 and b[2,4] = ↑
LCS Example (12): case i=2 and j=5. B = B, so c[2,5] = c[1,4] + 1 = 2 and b[2,5] = ↖
LCS Example (13): case i=3 and j=1. C != B and c[2,1] > c[3,0], so c[3,1] = c[2,1] = 1 and b[3,1] = ↑
LCS Example (14): case i=3 and j=2. C != D and c[2,2] = c[3,1], so c[3,2] = c[2,2] = 1 and b[3,2] = ↑
LCS Example (15): case i=3 and j=3. C = C, so c[3,3] = c[2,2] + 1 = 2 and b[3,3] = ↖
LCS Example (16): case i=3 and j=4. C != A and c[2,4] < c[3,3], so c[3,4] = c[3,3] = 2 and b[3,4] = ←
LCS Example (17): case i=3 and j=5. C != B and c[2,5] = c[3,4], so c[3,5] = c[2,5] = 2 and b[3,5] = ↑
LCS Example (18): case i=4 and j=1. B = B, so c[4,1] = c[3,0] + 1 = 1 and b[4,1] = ↖
LCS Example (19): case i=4 and j=2. B != D and c[3,2] = c[4,1], so c[4,2] = c[3,2] = 1 and b[4,2] = ↑
LCS Example (20): case i=4 and j=3. B != C and c[3,3] > c[4,2], so c[4,3] = c[3,3] = 2 and b[4,3] = ↑
LCS Example (21): case i=4 and j=4. B != A and c[3,4] = c[4,3], so c[4,4] = c[3,4] = 2 and b[4,4] = ↑
LCS Example (22): case i=4 and j=5. B = B, so c[4,5] = c[3,4] + 1 = 3 and b[4,5] = ↖

Completed table c after step (22):

         j    0   1   2   3   4   5
    i        Yj   B   D   C   A   B
    0   Xi    0   0   0   0   0   0
    1   A     0   0   0   0   1   1
    2   B     0   1   1   1   1   2
    3   C     0   1   1   2   2   2
    4   B     0   1   1   2   2   3

LCS Algorithm
LCS-Length(X, Y)
  m = length(X), n = length(Y)
  for i = 1 to m do c[i,0] = 0
  for j = 0 to n do c[0,j] = 0
  for i = 1 to m do
    for j = 1 to n do
      if (x_i == y_j) then
        c[i,j] = c[i-1,j-1] + 1;  b[i,j] = "↖"
      else if c[i-1,j] >= c[i,j-1] then
        c[i,j] = c[i-1,j];  b[i,j] = "↑"
      else
        c[i,j] = c[i,j-1];  b[i,j] = "←"
  return c and b

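A possible Python rendering of LCS-Length is sketched below. The function name `lcs_length` and the string labels "diag", "up" and "left" (standing in for the arrows stored in b) are assumptions made for this illustration. As before, the slide's 1-based x_i becomes `x[i - 1]`.

```python
def lcs_length(x, y):
    """Fill the tables c (LCS lengths) and b (directions) bottom-up."""
    m, n = len(x), len(y)
    # Row 0 and column 0 stay 0: an empty prefix has an empty LCS.
    c = [[0] * (n + 1) for _ in range(m + 1)]
    b = [[""] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
                b[i][j] = "diag"   # symbols match
            elif c[i - 1][j] >= c[i][j - 1]:
                c[i][j] = c[i - 1][j]
                b[i][j] = "up"     # value copied from the cell above
            else:
                c[i][j] = c[i][j - 1]
                b[i][j] = "left"   # value copied from the cell to the left
    return c, b


if __name__ == "__main__":
    c, b = lcs_length("ABCB", "BDCAB")
    for row in c:
        print(row)
```

Running it on X = ABCB and Y = BDCAB prints the table one row per line, which can be checked against the worked example.
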
LCS Algorithm Running Time
The LCS algorithm calculates the value of every entry of the array c[0..m, 0..n].
Each entry takes constant time to compute, so the running time is O(mn).
Now, how do we get the actual solution? We use the arrows stored in b to guide us: we simply follow the arrows back from c[m,n] to a base case (i = 0 or j = 0).

Finding LCS
Start at c[4,5] = 3 and follow the arrows in b until row 0 or column 0 is reached:
(4,5) ↖ (3,4) ← (3,3) ↖ (2,2) ← (2,1) ↖ (1,0)

Finding LCS (2)
Each ↖ on the path marks a matched symbol: B at (4,5), C at (3,3), B at (2,1). Reading these in reverse order of discovery gives the subsequence.
LCS: B C B

Finding LCS (3)
Print_LCS(X, i, j)
  if i = 0 or j = 0 then return
  if b[i,j] = "↖" then
    Print_LCS(X, i-1, j-1)
    Print X[i]
  else if b[i,j] = "↑" then
    Print_LCS(X, i-1, j)
  else
    Print_LCS(X, i, j-1)
Cost: O(m+n)

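The traceback can be sketched in Python as below. Two things differ from Print_LCS and are assumptions of this sketch: it is iterative rather than recursive, and instead of reading a stored table b it re-derives each direction from c with the same tests LCS-Length uses. The function name `lcs` is hypothetical.

```python
def lcs(x, y):
    """Return one LCS of x and y: fill c, then walk back from c[m][n]."""
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i - 1][j], c[i][j - 1])

    symbols = []
    i, j = m, n
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            # Diagonal step: this symbol belongs to the LCS.
            symbols.append(x[i - 1])
            i -= 1
            j -= 1
        elif c[i - 1][j] >= c[i][j - 1]:
            i -= 1  # step up
        else:
            j -= 1  # step left
    # Symbols were collected from the end of the LCS to its start.
    return "".join(reversed(symbols))


if __name__ == "__main__":
    print(lcs("ABCB", "BDCAB"))
```

The walk takes at most m + n steps, matching the O(m+n) cost stated for Print_LCS; the table fill before it still costs O(mn).
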
Elements of Dynamic Programming
Optimal Substructure
Overlapping Subproblems

Optimal Substructure
A problem exhibits optimal substructure if an optimal solution contains optimal solutions to its subproblems.
We build an optimal solution from optimal solutions to subproblems: solutions of subproblems are parts of the final solution.
Example: Longest Common Subsequence. An LCS contains within it optimal solutions for prefixes of the two input sequences.
Optimal substructure is a property shared with problems solved by greedy algorithms.

Overlapping Subproblems
Divide-and-conquer is suitable when the recursion generates brand-new subproblems at each step.
Dynamic-programming algorithms take advantage of overlapping subproblems by solving each subproblem once and storing the solution in a table, where it can be looked up when needed in constant time per lookup.

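To make the effect of overlapping subproblems concrete, the sketch below counts how many times the LCS recurrence is actually evaluated with and without a lookup table. The helper name `count_calls` and its `memoize` flag are hypothetical, introduced only for this illustration.

```python
def count_calls(x, y, memoize):
    """Return (LCS length, number of evaluations of the recurrence)."""
    calls = 0
    table = {}  # (i, j) -> c[i, j]; used only when memoize is True

    def c(i, j):
        nonlocal calls
        if memoize and (i, j) in table:
            return table[(i, j)]  # constant-time lookup, no recomputation
        calls += 1
        if i == 0 or j == 0:
            result = 0
        elif x[i - 1] == y[j - 1]:
            result = c(i - 1, j - 1) + 1
        else:
            result = max(c(i, j - 1), c(i - 1, j))
        if memoize:
            table[(i, j)] = result
        return result

    return c(len(x), len(y)), calls


if __name__ == "__main__":
    x, y = "ABCDEFGH", "IJKLMNOP"  # no common symbols: worst case for recursion
    print(count_calls(x, y, memoize=False))
    print(count_calls(x, y, memoize=True))
```

With the table, the count is bounded by the (m+1)(n+1) distinct subproblems; without it, the same subproblems are re-solved many times over.
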
Dynamic Programming vs. Greedy
Dynamic programming uses optimal substructure in a bottom-up fashion: first find optimal solutions to subproblems and, having solved the subproblems, find an optimal solution to the problem.
Greedy algorithms use optimal substructure in a top-down fashion: first make a choice (the one that looks best at the moment) and then solve the resulting subproblem.