Presentation is loading. Please wait.

Presentation is loading. Please wait.

Dynamic Programming-- Longest Common Subsequence

Similar presentations


Presentation on theme: "Dynamic Programming-- Longest Common Subsequence"— Presentation transcript:

1 Dynamic Programming-- Longest Common Subsequence

2 Subsequences A subsequence of a character string x0x1x2…xn-1 is a string of the form xi1xi2…xik, where ij < ij+1. Not the same as substring! Gaps can exist in subsequences Example String: ABCDEFGHIJK Subsequence: ACEGIJK Subsequence: DFGHK Not subsequence: DAGH Dynamic Programming

3 The Longest Common Subsequence (LCS) Problem
Given two strings X and Y find a longest subsequence common to both X and Y Has applications to DNA similarity testing (alphabet is {A,C,G,T}) disease identification, parental tests Example: ABCDEFG and XZACKDFWGH have ACDFG as a longest common subsequence Dynamic Programming

4 A Poor Approach to the LCS Problem
A Brute-force solution: Enumerate all subsequences of X Test which ones are also subsequences of Y Pick the longest one. Analysis: If X is of length n, then it has 2n subsequences Hint: a char can be presence/absent (binary) exponential-time algorithm! Dynamic Programming

5 Observations Can be viewed as string editing
Transform one string to another by keeping/adding/deleting characters Can also be viewed as aligning two strings Any ideas? -- T G C A

6 Observation: Pairs of indices
1 2 1 2 3 3 4 4 5 5 6 7 6 8 7 i index: elements of X -- T G C A T -- A -- C elements of Y A T -- C -- T G A T C j index:

7 Observation: Pairs of indices
1 2 1 2 3 3 4 4 5 5 6 7 6 8 7 i index: elements of X -- T G C A T -- A -- C elements of Y A T -- C -- T G A T C j index: (0,0) (0,1) (1,2) (2,2) (3,3) (4,3) (5,4) (5,5) (6,6) (6,7) (7,8) Matches shown in green

8 Observation: Pairs of indices
1 2 1 2 3 3 4 4 5 5 6 7 6 8 7 i index: elements of X -- T G C A T -- A -- C elements of Y A T -- C -- T G A T C j index: (0,0) (0,1) (1,2) (2,2) (3,3) (4,3) (5,4) (5,5) (6,6) (6,7) (7,8) Matches shown in green Pairs of indices correspond to a path in a 2-D grid

9 Edit Graph for LCS Problem
j 1 2 3 4 5 6 7 8 i T 1 G 2 C 3 A 4 T 5 A 6 C 7 (0,0) (0,1) (1,2) (2,2) (3,3) (4,3) (5,4) (5,5) (6,6) (6,7) (7,8)

10 Observation: Pairs of indices
1 2 1 2 3 3 4 4 5 5 6 7 6 8 7 i index: elements of X -- T G C A T -- A -- C elements of Y A T -- C -- T G A T C j index: Skipping a letter in Y (increment index j) (0,0) to (0,1) (5,4) to (5,5) (6,6) to (6,7)

11 Edit Graph for LCS Problem
Going right Increment the column index But not the row index Ie, skipping a col char T G C A 1 2 3 4 5 6 7 i 8 j

12 Observation: Pairs of indices
1 2 1 2 3 3 4 4 5 5 6 7 6 8 7 i index: elements of X -- T G C A T -- A -- C elements of Y A T -- C -- T G A T C j index: Skipping a letter in X (increment index i) (1,2) to (2,2) (3,3) to (4,3)

13 Edit Graph for LCS Problem
1 2 3 4 5 6 7 i 8 j Going down Increment the row index But not the column index Ie, skipping a row char

14 Observation: Pairs of indices
1 2 1 2 3 3 4 4 5 5 6 7 6 8 7 i index: elements of X -- T G C A T -- A -- C elements of Y A T -- C -- T G A T C j index: Matching a letter in X and Y (increment indices i and j) (0,1) to (1,2) (2,2) to (3,3) (4,3) to (5,4)

15 Edit Graph for LCS Problem
1 2 3 4 5 6 7 i 8 j Going diagonal Increment the row index And the column index Ie, matching char

16 Edit Graph for LCS Problem
1 2 3 4 5 6 7 i 8 j Brown Path Skip A (column) Match T Skip G (row) Match C Skip A (row) ? 1 2 3 4 5 6 7 8 9 10

17 Edit Graph for LCS Problem
1 2 3 4 5 6 7 i 8 j Brown Path Skip A (column) Match T Skip G (row) Match C Skip A (row) Skip G (column) Match A Skip T (column) 1 2 3 4 5 6 7 8 9 10

18 Edit Graph for LCS Problem
j 1 2 3 4 5 6 7 8 i Always can skip a row char (down edge) Always can skip a col char (right edge) Additional diagonal edges (matching) T 1 G 2 C 3 A 4 T 5 A 6 C 7

19 Edit Graph for LCS Problem
j 1 2 3 4 5 6 7 8 Every path is a common subsequence. Every diagonal edge adds an extra element to common subsequence LCS Problem: Find a path with the maximum number of diagonal edges i T 1 G 2 C 3 A 4 T 5 A 6 C 7

20 Let Li,j = length of common subsequence up to X[i] and Y[j]
Computing LCS Let Li,j = length of common subsequence up to X[i] and Y[j] i-1,j i-1,j -1 1 i,j -1 i,j Li-1,j skip char at Xi Li,j = MAX Li,j skip char at Yj Li-1,j , if Xi = Yj , lengthen the match

21 LCS length L(Xi,Yj) is computed by:
Computing LCS LCS length L(Xi,Yj) is computed by: Li, j = max Li-1, j skip Xi Li, j skip Yj Li-1, j if Xi = Yj

22 LCS length L(Xi,Yj) is computed by:
Example for a mismatch LCS length L(Xi,Yj) is computed by: Li, j = max Li-1, j skip Xi Li, j skip Yj Li-1, j if Xi = Yj A 1 C ?

23 LCS length L(Xi,Yj) is computed by:
Example for a mismatch LCS length L(Xi,Yj) is computed by: Li, j = max Li-1, j skip Xi Li, j skip Yj Li-1, j if Xi = Yj A 1 C

24 LCS length L(Xi,Yj) is computed by:
Example for a match LCS length L(Xi,Yj) is computed by: Li, j = max Li-1, j skip Xi Li, j skip Yj Li-1, j if Xi = Yj A 1 ?

25 LCS length L(Xi,Yj) is computed by:
Example for a match LCS length L(Xi,Yj) is computed by: Li, j = max Li-1, j skip Xi Li, j skip Yj Li-1, j if Xi = Yj A 1

26 Dynamic Programming Example
T C X Y 1 2 3 4 5 6 7 For simplicity: Table index starts from zero String index starts from 1 Table index 1 corresponds to string index 1 as shown Initialize row 0 and col 0 to be zeros in the table A T G 1 2 3 4 5 6 7

27 Dynamic Programming Example
T C X Y 1 2 3 4 5 6 7 A T G 1 1 2 Li, j = max Li-1, j skip Xi Li, j skip Yj Li-1, j if Xi = Yj 3 4 5 6 7

28 Dynamic Programming: Backtracing
Arrows show where the score originated from. if from the left , skip Xi if from the top, skip Yj if Xi = Yj

29 Dynamic Programming Example
T C X Y 1 2 3 4 5 6 7 A T G 1 1 1 2 Li, j = max Li-1, j skip Xi Li, j skip Yj Li-1, j if Xi = Yj 3 4 5 6 7

30 Dynamic Programming Example
T C X Y 1 2 3 4 5 6 7 A T G 1 1 1 1 2 Li, j = max Li-1, j skip Xi Li, j skip Yj Li-1, j if Xi = Yj 3 4 5 6 7

31 Dynamic Programming Example
T C X Y 1 2 3 4 5 6 7 A T G 1 1 1 1 1 2 Li, j = max Li-1, j skip Xi Li, j skip Yj Li-1, j if Xi = Yj 3 4 5 6 7

32 Dynamic Programming Example
T C X Y 1 2 3 4 5 6 7 A T G 1 1 1 1 1 1 2 Li, j = max Li-1, j skip Xi Li, j skip Yj Li-1, j if Xi = Yj 3 4 5 6 7

33 Dynamic Programming Example
T C X Y 1 2 3 4 5 6 7 A T G 1 1 1 1 1 1 1 2 Li, j = max Li-1, j skip Xi Li, j skip Yj Li-1, j if Xi = Yj 3 4 5 6 7

34 Dynamic Programming Example
T C X Y 1 2 3 4 5 6 7 A T G 1 1 1 1 1 1 1 1 2 Li, j = max Li-1, j skip Xi Li, j skip Yj Li-1, j if Xi = Yj 3 4 5 6 7

35 Dynamic Programming Example
T C X Y 1 2 3 4 5 6 7 1 A T G 1 2 Li, j = max Li-1, j skip Xi Li, j skip Yj Li-1, j if Xi = Yj 3 4 5 6 7

36 Dynamic Programming Example
T C X 1 2 3 4 5 6 7 Y A T G 1 1 1 1 1 1 1 1 Li, j = max Li-1, j skip Xi Li, j skip Yj Li-1, j if Xi = Yj 2 1 2 2 2 2 2 2 3 1 2 4 1 2 5 1 2 6 1 2 1 7 2

37 Dynamic Programming Example
T C X 1 2 3 4 5 6 7 A T G Y Continuing with the dynamic programming algorithm gives this result. 1 1 1 1 1 1 1 1 2 1 2 2 2 2 2 2 3 1 2 2 3 3 3 3 4 1 2 2 3 4 4 4 5 1 2 2 3 4 4 4 6 1 2 2 3 4 5 5 1 7 2 2 3 4 5 5

38 LCS Algorithm { LCS(X,Y) for i  1 to n // zeros in column 0 Li,0  0
for j  1 to m // zeros in row 0 L0,j  0 for i  1 to n // assume string index starts from 1 for j  1 to m Li-1,j Li,j  max Li,j-1 Li-1,j-1 + 1, if Xi = Yj return (L) {

39 Length but not sequence yet
1 2 3 4 5 6 7 G A T C X Y LCS(X,Y) creates the alignment grid Now we need a way to read the best alignment of X and Y Follow the arrows backwards from sink

40 LCS: Backtracing without “arrows”
L i-1, j-1 L i-1, j L i, j-1 L i, j 5 6 D Pair of chars are not the same and L i-1, j >= L i, j-1 From which direction did we get L i, j? Dynamic Programming

41 LCS: Backtracing without “arrows”
getLCS(X,Y,L) i = m // assume string index from 1 to m (or n) j = n while (Lij > 0) if (Xi = Yj) // from diagonal S.insertFront(Xi ), decrement i, decrement j else if (Li-1,j >= Li,j-1) // cell above is larger decrement i // from above else decrement j // from left return S

42 Note on book’s implementation
G A T C X Y 1 2 3 4 5 6 7 For simplicity (slides): Table index starts from zero String index starts from 1 Table index 1 corresponds to string index 1 as shown Book: String index starts from zero Table index 1 corresponds to string index 0 A T G 1 2 3 4 5 6 7

43 Time Complexity Length of two strings: n and m ?

44 Time Complexity Length of two strings: n and m
O(nm) to build the table O(m + n) to backtrace O(nm)

45 Dynamic Programming Well, sounds like what technique we learned?
Algorithm for LCS is an example Key observations: Problem can be decomposed into smaller subproblems (“decomposition”) Solution for a problem can be composed from solutions for subproblems (“composition”) Smallest problems with known solutions (“base case”) Well, sounds like what technique we learned? Dynamic Programming

46 Dynamic Programming Well, sounds like recursion will do as well
Algorithm for LCS is an example Key observations: Problem can be decompose into smaller subproblems (“decomposition”) Solution for a problem can be composed from solutions for subproblems (“composition”) Smallest problems with known solutions (“base case”) Well, sounds like recursion will do as well Yes, but the subproblems are repeated Dynamic Programing remembers the solutions for subproblems Ie, reuse the solutions -> saving time Dynamic Programming

47 Reusing solutions I needs E, F, H F needs B, C, E H needs D, E, G
A B C D E F G H I I needs E, F, H F needs B, C, E H needs D, E, G Recursion E is a subproblem of I, F, and H E would be calculated 3 times Dynamic programming E would be calculated only once!! Dynamic Programming

48 The General Dynamic Programming Technique
Simple subproblems (decomposition): can be defined in terms of a few variables Subproblem optimality (composition): the global optimum value can be defined in terms of optimal subproblems Subproblem overlap (repeated) Construct solutions bottom-up (starting with “base cases”) Remembering solutions for subproblems Dynamic Programming

49 Skipping the rest Dynamic Programming

50 Another example of Dynamic Programming

51 Matrix Chain-Products
Dynamic Programming can be used. Review: Matrix Multiplication. C = A*B A is d × e and B is e × f O(def ) time A C B d f e i j i,j Dynamic Programming

52 Matrix Chain-Products
Compute A=A0*A1*…*An-1 Ai is di × di+1 Problem: How to parenthesize? Example B is 3 × 100 C is 100 × 5 D is 5 × 5 (B*C)*D takes = 1575 ops B*(C*D) takes = 4000 ops Dynamic Programming

53 A Brute-force Approach
Algorithm Try all possible ways to parenthesize A=A0*A1*…*An-1 Calculate number of ops for each one Pick the one that is best Running time: The number of paranethesizations is equal to the number of binary trees with n nodes This is exponential! It is called the Catalan number, and it is almost 4n. This is a terrible algorithm! Dynamic Programming

54 A Greedy Approach Idea #1: repeatedly select the product that uses the most operations. Counter-example: A is 10 × 5 B is 5 × 10 C is 10 × 5 D is 5 × 10 Greedy idea #1 gives (A*B)*(C*D), which takes = 2000 ops A*((B*C)*D) takes = 1000 ops Dynamic Programming

55 Another Greedy Approach
Idea #2: repeatedly select the product that uses the fewest operations. Counter-example: A is 101 × 11 B is 11 × 9 C is 9 × 100 D is 100 × 99 Greedy idea #2 gives A*((B*C)*D)), which takes = ops (A*B)*(C*D) takes = ops The greedy approach is not giving us the optimal value. Dynamic Programming

56 A “Recursive” Approach
Define subproblems: Find the best parenthesization of Ai*Ai+1*…*Aj. Let Ni,j denote the number of operations done by this subproblem. The optimal solution for the whole problem is N0,n-1. Dynamic Programming

57 A “Recursive” Approach
Subproblem optimality: The optimal solution can be defined in terms of optimal subproblems: There has to be a final multiplication (root of the expression tree) for the optimal solution. final multiply is at index i: (A0*…*Ai)*(Ai+1*…*An-1). Dynamic Programming

58 A “Recursive” Approach
Subproblem optimality: The optimal solution can be defined in terms of optimal subproblems: There has to be a final multiplication (root of the expression tree) for the optimal solution. final multiply is at index i: (A0*…*Ai)*(Ai+1*…*An-1). Then the optimal solution N0,n-1 is the sum of two optimal subproblems, N0,i and Ni+1,n-1 plus the time for the last multiply. Dynamic Programming

59 A Characterizing Equation
The global optimal has to be defined in terms of optimal subproblems Consider all possible places (k) for that final multiply: Recall that Ai is a di × di+1 dimensional matrix. a characterizing equation for Ni,j is: Note that subproblems are not independent--the subproblems overlap: Dynamic Programming! Operations in the left product Operations in the right product Operations in multiplying the 2 products Dynamic Programming

60 Example Matrices A is 10 × 5 B is 5 × 10 C is 10 × 5 D is 5 × 10
Dimensions: d0=10 d1=5 d2=10 d3=5 d4=10 N 1 2 3 Dynamic Programming

61 AB Matrices A is 10 × 5 B is 5 × 10 C is 10 × 5 D is 5 × 10
Dimensions: d0=10 d1=5 d2=10 d3=5 d4=10 N 1 2 3 500 Dynamic Programming

62 BC Matrices A is 10 × 5 B is 5 × 10 C is 10 × 5 D is 5 × 10
Dimensions: d0=10 d1=5 d2=10 d3=5 d4=10 N 1 2 3 500 250 Dynamic Programming

63 CD Matrices A is 10 × 5 B is 5 × 10 C is 10 × 5 D is 5 × 10
Dimensions: d0=10 d1=5 d2=10 d3=5 d4=10 N 1 2 3 500 250 Dynamic Programming

64 ABC: A(BC) vs (AB)C Matrices A is 10 × 5 B is 5 × 10 C is 10 × 5
D is 5 × 10 N[0,2] A(BC) N[0,0] + N[1,2] + 10*5*5 = 500 N 1 2 3 500 250 Dynamic Programming

65 ABC: A(BC) vs (AB)C Matrices A is 10 × 5 B is 5 × 10 C is 10 × 5
D is 5 × 10 N[0,2] A(BC) N[0,0] + N[1,2] + 10*5*5 = 500 N[0,2] (AB)C N[0,1] + N[2,2] + 10*10*5 = 1000 N 1 2 3 500 (k=0) 250 Dynamic Programming

66 BCD: B(CD) vs (BC)D Matrices A is 10 × 5 B is 5 × 10 C is 10 × 5
D is 5 × 10 N[1,3] B(CD) N[1,3] (BC)D N 1 2 3 500 (k=0) 250 ? (k=?) Dynamic Programming

67 BCD: B(CD) vs (BC)D Matrices A is 10 × 5 B is 5 × 10 C is 10 × 5
D is 5 × 10 N[1,3] B(CD) N[1,1] + N[2,3] + 5*10*10 = 1000 N[1,3] (BC)D N[1,2]+ N[3,3] + 5*5*10 = 500 N 1 2 3 500 (k=0) 250 (k=2) Dynamic Programming

68 ABCD: A(BCD) vs (AB)(CD) vs (ABC)D
Matrices A is 10 × 5 B is 5 × 10 C is 10 × 5 D is 5 × 10 N[0,3] A(BCD) N[0,0] + N[1,3] + 10*5*10 = 1000 N[0,3] (AB)(CD) N[0,1] + N[2,3] + 10*10*10 =2000 N[0,3] (ABC)D N[0,2]+ N[3,3] + 10*5*10 =1000 N 1 2 3 500 (k=0) 1000 250 (k=2) Dynamic Programming

69 ABCD Matrices A is 10 × 5 B is 5 × 10 C is 10 × 5 D is 5 × 10
N[0,3] --- ABCD k=0 A(BCD) N[1,3] BCD k=2 (BC)D A((BC)D) N 1 2 3 500 (k=0) 1000 250 (k=2) Dynamic Programming

70 Time Complexity answer N
Filling in each entry in the N table takes O(n) time. O(n2) entries Building the table: O(n3) O(n) for finding the parenthesis locations Total: O(n3) answer N 1 2 n-1 j i Dynamic Programming

71 Keeping track of indices
Initialize diagonal i,j or i,i j=0 j=1 j=2 j=3 i=0 0,0 i=1 1,1 i=2 2,2 i=3 3,3 Dynamic Programming

72 Keeping track of indices
b = # of products in subchain 1 2 matrices => 1 product (or length of subchain – 1) i = 0 to 2 in this example Generally 0 to n-b-1 [4-1-1] n-b combinations -1 for zero-based index j = 1 to 3 in this example Generally, j = i+b j=0 j=1 j=2 j=3 i=0 0,1 i=1 1,2 i=2 2,3 i=3 ABCD: AB, BC, CD Dynamic Programming

73 Keeping track of indices
b = # of products in subchain 2 3 matrices => 2 products (or length of subchain – 1) i = 0 to 1 in this example 0 to n-b-1 [4-2-1] j = 2 to 3 in this example j = i+b j=0 j=1 j=2 j=3 i=0 0,2 i=1 1,3 i=2 i=3 ABCD: ABC, BCD Dynamic Programming

74 Keeping track of indices
b = # of products in subchain 3 4 matrices => 3 products (or length of subchain – 1) i = 0 to 0 in this example 0 to n-b-1 [4-3-1] j = 3 to 3 in this example j = i+b j=0 j=1 j=2 j=3 i=0 0,3 i=1 i=2 i=3 ABCD: ABCD Dynamic Programming

75 A Dynamic Programming Algorithm
Algorithm matrixChain(S): Input: sequence S of n matrices to be multiplied Output: number of operations in an optimal paranethization of S for i  0 to n-1 do Ni,i  // diagonal entries for b  1 to n-1 do // number of products in subchain for i  0 to n-b-1 do // start of subchain (first matrix) j  i+b // end of subchain (last matrix) Ni,j  +infinity // initial value for min for k  i to j-1 do Ni,j  min{Ni,j , Ni,k +Nk+1,j +di dk+1 dj+1} Dynamic Programming

76 The General Dynamic Programming Technique
Applies to a problem that at first seems to require a lot of time (possibly exponential), provided we have: Simple subproblems (decomposition): the subproblems can be defined in terms of a few variables, such as j, k, l, m, and so on. Subproblem optimality (composition): the global optimum value can be defined in terms of optimal subproblems Subproblem overlap: the subproblems are not independent, but instead they overlap (hence, should be constructed bottom-up (base case)). Dynamic Programming


Download ppt "Dynamic Programming-- Longest Common Subsequence"

Similar presentations


Ads by Google