Longest Common Subsequence

Slides:



Advertisements
Similar presentations
Introduction to Algorithms 6.046J/18.401J/SMA5503
Advertisements

Lecture 8: Dynamic Programming Shang-Hua Teng. Longest Common Subsequence Biologists need to measure how similar strands of DNA are to determine how closely.
CPSC 335 Dynamic Programming Dr. Marina Gavrilova Computer Science University of Calgary Canada.
Overview What is Dynamic Programming? A Sequence of 4 Steps
Algorithms Dynamic programming Longest Common Subsequence.
COMP8620 Lecture 8 Dynamic Programming.
David Luebke 1 5/4/2015 CS 332: Algorithms Dynamic Programming Greedy Algorithms.
Comp 122, Fall 2004 Dynamic Programming. dynprog - 2 Lin / Devi Comp 122, Spring 2004 Longest Common Subsequence  Problem: Given 2 sequences, X =  x.
1 Longest Common Subsequence (LCS) Problem: Given sequences x[1..m] and y[1..n], find a longest common subsequence of both. Example: x=ABCBDAB and y=BDCABA,
UMass Lowell Computer Science Analysis of Algorithms Prof. Karen Daniels Fall, 2006 Lecture 1 (Part 3) Design Patterns for Optimization Problems.
UMass Lowell Computer Science Analysis of Algorithms Prof. Karen Daniels Spring, 2009 Design Patterns for Optimization Problems Dynamic Programming.
Dynamic Programming Code
UMass Lowell Computer Science Analysis of Algorithms Prof. Karen Daniels Fall, 2002 Lecture 1 (Part 3) Tuesday, 9/3/02 Design Patterns for Optimization.
Analysis of Algorithms CS 477/677
UMass Lowell Computer Science Analysis of Algorithms Prof. Karen Daniels Spring, 2008 Design Patterns for Optimization Problems Dynamic Programming.
November 7, 2005Copyright © by Erik D. Demaine and Charles E. Leiserson Dynamic programming Design technique, like divide-and-conquer. Example:
© 2004 Goodrich, Tamassia Dynamic Programming1. © 2004 Goodrich, Tamassia Dynamic Programming2 Matrix Chain-Products (not in book) Dynamic Programming.
Lecture 8: Dynamic Programming Shang-Hua Teng. First Example: n choose k Many combinatorial problems require the calculation of the binomial coefficient.
Analysis of Algorithms
1 Dynamic Programming Jose Rolim University of Geneva.
Lecture 7 Topics Dynamic Programming
1 Dynamic programming algorithms for all-pairs shortest path and longest common subsequences We will study a new technique—dynamic programming algorithms.
UMass Lowell Computer Science Analysis of Algorithms Prof. Karen Daniels Fall, 2005 Design Patterns for Optimization Problems Dynamic Programming.
Dynamic Programming (16.0/15) The 3-d Paradigm 1st = Divide and Conquer 2nd = Greedy Algorithm Dynamic Programming = metatechnique (not a particular algorithm)
Dynamic Programming UNC Chapel Hill Z. Guo.
ADA: 7. Dynamic Prog.1 Objective o introduce DP, its two hallmarks, and two major programming techniques o look at two examples: the fibonacci.
6/4/ ITCS 6114 Dynamic programming Longest Common Subsequence.
Dynamic Programming (Ch. 15) Not a specific algorithm, but a technique (like divide- and-conquer). Developed back in the day when “programming” meant “tabular.
Dynamic Programming David Kauchak cs302 Spring 2013.
Analysis of Algorithms CS 477/677 Instructor: Monica Nicolescu Lecture 14.
Introduction to Algorithms Jiafen Liu Sept
CS 3343: Analysis of Algorithms Lecture 18: More Examples on Dynamic Programming.
Contest Algorithms January 2016 Introduce DP; Look at several examples; Compare DP to Greedy 11. Dynamic Programming (DP) 1Contest Algorithms: 11. DP.
Part 2 # 68 Longest Common Subsequence T.H. Cormen et al., Introduction to Algorithms, MIT press, 3/e, 2009, pp Example: X=abadcda, Y=acbacadb.
David Luebke 1 2/26/2016 CS 332: Algorithms Dynamic Programming.
CSC 213 Lecture 19: Dynamic Programming and LCS. Subsequences (§ ) A subsequence of a string x 0 x 1 x 2 …x n-1 is a string of the form x i 1 x.
9/27/10 A. Smith; based on slides by E. Demaine, C. Leiserson, S. Raskhodnikova, K. Wayne Adam Smith Algorithm Design and Analysis L ECTURE 16 Dynamic.
TU/e Algorithms (2IL15) – Lecture 4 1 DYNAMIC PROGRAMMING II
CS583 Lecture 12 Jana Kosecka Dynamic Programming Longest Common Subsequence Matrix Chain Multiplication Greedy Algorithms Many slides here are based on.
Dynamic Programming Typically applied to optimization problems
Algorithm Design and Analysis (ADA)
David Meredith Dynamic programming David Meredith
CS 3343: Analysis of Algorithms
Advanced Algorithms Analysis and Design
Least common subsequence:
Dynamic Programming CISC4080, Computer Algorithms CIS, Fordham Univ.
CS200: Algorithm Analysis
Dynamic Programming Several problems Principle of dynamic programming
Chapter 8 Dynamic Programming.
Dynamic Programming Comp 122, Fall 2004.
CSCE 411 Design and Analysis of Algorithms
Dynamic Programming.
CS Algorithms Dynamic programming 0-1 Knapsack problem 12/5/2018.
Dynamic Programming Dr. Yingwu Zhu Chapter 15.
CS 3343: Analysis of Algorithms
Merge Sort Dynamic Programming
CS6045: Advanced Algorithms
Longest Common Subsequence
Dynamic Programming Comp 122, Fall 2004.
Lecture 8. Paradigm #6 Dynamic Programming
Introduction to Algorithms: Dynamic Programming
Longest Common Subsequence
Dynamic Programming CISC4080, Computer Algorithms CIS, Fordham Univ.
Algorithms and Data Structures Lecture X
Longest common subsequence (LCS)
Analysis of Algorithms CS 477/677
Longest Common Subsequence
Dynamic Programming.
Longest Common Subsequence
Algorithm Course Dr. Aref Rashad
Presentation transcript:

Longest Common Subsequence Dynamic Programming Longest Common Subsequence

Common subsequence A subsequence of a string is the string with zero or more chars left out A common subsequence of two strings: A subsequence of both strings Ex: x = {A B C B D A B }, y = {B D C A B A} {B C} and {A A} are both common subsequences of x and y

Longest Common Subsequence Given two sequences x[1 . . m] and y[1 . . n], find a longest subsequence common to them both. “a” not “the” x: A B C D y: BCBA = LCS(x, y) functional notation, but not a function

Brute-force LCS algorithm Check every subsequence of x[1 . . m] to see if it is also a subsequence of y[1 . . n]. Analysis 2m subsequences of x (each bit-vector of length m determines a distinct subsequence of x). Hence, the runtime would be exponential ! Towards a better algorithm: a DP strategy Key: optimal substructure and overlapping sub-problems First we’ll find the length of LCS. Later we’ll modify the algorithm to find LCS itself.

Optimal substructure Notice that the LCS problem has optimal substructure: parts of the final solution are solutions of subproblems. If z = LCS(x, y), then any prefix of z is an LCS of a prefix of x and a prefix of y. Subproblems: “find LCS of pairs of prefixes of x and y” i m x z n y j

Recursive thinking m x n y Case 1: x[m]=y[n]. There is an optimal LCS that matches x[m] with y[n]. Case 2: x[m] y[n]. At most one of them is in LCS Case 2.1: x[m] not in LCS Case 2.2: y[n] not in LCS Find out LCS (x[1..m-1], y[1..n-1]) Find out LCS (x[1..m-1], y[1..n]) Find out LCS (x[1..m], y[1..n-1])

Recursive thinking Case 1: x[m]=y[n] Case 2: x[m]  y[n] LCS(x, y) = LCS(x[1..m-1], y[1..n-1]) || x[m] Case 2: x[m]  y[n] LCS(x, y) = LCS(x[1..m-1], y[1..n]) or LCS(x[1..m], y[1..n-1]), whichever is longer Reduce both sequences by 1 char concatenate Reduce either sequence by 1 char

Finding length of LCS m x n y Let c[i, j] be the length of LCS(x[1..i], y[1..j]) => c[m, n] is the length of LCS(x, y) If x[m] = y[n] c[m, n] = c[m-1, n-1] + 1 If x[m] != y[n] c[m, n] = max { c[m-1, n], c[m, n-1] }

Generalize: recursive formulation c[i, j] = c[i–1, j–1] + 1 if x[i] = y[j], max{c[i–1, j], c[i, j–1]} otherwise. ... 1 2 i m j n x: y:

Recursive algorithm for LCS LCS(x, y, i, j) if x[i] = y[ j] then c[i, j]  LCS(x, y, i–1, j–1) + 1 else c[i, j]  max{ LCS(x, y, i–1, j), LCS(x, y, i, j–1)} Worst-case: x[i] ¹ y[ j], in which case the algorithm evaluates two subproblems, each with only one parameter decremented.

Recursion tree m = 3, n = 4: same subproblem m+n 3,4 m+n 2,4 3,3 same subproblem , but we’re solving subproblems already solved! 1,4 2,3 2,3 3,2 1,3 2,2 1,3 2,2 Height = m + n  work potentially exponential.

DP Algorithm c[i–1, j–1] + 1 if x[i] = y[j], Key: find out the correct order to solve the sub-problems Total number of sub-problems: m * n c[i, j] = c[i–1, j–1] + 1 if x[i] = y[j], max{c[i–1, j], c[i, j–1]} otherwise. j n C(i, j) i m

DP Algorithm LCS-Length(X, Y) 1. m = length(X) // get the # of symbols in X 2. n = length(Y) // get the # of symbols in Y 3. for i = 1 to m c[i,0] = 0 // special case: Y[0] 4. for j = 1 to n c[0,j] = 0 // special case: X[0] 5. for i = 1 to m // for all X[i] 6. for j = 1 to n // for all Y[j] 7. if ( X[i] == Y[j]) 8. c[i,j] = c[i-1,j-1] + 1 9. else c[i,j] = max( c[i-1,j], c[i,j-1] ) 10. return c

LCS Example What is the LCS of X and Y? LCS(X, Y) = BCB X = A B C B We’ll see how LCS algorithm works on the following example: X = ABCB Y = BDCAB What is the LCS of X and Y? LCS(X, Y) = BCB X = A B C B Y = B D C A B

Computing the Length of the LCS

LCS Example (0) ABCB BDCAB X = ABCB; m = |X| = 4 j 0 1 2 3 4 5 i Y[j] B D C A B X[i] A 1 B 2 3 C 4 B X = ABCB; m = |X| = 4 Y = BDCAB; n = |Y| = 5 Allocate array c[5,6]

LCS Example (1) ABCB BDCAB for i = 1 to m c[i,0] = 0 j 0 1 2 3 4 5 i Y[j] B D C A B X[i] A 1 B 2 3 C 4 B for i = 1 to m c[i,0] = 0 for j = 1 to n c[0,j] = 0

LCS Example (2) ABCB BDCAB j 0 1 2 3 4 5 i Y[j] B D C A B X[i] A 1 B 2 A 1 B 2 3 C 4 B if ( Xi == Yj ) c[i,j] = c[i-1,j-1] + 1 else c[i,j] = max( c[i-1,j], c[i,j-1] )

LCS Example (3) ABCB BDCAB j 0 1 2 3 4 5 i Y[j] B D C A B X[i] A 1 B 2 A 1 B 2 3 C 4 B if ( Xi == Yj ) c[i,j] = c[i-1,j-1] + 1 else c[i,j] = max( c[i-1,j], c[i,j-1] )

LCS Example (4) ABCB BDCAB j 0 1 2 3 4 5 i Y[j] B D C A B X[i] A 1 1 B A 1 1 B 2 3 C 4 B if ( Xi == Yj ) c[i,j] = c[i-1,j-1] + 1 else c[i,j] = max( c[i-1,j], c[i,j-1] )

LCS Example (5) ABCB BDCAB j 0 1 2 3 4 5 i Y[j] B D C A B X[i] A 1 1 1 A 1 1 1 B 2 3 C 4 B if ( Xi == Yj ) c[i,j] = c[i-1,j-1] + 1 else c[i,j] = max( c[i-1,j], c[i,j-1] )

LCS Example (6) ABCB BDCAB j 0 1 2 3 4 5 i Y[j] B D C A B X[i] A 1 1 1 A 1 1 1 B 2 1 3 C 4 B if ( Xi == Yj ) c[i,j] = c[i-1,j-1] + 1 else c[i,j] = max( c[i-1,j], c[i,j-1] )

LCS Example (7) ABCB BDCAB j 0 1 2 3 4 5 i Y[j] B D C A B X[i] A 1 1 1 A 1 1 1 B 2 1 1 1 1 3 C 4 B if ( Xi == Yj ) c[i,j] = c[i-1,j-1] + 1 else c[i,j] = max( c[i-1,j], c[i,j-1] )

LCS Example (8) ABCB BDCAB j 0 1 2 3 4 5 i Y[j] B D C A B X[i] A 1 1 1 A 1 1 1 B 2 1 1 1 1 2 3 C 4 B if ( Xi == Yj ) c[i,j] = c[i-1,j-1] + 1 else c[i,j] = max( c[i-1,j], c[i,j-1] )

LCS Example (9) ABCB BDCAB j 0 1 2 3 4 5 i Y[j] B D C A B X[i] A 1 1 1 A 1 1 1 B 2 1 1 1 1 2 3 C 1 1 4 B if ( Xi == Yj ) c[i,j] = c[i-1,j-1] + 1 else c[i,j] = max( c[i-1,j], c[i,j-1] )

LCS Example (10) ABCB BDCAB j 0 1 2 3 4 5 i Y[j] B D C A B X[i] A 1 1 A 1 1 1 B 2 1 1 1 1 2 3 C 1 1 2 4 B if ( Xi == Yj ) c[i,j] = c[i-1,j-1] + 1 else c[i,j] = max( c[i-1,j], c[i,j-1] )

LCS Example (11) ABCB BDCAB j 0 1 2 3 4 5 i Y[j] B D C A B X[i] A 1 1 A 1 1 1 B 2 1 1 1 1 2 3 C 1 1 2 2 2 4 B if ( Xi == Yj ) c[i,j] = c[i-1,j-1] + 1 else c[i,j] = max( c[i-1,j], c[i,j-1] )

LCS Example (12) ABCB BDCAB j 0 1 2 3 4 5 i Y[j] B D C A B X[i] A 1 1 A 1 1 1 B 2 1 1 1 1 2 3 C 1 1 2 2 2 4 B 1 if ( Xi == Yj ) c[i,j] = c[i-1,j-1] + 1 else c[i,j] = max( c[i-1,j], c[i,j-1] )

LCS Example (13) ABCB BDCAB j 0 1 2 3 4 5 i Y[j] B D C A B X[i] A 1 1 A 1 1 1 B 2 1 1 1 1 2 3 C 1 1 2 2 2 4 B 1 1 2 2 if ( Xi == Yj ) c[i,j] = c[i-1,j-1] + 1 else c[i,j] = max( c[i-1,j], c[i,j-1] )

LCS Example (14) 3 ABCB BDCAB j 0 1 2 3 4 5 i Y[j] B D C A B X[i] A 1 A 1 1 1 B 2 1 1 1 1 2 3 C 1 1 2 2 2 3 4 B 1 1 2 2 if ( Xi == Yj ) c[i,j] = c[i-1,j-1] + 1 else c[i,j] = max( c[i-1,j], c[i,j-1] )

LCS Algorithm Running Time LCS algorithm calculates the values of each entry of the array c[m,n] So what is the running time? O(m*n) since each c[i,j] is calculated in constant time, and there are m*n elements in the array

How to find actual LCS For example, here The algorithm just found the length of LCS, but not LCS itself. How to find the actual LCS? For each c[i,j] we know how it was acquired: A match happens only when the first equation is taken So we can start from c[m,n] and go backwards, remember x[i] whenever c[i,j] = c[i-1, j-1]+1. 2 2 For example, here c[i,j] = c[i-1,j-1] +1 = 2+1=3 2 3

Finding LCS 3 Time for trace back: O(m+n). j 0 1 2 3 4 5 i Y[j] B D C X[i] A 1 1 1 B 2 1 1 1 1 2 3 C 1 1 2 2 2 3 4 B 1 1 2 2 Time for trace back: O(m+n).

Finding LCS (2) 3 LCS (reversed order): B C B B C B j 0 1 2 3 4 5 i Y[j] B D C A B X[i] A 1 1 1 B 2 1 1 1 1 2 3 C 1 1 2 2 2 3 4 B 1 1 2 2 LCS (reversed order): B C B B C B (this string turned out to be a palindrome) LCS (straight order):

Compute Length of an LCS B C A c table (represent b table) source: 91.503 textbook Cormen, et al.

Construct an LCS

LCS-Length(X, Y) // dynamic programming solution m = X.length() n = Y.length() for i = 1 to m do c[i,0] = 0 for j = 0 to n do c[0,j] = 0 O(nm) for i = 1 to m do // row for j = 1 to n do // cloumn if xi = =yi then c[i,j] = c[i-1,j-1] + 1 b[i,j] =“ ” else if c[i-1, j]  c[i,j-1] then c[i,j] = c[i-1,j] b[i,j] = “^” else c[i,j] = c[i,j-1] b[i,j] = “<”

First Optimal-LCS initializes row 0 and column 0

Next each c[i, j] is computed, row by row, starting at c[1,1]. If xi == yj then c[i, j] = c[i-1, j-1]+1 and b[i, j] =

If xi <> yj then c[i, j] = max(c[i-1, j], c[i, j-1]) and b[i, j] points to the larger value

if c[i-1, j] == c[i, j-1] then b[i,j] points up

To construct the LCS, start in the bottom right-hand corner and follow the arrows. A indicates a matching character.

LCS: B C B A

Constructing an LCS Print-LCS(b,X,i,j) if i = 0 or j = 0 then return if b[i,j] = “ ” then Print-LCS(b, X, i-1, j-1) print xi else if b[i,j] = “^” then Print-LCS(b, X, i-1, j) else Print-LCS(b, X, i, j-1)