Longest common subsequence (LCS)

Presentation on theme: "Longest common subsequence (LCS)"— Presentation transcript:

1 Longest common subsequence (LCS)

2 The longest common subsequence (LCS) problem
A string: A = b a c a d. A subsequence of A is obtained by deleting 0 or more symbols from A (not necessarily consecutive), e.g. ad, ac, bac, acad, bacad, bcd. Common subsequences of A = b a c a d and B = a c c b a d c b include ad, ac, bac, acad. The longest common subsequence (LCS) of A and B is a c a d. A subsequence need not be consecutive, but its symbols must appear in the original order.
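The definition above can be checked mechanically. The following is a minimal sketch, not taken from the slides; the function name `is_subsequence` is an assumption chosen for illustration.

```python
def is_subsequence(s: str, a: str) -> bool:
    """Return True if s can be obtained from a by deleting zero or more symbols."""
    it = iter(a)
    # `ch in it` advances the iterator until it finds ch,
    # so the symbols of s must appear in a in the same order.
    return all(ch in it for ch in s)


A = "bacad"
B = "accbadcb"
for s in ["ad", "ac", "bac", "acad", "bacad"]:
    # Prints whether s is a subsequence of A and of B, respectively.
    print(s, is_subsequence(s, A), is_subsequence(s, B))
```

A string is a common subsequence of A and B exactly when both checks succeed.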

3 Longest common subsequence
INPUT: two strings. OUTPUT: a longest common subsequence. Example input: ACTGAACTCTGTGCACT and TGACTCAGCACAAAAAC.

8 Naïve / Brute-Force Algorithm
For every subsequence of X, check whether it is a subsequence of Y. X has 2^m subsequences. Each subsequence takes Θ(n) time to check: scan Y for the first letter, then for the second, and so on. Time: Θ(n·2^m).
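The brute-force idea can be sketched as follows. This is only an illustration under assumptions: the names `lcs_brute_force` and `is_subsequence` are not from the slides, and candidates are tried from longest to shortest so the first hit can be returned directly.

```python
from itertools import combinations


def is_subsequence(s, y):
    # Scan y once, looking for each symbol of s in order.
    it = iter(y)
    return all(ch in it for ch in s)


def lcs_brute_force(x: str, y: str) -> str:
    """Enumerate subsequences of x (exponentially many) and test each against y."""
    # Try lengths from longest to shortest so the first match is a longest one.
    for k in range(len(x), -1, -1):
        for idx in combinations(range(len(x)), k):
            cand = "".join(x[i] for i in idx)
            if is_subsequence(cand, y):
                return cand
    return ""


print(lcs_brute_force("bacad", "accbadcb"))  # acad
```

This is usable only for very short inputs, which is what motivates the dynamic-programming approach that follows.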

9 Optimal Substructure Theorem
Let Z = z1, , zk be any LCS of X x1, . . , xk and Y y1, . . , yk . 1. If xm = yn, then zk = xm = yn and Zk-1 is an LCS of Xm-1 and Yn-1. 2. If xm  yn, then either zk  xm and Z is an LCS of Xm-1 and Y . or zk  yn and Z is an LCS of X and Yn-1.

10 Optimal Substructure Proof (case 1: x_m = y_n)
Theorem: Let Z = z_1, …, z_k be any LCS of X and Y. 1. If x_m = y_n, then z_k = x_m = y_n and Z_{k-1} is an LCS of X_{m-1} and Y_{n-1}. Proof (case 1: x_m = y_n): Any common subsequence Z' that does not end in x_m = y_n can be made longer by appending x_m = y_n. Therefore an LCS Z must end in x_m = y_n. Z_{k-1} is a common subsequence of X_{m-1} and Y_{n-1}, and there is no longer common subsequence of X_{m-1} and Y_{n-1}, or Z would not be an LCS.

11 Optimal Substructure Proof (case 2: x_m ≠ y_n and z_k ≠ x_m)
Theorem: Let Z = z_1, …, z_k be any LCS of X and Y. 2. If x_m ≠ y_n and z_k ≠ x_m, then Z is an LCS of X_{m-1} and Y. Proof (case 2: x_m ≠ y_n and z_k ≠ x_m): Since Z does not end in x_m, Z is a common subsequence of X_{m-1} and Y, and there is no longer common subsequence of X_{m-1} and Y, or Z would not be an LCS.

12 Optimal Substructure Proof (case 2: x_m ≠ y_n and z_k ≠ y_n)
Theorem: Let Z = z_1, …, z_k be any LCS of X and Y. 2. If x_m ≠ y_n and z_k ≠ y_n, then Z is an LCS of X and Y_{n-1}. Proof (case 2: x_m ≠ y_n and z_k ≠ y_n): Since Z does not end in y_n, Z is a common subsequence of X and Y_{n-1}, and there is no longer common subsequence of X and Y_{n-1}, or Z would not be an LCS.

14 Recursive Solution
Sequences x_1, …, x_m and y_1, …, y_n. LCS(i, j) = length of a longest common subsequence of x_1, …, x_i and y_1, …, y_j. If x_i = y_j, then LCS(i, j) = 1 + LCS(i-1, j-1).

15 Recursive Solution
Sequences x_1, …, x_m and y_1, …, y_n. LCS(i, j) = length of a longest common subsequence of x_1, …, x_i and y_1, …, y_j. If x_i ≠ y_j, then LCS(i, j) = max(LCS(i-1, j), LCS(i, j-1)), because x_i and y_j cannot both be in the LCS.

16 Recursive Solution
Sequences x_1, …, x_m and y_1, …, y_n. LCS(i, j) = length of a longest common subsequence of x_1, …, x_i and y_1, …, y_j.
If x_i = y_j, then LCS(i, j) = 1 + LCS(i-1, j-1).
If x_i ≠ y_j, then LCS(i, j) = max(LCS(i-1, j), LCS(i, j-1)).
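The two cases translate almost directly into a recursive function. The sketch below is an illustration rather than the presenter's code: the name `lcs_length` is an assumption, the base case LCS(i, 0) = LCS(0, j) = 0 is added because the recursion needs a stopping point, and memoization via `functools.lru_cache` is used so that each (i, j) pair is evaluated only once.

```python
from functools import lru_cache


def lcs_length(x: str, y: str) -> int:
    """Length of a longest common subsequence of x and y."""

    @lru_cache(maxsize=None)
    def lcs(i: int, j: int) -> int:
        # lcs(i, j): LCS length of the prefixes x[:i] and y[:j].
        if i == 0 or j == 0:
            return 0  # assumed base case: an empty prefix has no common symbols
        if x[i - 1] == y[j - 1]:
            return 1 + lcs(i - 1, j - 1)
        return max(lcs(i - 1, j), lcs(i, j - 1))

    return lcs(len(x), len(y))


print(lcs_length("ABCB", "BDCAB"))  # 3
```

Python strings are 0-indexed, so x_i on the slides corresponds to `x[i - 1]` in the code.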

17 Recursive Solution
Define c[i, j] = length of an LCS of X_i and Y_j. We want c[m, n].
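Written out in full, the recurrence for c[i, j] can be stated as below. The boundary row and column (i = 0 or j = 0) are an assumption consistent with the initialization used in the worked example; the other two cases restate the recursive solution.

```latex
c[i,j] =
\begin{cases}
0 & \text{if } i = 0 \text{ or } j = 0,\\
c[i-1,j-1] + 1 & \text{if } i,j > 0 \text{ and } x_i = y_j,\\
\max\bigl(c[i-1,j],\, c[i,j-1]\bigr) & \text{if } i,j > 0 \text{ and } x_i \neq y_j.
\end{cases}
```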

18 Constructing an LCS

19 LCS Example
We'll see how the LCS algorithm works on the following example: X = ABCB, Y = BDCAB. What is the longest common subsequence of X and Y? LCS(X, Y) = BCB.

20 LCS Example (X = ABCB, Y = BDCAB)
X = ABCB; m = |X| = 4. Y = BDCAB; n = |Y| = 5. Allocate array c[0..m, 0..n], that is, 5 rows (i = 0..4, one per prefix of X) by 6 columns (j = 0..5, one per prefix of Y).

21 LCS Example (X = ABCB, Y = BDCAB)
Initialize the first column and first row to zero:
for i = 0 to m: c[i,0] = 0
for j = 0 to n: c[0,j] = 0

22 LCS Example (X = ABCB, Y = BDCAB)
For i = 1 to m and j = 1 to n, fill in c[i,j] and the direction b[i,j]:
if ( x_i == y_j )
    c[i,j] = c[i-1,j-1] + 1
    b[i,j] = “↖”
else if ( c[i-1,j] >= c[i,j-1] )
    c[i,j] = c[i-1,j]
    b[i,j] = “↑”
else
    c[i,j] = c[i,j-1]
    b[i,j] = “←”

23–35 LCS Example (X = ABCB, Y = BDCAB)
[Slides 23 to 35 show the table c being filled one entry at a time, row by row (i = 1..4) and left to right within each row (j = 1..5), using the rule from slide 22. After row 1 (A): 0 0 0 1 1. After row 2 (B): 1 1 1 1 2. After row 3 (C): 1 1 2 2 2. Row 4 (B) begins 1 1 2 2.]

36 LCS Example (X = ABCB, Y = BDCAB)
The completed table, with c[4,5] = 3 as the length of the LCS:
 i \ j    0    1    2    3    4    5
         Yj    B    D    C    A    B
 0  Xi    0    0    0    0    0    0
 1  A     0    0    0    0    1    1
 2  B     0    1    1    1    1    2
 3  C     0    1    1    2    2    2
 4  B     0    1    1    2    2    3
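The table-filling procedure from the example can be sketched bottom-up as follows. The function name `lcs_table` is an assumption for illustration; the sketch computes only the lengths c[i][j] and leaves out the direction table b.

```python
def lcs_table(x: str, y: str) -> list:
    """Return the (m+1) x (n+1) table c, where c[i][j] is the LCS length of x[:i] and y[:j]."""
    m, n = len(x), len(y)
    # Row 0 and column 0 stay zero: an empty prefix has an empty LCS.
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i - 1][j], c[i][j - 1])
    return c


for row in lcs_table("ABCB", "BDCAB"):
    print(row)
# [0, 0, 0, 0, 0, 0]
# [0, 0, 0, 0, 1, 1]
# [0, 1, 1, 1, 1, 2]
# [0, 1, 1, 2, 2, 2]
# [0, 1, 1, 2, 2, 3]
```

The bottom-right entry is the answer c[m][n]; for this example it is 3.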

37 Computing the Length of an LCS

38 Analysis of the LCS Algorithm
Running time and memory: O(mn) and O(mn), since each c[i,j] is calculated in constant time and there are m*n elements in the array.

39 How to find actual LCS
[Table: c[i,j] for X = ABCB (rows) and Y = BDCAB (columns); c[4,5] = 3.]

40 Finding LCS
[Table: c[i,j] for X = ABCB (rows) and Y = BDCAB (columns); c[4,5] = 3.]

41 How to find actual LCS
Initial call is PRINT-LCS(b, X, m, n). When b[i, j] = “↖”, we have extended the LCS by one character. So the LCS consists of the entries with “↖” in them. Time: O(m+n), the number of table entries visited on the way back from c[m,n].

42 LCS (reversed order): B C B
Following the path back from c[4,5] = 3 in the table for X = ABCB, Y = BDCAB collects the characters B, C, B. LCS (reversed order): B C B (this string turned out to be a palindrome). LCS (straight order): B C B.
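The trace-back can be sketched as follows. This is an illustration under assumptions, not the PRINT-LCS pseudocode itself: the function name `lcs_string` is hypothetical, the directions are stored as the strings "diag", "up" and "left" instead of arrows, and the walk is iterative, collecting characters in reverse and flipping them at the end.

```python
def lcs_string(x: str, y: str) -> str:
    """Return one longest common subsequence of x and y."""
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]   # lengths
    b = [[""] * (n + 1) for _ in range(m + 1)]  # directions
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j], b[i][j] = c[i - 1][j - 1] + 1, "diag"
            elif c[i - 1][j] >= c[i][j - 1]:
                c[i][j], b[i][j] = c[i - 1][j], "up"
            else:
                c[i][j], b[i][j] = c[i][j - 1], "left"

    # Walk back from the bottom-right corner; each "diag" step is one LCS character.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if b[i][j] == "diag":
            out.append(x[i - 1])
            i, j = i - 1, j - 1
        elif b[i][j] == "up":
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))  # characters were collected in reverse order


print(lcs_string("ABCB", "BDCAB"))  # BCB
```

Each loop iteration decreases i, j, or both, so the walk takes at most m + n steps, matching the O(m+n) bound stated for the trace-back.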

43 LCS Example
We'll see how the LCS algorithm works on the following example: X = PRESIDENT, Y = PROVIDENCE. What is the longest common subsequence of X and Y?

44

45 Output: PRIDEN

