Longest common subsequence:


1 Longest common subsequence:
Biological applications often need to compare the DNA of two (or more) different organisms.
S1 = ACCGGTCGAGTGCGCGGAAGCCGGCCGAA
S2 = GTCGTTCGGAATGCCGTTGCTCTGTAAA
Neither strand is a substring of the other. We could say that the two are similar if the number of changes needed to turn one into the other is small. Another measure of similarity: find a third strand S3 whose letters appear in both S1 and S2 in the same order, but not necessarily consecutively. The longer the S3 we can find, the more similar S1 and S2 are. CSC317

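2 More formally:
Given a sequence $X = \langle x_1, x_2, \ldots, x_m \rangle$, another sequence $Z = \langle z_1, z_2, \ldots, z_k \rangle$ is a subsequence of X if there exists a strictly increasing sequence $\langle i_1, i_2, \ldots, i_k \rangle$ of indices of X such that for all j = 1, 2, …, k we have $x_{i_j} = z_j$.
Example: $Z = \langle B, C, D, B \rangle$ is a subsequence of $X = \langle A, B, C, B, D, A, B \rangle$ with index sequence $\langle 2, 3, 5, 7 \rangle$.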
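The definition above can be checked mechanically. The following is a minimal Python sketch, not part of the original slides: the helper name `subsequence_indices` and the sample strings are illustrative assumptions. It scans X left to right and returns the 1-based index sequence if one exists.

```python
def subsequence_indices(z, x):
    """Return 1-based indices i_1 < ... < i_k with x[i_j] == z[j], or None.

    Hypothetical helper for illustration; it greedily picks the leftmost
    match for each symbol of z, which succeeds whenever any match exists.
    """
    indices = []
    pos = 0  # next position of x that may still be used (0-based)
    for symbol in z:
        # Skip symbols of x that do not match the current symbol of z.
        while pos < len(x) and x[pos] != symbol:
            pos += 1
        if pos == len(x):
            return None  # ran out of x: z is not a subsequence
        indices.append(pos + 1)  # record the 1-based index
        pos += 1  # the next match must come strictly later
    return indices


print(subsequence_indices("BCDB", "ABCBDAB"))  # [2, 3, 5, 7]
print(subsequence_indices("BB", "AB"))         # None
```

The greedy leftmost choice is only one valid index sequence; other index sequences may exist for the same Z.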

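3 More formally:
Given two sequences X and Y, we say that a sequence Z is a common subsequence of X and Y if Z is a subsequence of both X and Y.
Example: If $X = \langle A, B, C, B, D, A, B \rangle$ and $Y = \langle B, D, C, A, B, A \rangle$, the sequence $\langle B, C, A \rangle$ is a common subsequence of X and Y. Is it the longest common subsequence? No: it has length 3, while $\langle B, C, B, A \rangle$ and $\langle B, D, A, B \rangle$, each of length 4, are longest common subsequences.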

4 In the longest-common-subsequence problem, we are given two sequences $X = \langle x_1, x_2, \ldots, x_m \rangle$ and $Y = \langle y_1, y_2, \ldots, y_n \rangle$ and want to find a maximum-length common subsequence of X and Y. Great. How are we going to do that using dynamic programming?
Steps:
1. Characterizing a longest common subsequence
2. Recursive solution
3. Computing the length of an LCS
4. Constructing an LCS

5 Step 1: Characterizing a longest common subsequence
Brute force solution: We simply enumerate all subsequences of X and check each subsequence to see whether it is also a subsequence of Y, keeping track of the longest subsequence we find. This is impractical, because X has $2^m$ subsequences to run through, so the running time is exponential.
But does the LCS problem have an optimal-substructure property (dynamic programming, anyone)?
Some definitions: Given a sequence $X = \langle x_1, x_2, \ldots, x_m \rangle$, we define the ith prefix of X as $X_i = \langle x_1, x_2, \ldots, x_i \rangle$ for i = 0, 1, …, m. Example: if $X = \langle A, B, C, B, D, A, B \rangle$, then $X_4 = \langle A, B, C, B \rangle$ and $X_0$ is the empty sequence.
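To make the cost of the brute-force idea concrete, here is a small Python sketch. It is an illustration under assumptions, not the slides' code: the names `is_subsequence` and `lcs_brute_force` are hypothetical, and the sample strings are chosen for the demo. It enumerates index subsets of X from longest to shortest and stops at the first one that is also a subsequence of Y.

```python
from itertools import combinations


def is_subsequence(z, y):
    """True if z appears in y in order, not necessarily consecutively."""
    it = iter(y)
    # 'symbol in it' consumes the iterator up to the first match,
    # so later symbols can only match later positions of y.
    return all(symbol in it for symbol in z)


def lcs_brute_force(x, y):
    """Return one longest common subsequence by exhaustive search.

    In the worst case this examines all 2^m index subsets of x,
    so it is only usable for very short inputs.
    """
    for k in range(len(x), 0, -1):  # try the longest candidates first
        for idx in combinations(range(len(x)), k):
            candidate = "".join(x[i] for i in idx)
            if is_subsequence(candidate, y):
                return candidate
    return ""  # no common symbol at all


result = lcs_brute_force("ABCBDAB", "BDCABA")
print(len(result))  # 4
```

Several different subsequences of the same maximum length may exist; this sketch returns whichever one the enumeration reaches first.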

6 Theorem (no proof): Optimal substructure of an LCS
Let $X = \langle x_1, x_2, \ldots, x_m \rangle$ and $Y = \langle y_1, y_2, \ldots, y_n \rangle$ be two sequences and $Z = \langle z_1, z_2, \ldots, z_k \rangle$ be any LCS of X and Y. Then:
1.) if $x_m = y_n$, then $z_k = x_m = y_n$ and $Z_{k-1}$ is an LCS of $X_{m-1}$ and $Y_{n-1}$.
2.) if $x_m \neq y_n$, then $z_k \neq x_m$ implies that Z is an LCS of $X_{m-1}$ and Y.
3.) if $x_m \neq y_n$, then $z_k \neq y_n$ implies that Z is an LCS of X and $Y_{n-1}$.
But what does that tell us? An LCS of two sequences contains within it an LCS of prefixes of the two sequences. Thus, the LCS problem has an optimal-substructure property, meaning we could use a recursive solution!

7 Step 2: A recursive algorithm
To find an LCS of X and Y, we may need to find the LCSs of X and $Y_{n-1}$ and of $X_{m-1}$ and Y. Furthermore, each of these subproblems has the subsubproblem of finding an LCS of $X_{m-1}$ and $Y_{n-1}$. Let c[i,j] be the length of an LCS of the sequences $X_i$ and $Y_j$. The optimal substructure of the LCS problem gives the recursive formula
$$
c[i,j] =
\begin{cases}
0 & \text{if } i = 0 \text{ or } j = 0, \\
c[i-1,j-1] + 1 & \text{if } i, j > 0 \text{ and } x_i = y_j, \\
\max(c[i,j-1],\; c[i-1,j]) & \text{if } i, j > 0 \text{ and } x_i \neq y_j.
\end{cases}
$$
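The recurrence for c[i,j] translates almost line by line into a memoized recursive function. The sketch below is one possible Python rendering, assuming 0-based strings (so $x_i$ is `x[i - 1]`); the function name `lcs_length_recursive` is a hypothetical choice, and caching with `functools.lru_cache` is an implementation decision made here so that shared subsubproblems are solved only once.

```python
from functools import lru_cache


def lcs_length_recursive(x, y):
    """Length of an LCS of x and y, computed top-down with memoization."""

    @lru_cache(maxsize=None)
    def c(i, j):
        # c(i, j) is the LCS length of the prefixes X_i and Y_j.
        if i == 0 or j == 0:
            return 0  # an empty prefix has only the empty subsequence
        if x[i - 1] == y[j - 1]:
            # Last symbols match: extend an LCS of X_{i-1} and Y_{j-1}.
            return c(i - 1, j - 1) + 1
        # Last symbols differ: drop one of them and keep the better result.
        return max(c(i, j - 1), c(i - 1, j))

    return c(len(x), len(y))


print(lcs_length_recursive("ABCBDAB", "BDCABA"))  # 4
```

Without the cache the same prefixes would be revisited many times; with it there are at most (m + 1)(n + 1) distinct calls. For long inputs the recursion depth can exceed Python's default limit, which is one reason a bottom-up table is usually preferred.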

8 Step 3: Computing the length of an LCS

9 Step 3: Computing the length of an LCS
[Figure: table for X = ABCBDAB and Y = BDCABA; the LCS shown is BCBA.]
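A bottom-up computation of the length table could look like the following Python sketch. It assumes the standard recurrence (0 on the border, diagonal plus 1 on a match, maximum of the left and upper neighbours otherwise); the function name `lcs_table` is hypothetical, and the slides' own pseudocode may differ, for example by also storing direction arrows.

```python
def lcs_table(x, y):
    """Fill the (m+1) x (n+1) table c, where c[i][j] is the LCS length
    of the first i symbols of x and the first j symbols of y."""
    m, n = len(x), len(y)
    # Row 0 and column 0 stay 0: the empty-prefix base case.
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i - 1][j], c[i][j - 1])
    return c


table = lcs_table("ABCBDAB", "BDCABA")
print(table[7][6])  # 4, the length of an LCS
print(table[7])     # [0, 1, 2, 2, 3, 4, 4]
```

Each entry is computed once from three already-filled neighbours, so the running time and space are both proportional to m·n.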

10 Step 4: Constructing an LCS (Backtracking)
[Figure: backtracking through the table for X = ABCBDAB and Y = BDCABA to recover the LCS BCBA.]
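The backtracking step can be sketched in Python as follows. This is an illustration under assumptions: the function name `lcs` is hypothetical, the table is rebuilt inside the function to keep the example self-contained, and ties are broken by moving up (toward a shorter prefix of X), which is a choice made here rather than something the slides specify.

```python
def lcs(x, y):
    """Return one LCS of x and y by filling the length table and
    walking back from the bottom-right corner."""
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i - 1][j], c[i][j - 1])

    # Backtrack: a match contributes a symbol and moves diagonally;
    # otherwise move toward the neighbour holding the larger value.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            out.append(x[i - 1])
            i -= 1
            j -= 1
        elif c[i - 1][j] >= c[i][j - 1]:
            i -= 1  # tie-break assumption: prefer moving up
        else:
            j -= 1
    return "".join(reversed(out))  # symbols were collected back to front


print(lcs("ABCBDAB", "BDCABA"))  # BCBA
```

A different tie-breaking rule can return a different, equally long answer, such as BDAB for the same inputs. The backtracking walk itself takes at most m + n steps.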

