1 Sequence Alignment
Input: two sequences over the same alphabet
Output: an alignment of the two sequences
Example:
• GCGCATGGATTGAGCGA
• TGCGCCATTGATGACCA
A possible alignment:
-GCGC-ATGGATTGAGCGA
TGCGCCATTGAT-GACC-A
cc: Shlomo Moran

2 Alignments
-GCGC-ATGGATTGAGCGA
TGCGCCATTGAT-GACC-A
Three elements:
• Perfect matches
• Mismatches
• Insertions and deletions (indels)
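The three kinds of columns can be counted mechanically. The following Python sketch is an illustration rather than part of the original slides: the function name classify_columns is hypothetical, and it assumes that both rows have the same length and that '-' marks a gap.

```python
from collections import Counter


def classify_columns(top, bottom, gap="-"):
    """Count matches, mismatches, and indels in an alignment given as two rows."""
    if len(top) != len(bottom):
        raise ValueError("both rows of an alignment must have the same length")
    counts = Counter()
    for x, y in zip(top, bottom):
        if x == gap and y == gap:
            raise ValueError("a column cannot consist of two gaps")
        if x == gap or y == gap:
            counts["indel"] += 1      # one letter aligned against a gap
        elif x == y:
            counts["match"] += 1      # identical letters
        else:
            counts["mismatch"] += 1   # two different letters
    return counts


# The example alignment from the slide.
counts = classify_columns("-GCGC-ATGGATTGAGCGA", "TGCGCCATTGAT-GACC-A")
print(dict(counts))
```

For the alignment shown on the slide this counts 13 matches, 2 mismatches, and 4 indels.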

3 Choosing Alignments
There are many possible alignments. For example, compare:
-GCGC-ATGGATTGAGCGA
TGCGCCATTGAT-GACC-A
to
------GCGCATGGATTGAGCGA
TGCGCC----ATTGATGACCA--
Which one is better?

4 Alignment Costs
• Replacement: one letter is replaced by another
• Deletion: a letter is deleted
• Insertion: a letter is inserted
A measure of sequence similarity should take into account how many operations took place and which ones.

5 Cost Function
We define the cost of an alignment by specifying a function σ:
• σ(x,y) is the cost of replacing x by y
• σ(x,-) is the cost of deleting x
• σ(-,x) is the cost of inserting x
The cost of an alignment is the sum of the costs of its positions (columns).

6 Simple Cost Function
Cost of each position:
• Match: 0
• Mismatch: 1
• Indel: 2
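To make the cost function concrete, here is a minimal Python sketch that scores the two candidate alignments from slide 3 under the simple cost function. The names sigma and alignment_cost are chosen for illustration and do not come from the slides; '-' is assumed to be the gap symbol.

```python
def sigma(x, y, mismatch=1, indel=2):
    """Simple position cost: match 0, mismatch 1, indel 2."""
    if x == "-" or y == "-":
        return indel
    return 0 if x == y else mismatch


def alignment_cost(top, bottom):
    """Cost of an alignment: the sum of the costs of its columns."""
    if len(top) != len(bottom):
        raise ValueError("both rows of an alignment must have the same length")
    return sum(sigma(x, y) for x, y in zip(top, bottom))


first = alignment_cost("-GCGC-ATGGATTGAGCGA", "TGCGCCATTGAT-GACC-A")
second = alignment_cost("------GCGCATGGATTGAGCGA", "TGCGCC----ATTGATGACCA--")
print(first, second)
```

Under this cost function the first alignment costs 10 (2 mismatches and 4 indels) and the second costs 30 (6 mismatches and 12 indels), so the first one is better.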

7 The Optimal Cost
The distance between two sequences is the minimal cost over all alignments of these sequences, namely,
$$d(s,t) = \min_{A \text{ an alignment of } s \text{ and } t} \mathrm{cost}(A)$$

8 Recursive Formula for the Optimal Cost
Consider any optimal alignment of two sequences s[1..m+1] and t[1..n+1].
The last column of that alignment must be one of:
1. (s[m+1], t[n+1])
2. (s[m+1], -)
3. (-, t[n+1])

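9 Recursive Formula
Consider any optimal alignment of two sequences s[1..m+1] and t[1..n+1].
The last column of that alignment must be one of the following, and in each case the remaining columns form an optimal alignment of the remaining prefixes:
1. The last column is (s[m+1], t[n+1]): the cost is d(s[1..m], t[1..n]) + σ(s[m+1], t[n+1])
2. The last column is (s[m+1], -): the cost is d(s[1..m], t[1..n+1]) + σ(s[m+1], -)
3. The last column is (-, t[n+1]): the cost is d(s[1..m+1], t[1..n]) + σ(-, t[n+1])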

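11 Recursive Formula
Define a matrix V:
$$V[i,j] = d(s[1..i],\, t[1..j])$$
Using our recursive formula, we get the following recurrence for V:
$$V[i+1,j+1] = \min \begin{cases} V[i,j] + \sigma(s[i+1],\, t[j+1]) \\ V[i,j+1] + \sigma(s[i+1],\, -) \\ V[i+1,j] + \sigma(-,\, t[j+1]) \end{cases}$$
Each entry V[i+1,j+1] depends only on its three neighbors V[i,j], V[i,j+1], and V[i+1,j].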
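The recurrence can be transcribed almost literally as a memoized recursive function. This Python sketch is illustrative only: it assumes the simple cost function (match 0, mismatch 1, indel 2), uses 0-based Python strings in place of the 1-based s[1..m] notation, and the names sigma and distance are not from the slides. It is meant for short inputs, since deep recursion would hit Python's recursion limit on long sequences.

```python
from functools import lru_cache


def sigma(x, y, mismatch=1, indel=2):
    """Simple position cost: match 0, mismatch 1, indel 2."""
    if x == "-" or y == "-":
        return indel
    return 0 if x == y else mismatch


def distance(s, t):
    """Minimal alignment cost of s and t, computed top-down from the recurrence."""

    @lru_cache(maxsize=None)
    def v(i, j):
        # v(i, j) is the distance between the prefixes s[:i] and t[:j].
        if i == 0 and j == 0:
            return 0
        candidates = []
        if i > 0 and j > 0:
            # Last column is (s[i], t[j]): match or replacement.
            candidates.append(v(i - 1, j - 1) + sigma(s[i - 1], t[j - 1]))
        if i > 0:
            # Last column is (s[i], -): deletion.
            candidates.append(v(i - 1, j) + sigma(s[i - 1], "-"))
        if j > 0:
            # Last column is (-, t[j]): insertion.
            candidates.append(v(i, j - 1) + sigma("-", t[j - 1]))
        return min(candidates)

    return v(len(s), len(t))


print(distance("AAAC", "AGC"))
```

The memoization ensures each pair (i, j) is evaluated once, which is what the matrix V achieves when it is filled in explicitly.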

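12 Recursive Formula
Of course, we also need to handle the base cases in the recursion:
$$V[0,0] = 0, \qquad V[i+1,0] = V[i,0] + \sigma(s[i+1],\, -), \qquad V[0,j+1] = V[0,j] + \sigma(-,\, t[j+1])$$
We fill the matrix using the recurrence rule.
[Figure: the matrix V for the sequences S and T]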

13 Dynamic Programming Algorithm
We continue to fill the matrix using the recurrence rule.
[Figure: the matrix V for S and T, partially filled]

14 Dynamic Programming Algorithm
[Figure: computing V[1,1] from its neighbors V[0,0] = 0, V[0,1] = 2, and V[1,0] = 2 in the matrix for S and T]

15 Dynamic Programming Algorithm
[Figure: the matrix V for S and T]

16 Dynamic Programming Algorithm
Conclusion: d(AAAC, AGC) = 3
[Figure: the completed matrix V for S and T]
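The matrix-filling procedure can be sketched bottom-up in Python as follows. This is an illustration under the simple cost function (match 0, mismatch 1, indel 2), not code from the slides; the name fill_matrix is hypothetical, and placing S on the rows and T on the columns is an assumption, since the original figure is not available.

```python
def fill_matrix(s, t, mismatch=1, indel=2):
    """Return V where V[i][j] is the distance between s[:i] and t[:j]."""
    m, n = len(s), len(t)
    V = [[0] * (n + 1) for _ in range(m + 1)]
    # Base cases: aligning a prefix against the empty sequence costs only indels.
    for i in range(1, m + 1):
        V[i][0] = V[i - 1][0] + indel
    for j in range(1, n + 1):
        V[0][j] = V[0][j - 1] + indel
    # Recurrence: each entry depends on its three already-filled neighbors.
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if s[i - 1] == t[j - 1] else mismatch
            V[i][j] = min(
                V[i - 1][j - 1] + sub,   # match or replacement
                V[i - 1][j] + indel,     # s[i] aligned against a gap
                V[i][j - 1] + indel,     # t[j] aligned against a gap
            )
    return V


V = fill_matrix("AAAC", "AGC")
for row in V:
    print(row)
```

The bottom-right entry V[4][3] is 3, which agrees with the conclusion d(AAAC, AGC) = 3.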

17 Reconstructing the Best Alignment
To reconstruct the best alignment, we record which case(s) in the recursive rule minimized the cost of each entry.
[Figure: the matrix V for S and T, with the minimizing cases marked]

18 Reconstructing the Best Alignment
We now trace back a path that corresponds to the best alignment:
AAAC
AG-C
[Figure: the traceback path in the matrix V for S and T]
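A traceback can be sketched in Python as below. Instead of storing pointers, this version re-checks at each cell which case of the recurrence produced its value. The names fill_matrix and traceback are illustrative, the simple cost function is assumed, and the tie-breaking order (gap in T first, then diagonal, then gap in S) is an arbitrary choice that happens to reproduce the alignment on the slide.

```python
def fill_matrix(s, t, mismatch=1, indel=2):
    """Return V where V[i][j] is the distance between s[:i] and t[:j]."""
    m, n = len(s), len(t)
    V = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        V[i][0] = V[i - 1][0] + indel
    for j in range(1, n + 1):
        V[0][j] = V[0][j - 1] + indel
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if s[i - 1] == t[j - 1] else mismatch
            V[i][j] = min(V[i - 1][j - 1] + sub,
                          V[i - 1][j] + indel,
                          V[i][j - 1] + indel)
    return V


def traceback(s, t, mismatch=1, indel=2):
    """Return one optimal alignment of s and t as a pair of gapped strings."""
    V = fill_matrix(s, t, mismatch, indel)
    i, j = len(s), len(t)
    top, bottom = [], []
    while i > 0 or j > 0:
        if i > 0 and V[i][j] == V[i - 1][j] + indel:
            # s[i] aligned against a gap.
            top.append(s[i - 1])
            bottom.append("-")
            i -= 1
        elif (i > 0 and j > 0 and
              V[i][j] == V[i - 1][j - 1] + (0 if s[i - 1] == t[j - 1] else mismatch)):
            # s[i] aligned against t[j].
            top.append(s[i - 1])
            bottom.append(t[j - 1])
            i -= 1
            j -= 1
        else:
            # Only remaining case: a gap aligned against t[j].
            top.append("-")
            bottom.append(t[j - 1])
            j -= 1
    # Columns were collected from last to first, so reverse them.
    return "".join(reversed(top)), "".join(reversed(bottom))


top, bottom = traceback("AAAC", "AGC")
print(top)
print(bottom)
```

Each step decreases i, j, or both, so the walk visits at most m+n cells.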

19 Reconstructing the Best Alignment
Sometimes, more than one alignment has minimal cost:
AAAC   AAAC   AAAC
A-GC   -AGC   AG-C
[Figure: the alternative traceback paths in the matrix V for S and T]
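When several cases of the recurrence tie, following every minimizing case yields all optimal alignments. The Python sketch below is illustrative: the names fill_matrix and all_optimal_alignments are hypothetical, the simple cost function is assumed, and since the number of optimal alignments can grow very quickly with sequence length, it is only suitable for small examples.

```python
def fill_matrix(s, t, mismatch=1, indel=2):
    """Return V where V[i][j] is the distance between s[:i] and t[:j]."""
    m, n = len(s), len(t)
    V = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        V[i][0] = V[i - 1][0] + indel
    for j in range(1, n + 1):
        V[0][j] = V[0][j - 1] + indel
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if s[i - 1] == t[j - 1] else mismatch
            V[i][j] = min(V[i - 1][j - 1] + sub,
                          V[i - 1][j] + indel,
                          V[i][j - 1] + indel)
    return V


def all_optimal_alignments(s, t, mismatch=1, indel=2):
    """Return every minimal-cost alignment of s and t as (top, bottom) pairs."""
    V = fill_matrix(s, t, mismatch, indel)

    def walk(i, j):
        # Yield all optimal alignments of the prefixes s[:i] and t[:j].
        if i == 0 and j == 0:
            yield "", ""
            return
        if i > 0 and j > 0:
            sub = 0 if s[i - 1] == t[j - 1] else mismatch
            if V[i][j] == V[i - 1][j - 1] + sub:
                for a, b in walk(i - 1, j - 1):
                    yield a + s[i - 1], b + t[j - 1]
        if i > 0 and V[i][j] == V[i - 1][j] + indel:
            for a, b in walk(i - 1, j):
                yield a + s[i - 1], b + "-"
        if j > 0 and V[i][j] == V[i][j - 1] + indel:
            for a, b in walk(i, j - 1):
                yield a + "-", b + t[j - 1]

    return list(walk(len(s), len(t)))


alignments = all_optimal_alignments("AAAC", "AGC")
for top, bottom in alignments:
    print(top)
    print(bottom)
    print()
```

For AAAC and AGC this finds exactly the three alignments listed on the slide, each of cost 3.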

20 Time Complexity
Space: O(mn)
Time: O(mn)
• Filling the matrix: O(mn)
• Backtracking: O(m+n)

