Dynamic Programming: Edit Distance

Presentation on theme: "Dynamic Programming: Edit Distance"— Presentation transcript:

1 Dynamic Programming: Edit Distance

2 Aligning Sequences without Insertions and Deletions: Hamming Distance
Given two sequences v and w:
v : ATATATAT
w : TATATATA
The Hamming distance dH(v, w) = 8 is large, but the sequences are very similar.

3 Aligning Sequences with Insertions and Deletions
By shifting one sequence over one position:
v : ATATATAT-
w : -TATATATA
The edit distance: d(v, w) = 2. Hamming distance neglects insertions and deletions in the sequences.

4 Edit Distance
Levenshtein (1966) introduced the edit distance between two strings as the minimum number of elementary operations (insertions, deletions, and substitutions) needed to transform one string into the other.
d(v, w) = minimum number of elementary operations to transform v → w

5 Edit Distance vs Hamming Distance
Hamming distance always compares the i-th letter of v with the i-th letter of w.
V = ATATATAT
W = TATATATA
Hamming distance: d(v, w) = 8
Computing Hamming distance is a trivial task.
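To show why this is trivial, here is a minimal Python sketch of a position-by-position comparison. The function name `hamming_distance` is chosen for illustration, and the sketch assumes both strings have the same length.

```python
def hamming_distance(v: str, w: str) -> int:
    """Count positions i where v[i] differs from w[i]."""
    if len(v) != len(w):
        # Hamming distance is only defined for equal-length strings.
        raise ValueError("strings must have the same length")
    return sum(a != b for a, b in zip(v, w))


print(hamming_distance("ATATATAT", "TATATATA"))  # every position differs
```

A single pass over the two strings is enough because position i of v is only ever compared with position i of w.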

6 Edit Distance vs Hamming Distance
Hamming distance always compares the i-th letter of v with the i-th letter of w:
V = ATATATAT
W = TATATATA
Hamming distance: d(v, w) = 8
Edit distance may compare the i-th letter of v with the j-th letter of w:
V = -ATATATAT
W = TATATATA-
Edit distance: d(v, w) = 2 (one insertion and one deletion)
How do we find which j goes with which i?

7 Edit Distance: Example
TGCATAT → ATCCGAT in 5 steps
TGCATAT → (delete last T)
TGCATA → (delete last A)
TGCAT → (insert A at front)
ATGCAT → (substitute C for the G in 3rd position)
ATCCAT → (insert G before last A)
ATCCGAT (Done)
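The five steps above can be replayed mechanically. The sketch below is illustrative only: the `(operation, index, character)` encoding and the name `apply_edits` are assumptions made for this example, with 0-based indices into the current string.

```python
def apply_edits(s: str, edits) -> str:
    """Apply (op, index, char) edits in order; index is 0-based in the current string."""
    for op, i, ch in edits:
        if op == "delete":
            s = s[:i] + s[i + 1:]
        elif op == "insert":
            s = s[:i] + ch + s[i:]
        elif op == "substitute":
            s = s[:i] + ch + s[i + 1:]
        else:
            raise ValueError(f"unknown operation: {op}")
    return s


steps = [
    ("delete", 6, ""),       # TGCATAT -> TGCATA  (delete last T)
    ("delete", 5, ""),       # TGCATA  -> TGCAT   (delete last A)
    ("insert", 0, "A"),      # TGCAT   -> ATGCAT  (insert A at front)
    ("substitute", 2, "C"),  # ATGCAT  -> ATCCAT  (substitute C for the G in 3rd position)
    ("insert", 4, "G"),      # ATCCAT  -> ATCCGAT (insert G before last A)
]
print(apply_edits("TGCATAT", steps))
```

Each tuple is one elementary operation, so the length of `steps` is the cost of this particular transformation.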

8 Alignment as a Path in the Edit Graph
[Figure: edit graph with v and w along the axes, positions 0 to 7]
v = AT_GTTAT_
w = ATCGT_A_C
Corresponding path: (0,0), (1,1), (2,2), (2,3), (3,4), (4,5), (5,5), (6,6), (7,6), (7,7)
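The correspondence between an alignment and its path can be sketched in Python. The helper name `alignment_to_path` is hypothetical, and the sketch assumes `_` marks a gap and that a vertex (i, j) means i letters of v and j letters of w have been consumed.

```python
def alignment_to_path(v_row: str, w_row: str, gap: str = "_"):
    """Convert two gapped alignment rows into a list of edit-graph vertices."""
    if len(v_row) != len(w_row):
        raise ValueError("alignment rows must have equal length")
    i = j = 0
    path = [(0, 0)]
    for a, b in zip(v_row, w_row):
        if a == gap and b == gap:
            raise ValueError("a column cannot contain two gaps")
        if a != gap:
            i += 1  # a letter of v was consumed
        if b != gap:
            j += 1  # a letter of w was consumed
        path.append((i, j))
    return path


print(alignment_to_path("AT_GTTAT_", "ATCGT_A_C"))
```

A column with two letters advances both coordinates (a diagonal edge), while a column with a gap advances only one coordinate.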

9 Alignment as a Path in the Edit Graph
[Figure: edit graph with v and w along the axes, positions 0 to 7]
Old alignment:
v = AT_GTTAT_
w = ATCGT_A_C
New alignment:
v = AT_GTTAT_
w = ATCG_TA_C

10 Dynamic programming (Cormen et al.)
Optimal substructure: the optimal solution to the problem contains within it optimal solutions to subproblems.
Overlapping subproblems: the optimal solutions to subproblems (“subsolutions”) overlap. These subsolutions are computed over and over again when computing the global optimal solution.
For edit distance:
Optimal substructure: we compute the minimum distance of substrings in order to compute the minimum distance of the entire string.
Overlapping subproblems: most substring distances are needed 3 times (when moving right, diagonally, and down).
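One way to see both properties is a top-down recursion with a cache, sketched below in Python. This is an illustration rather than the presentation's own algorithm: the name `edit_distance_memo` is hypothetical, and the unit costs and base cases (an empty prefix costs the length of the other prefix) are the usual Levenshtein conventions, assumed here.

```python
from functools import lru_cache


def edit_distance_memo(v: str, w: str) -> int:
    """Top-down edit distance over prefixes v[:i] and w[:j], with memoization."""

    @lru_cache(maxsize=None)
    def d(i: int, j: int) -> int:
        if i == 0:
            return j  # insert the remaining j letters of w
        if j == 0:
            return i  # delete the remaining i letters of v
        return min(
            d(i - 1, j - 1) + (v[i - 1] != w[j - 1]),  # match or substitute
            d(i - 1, j) + 1,                           # delete v[i-1]
            d(i, j - 1) + 1,                           # insert w[j-1]
        )

    return d(len(v), len(w))


print(edit_distance_memo("ATATATAT", "TATATATA"))
```

Each call to `d(i, j)` is built from the optimal values of three smaller subproblems, and the cache ensures every (i, j) pair is solved once even though up to three larger subproblems request it.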

11 Dynamic Programming
$$
s_{i,j} = \min
\begin{cases}
s_{i-1,\,j-1} + (v_i \neq w_j) \\
s_{i-1,\,j} + 1 \\
s_{i,\,j-1} + 1
\end{cases}
$$
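The recurrence on this slide can be filled in bottom-up over a table, as in the Python sketch below. It is a sketch under stated assumptions: the function name `edit_distance` is illustrative, and the insertion case `s[i][j-1] + 1` and the boundary values `s[i][0] = i`, `s[0][j] = j` follow the usual Levenshtein conventions rather than anything shown explicitly in the transcript.

```python
def edit_distance(v: str, w: str) -> int:
    """Bottom-up Levenshtein distance; s[i][j] is the distance between v[:i] and w[:j]."""
    n, m = len(v), len(w)
    s = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        s[i][0] = i  # delete the first i letters of v
    for j in range(1, m + 1):
        s[0][j] = j  # insert the first j letters of w
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s[i][j] = min(
                s[i - 1][j - 1] + (v[i - 1] != w[j - 1]),  # diagonal: match or substitute
                s[i - 1][j] + 1,                           # down: delete v[i-1]
                s[i][j - 1] + 1,                           # right: insert w[j-1]
            )
    return s[n][m]


print(edit_distance("ATATATAT", "TATATATA"))
print(edit_distance("TGCATAT", "ATCCGAT"))
```

The table has (n + 1)(m + 1) cells and each is filled in constant time, so the running time is proportional to the product of the two string lengths. For TGCATAT and ATCCGAT this sketch returns 4, one fewer than the five-step transformation in the earlier example.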

12 Levenshtein distance: Computation

13 Levenshtein distance: algorithm

