Approximate Matching of Run-Length Compressed Strings

1 Approximate Matching of Run-Length Compressed Strings
Algorithmica (2003). Veli Mäkinen, Gonzalo Navarro, and Esko Ukkonen

2 Run-Length encoding: aaabb → (a,3),(b,2)
Edit Distance on Run-Length Compressed Strings
Extending to Weighted Edit Distance
Approximate Searching
Improving a Greedy Algorithm for LCS
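The encoding on this slide can be sketched in a few lines of Python. This is only an illustration: the function names `rle_encode` and `rle_decode` are assumptions, not names from the presentation.

```python
from itertools import groupby


def rle_encode(s):
    """Return the run-length encoding of s as a list of (letter, run length) pairs."""
    return [(ch, len(list(group))) for ch, group in groupby(s)]


def rle_decode(runs):
    """Expand a list of (letter, run length) pairs back into a plain string."""
    return "".join(ch * length for ch, length in runs)


print(rle_encode("aaabb"))  # [('a', 3), ('b', 2)]
```

Here the compressed length (m’ or n’ in the later slides) corresponds to `len(rle_encode(s))`.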

3 Part1: An O(mn’+m’n) Algorithm for the Levenshtein Distance
String A = a_1 a_2 ··· a_m, compressed length m’
String B = b_1 b_2 ··· b_n, compressed length n’
Levenshtein distance, DL(A, B): d_{i,j} = min(d_{i-1,j} + 1, d_{i,j-1} + 1, d_{i-1,j-1} + (if a_i = b_j then 0 else 1))
DID(A, B): d_{i,j} = min(d_{i-1,j} + 1, d_{i,j-1} + 1, d_{i-1,j-1} + (if a_i = b_j then 0 else ∞))
Use Dynamic Programming
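As a reference point, the two recurrences can be evaluated directly on the uncompressed strings in O(mn) time. The sketch below is that plain baseline, not the O(mn’+m’n) algorithm of the paper; the function names and the border values d_{i,0} = i and d_{0,j} = j are assumptions of this example (they are the usual choice but are not stated on the slide).

```python
import math


def edit_distance(a, b, mismatch_cost):
    """Evaluate d_{i,j} row by row; mismatch_cost is the diagonal cost when a_i != b_j."""
    m, n = len(a), len(b)
    prev = list(range(n + 1))          # assumed border: d_{0,j} = j
    for i in range(1, m + 1):
        cur = [i] + [0] * n            # assumed border: d_{i,0} = i
        for j in range(1, n + 1):
            diag = prev[j - 1] + (0 if a[i - 1] == b[j - 1] else mismatch_cost)
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, diag)
        prev = cur
    return prev[n]


def levenshtein(a, b):
    """DL(A, B): a mismatch on the diagonal costs 1."""
    return edit_distance(a, b, 1)


def indel_distance(a, b):
    """DID(A, B): a mismatch on the diagonal costs infinity, so only insertions and deletions remain."""
    return edit_distance(a, b, math.inf)


print(levenshtein("aaabb", "aabbb"), indel_distance("aaabb", "aabbb"))  # 1 2
```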

4 Relationship between DID and LCS
2 × |LCS(A, B)| = m + n − DID(A, B)
m + n = 2 × |LCS(A, B)| + x + y, where x and y count the characters of A and of B that are not part of the LCS
DID(A, B) = x + y
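The relationship can be used in either direction. A minimal sketch, assuming a textbook LCS dynamic program on the uncompressed strings (the names `lcs_length` and `indel_distance_via_lcs` are illustrative):

```python
def lcs_length(a, b):
    """Length of a longest common subsequence of a and b, by the standard DP."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            if x == y:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def indel_distance_via_lcs(a, b):
    """DID(A, B) = m + n - 2 * |LCS(A, B)|, rearranged from the identity on the slide."""
    return len(a) + len(b) - 2 * lcs_length(a, b)


print(lcs_length("aaabb", "aabbb"), indel_distance_via_lcs("aaabb", "aabbb"))  # 4 2
```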

5 Notations

6 Known: Top and Left border. Goal: Right and Bottom border
Equal letter box:

7 Different letter box: Observation: consecutive cells in the (d_{i,j}) matrix differ by at most one
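The observation can be checked empirically on small inputs. The sketch below builds the full Levenshtein matrix for uncompressed strings and tests horizontally and vertically adjacent cells; the function names and the border values d_{i,0} = i, d_{0,j} = j are assumptions of this example.

```python
def levenshtein_matrix(a, b):
    """Return the full (m+1) x (n+1) matrix of Levenshtein DP values."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + sub)
    return d


def neighbours_differ_by_at_most_one(d):
    """Check every horizontally and vertically adjacent pair of cells."""
    rows, cols = len(d), len(d[0])
    horizontal = all(abs(d[i][j] - d[i][j - 1]) <= 1
                     for i in range(rows) for j in range(1, cols))
    vertical = all(abs(d[i][j] - d[i - 1][j]) <= 1
                   for i in range(1, rows) for j in range(cols))
    return horizontal and vertical


print(neighbours_differ_by_at_most_one(levenshtein_matrix("aaabb", "bbbaa")))  # True
```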

8 Algorithm:

9 Time Complexity of the Algorithm

10 Part2: Extending to Weighted Edit Distance

11 Which one is correct? (The slide shows two alternative figures, which are not preserved in this transcript.)

12 path(d, r) = Cs · min(d, r) + Cd · max(d − r, 0) + Ci · max(r − d, 0)
Case d < r: path(d, r) = Cs · d + Ci · (r − d)
Case d = r: path(d, r) = Cs · d
Case d > r: path(d, r) = Cs · r + Cd · (d − r)
Figure labels: (q,0), (s,t), s-q, s-t, t
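The formula translates directly into code. In this sketch the parameter names `c_sub`, `c_del` and `c_ins` stand for Cs, Cd and Ci and are naming assumptions; the cost values in the example are arbitrary and chosen only to exercise the three cases d < r, d = r and d > r.

```python
def path_cost(d, r, c_sub, c_del, c_ins):
    """path(d, r) = Cs*min(d, r) + Cd*max(d - r, 0) + Ci*max(r - d, 0)."""
    return c_sub * min(d, r) + c_del * max(d - r, 0) + c_ins * max(r - d, 0)


# Arbitrary illustrative costs: Cs = 3, Cd = 2, Ci = 1.
print(path_cost(2, 5, 3, 2, 1))  # d < r: 3*2 + 1*3 = 9
print(path_cost(3, 3, 3, 2, 1))  # d = r: 3*3 = 9
print(path_cost(5, 2, 3, 2, 1))  # d > r: 3*2 + 2*3 = 12
```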

13 How to evaluate min value in constant time
The problem is that path is no longer a constant.
Figure labels: (s1,t1), (s2,t2), (s3,t3), (s4,t4), with offsets +Cs−Ci and +Cs−Cd

14 Part3: Approximate Searching
Find all approximate occurrences of A (short pattern) in B (long string)
Let all d_{0,j} = 0 and find all d_{m,j} ≤ k
More efficient approach: evaluate only the first m columns in each long run
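The basic search (setting d_{0,j} = 0 and reporting every j with d_{m,j} ≤ k) can be sketched on uncompressed input as follows. This is the plain column-by-column baseline, not the run-length-aware variant that evaluates only the first m columns of each long run; the function name `approx_search` and the choice to report 1-based end positions are assumptions of this example.

```python
def approx_search(pattern, text, k):
    """Return all 1-based positions j in text where an occurrence of pattern
    with at most k Levenshtein errors ends, i.e. where d_{m,j} <= k."""
    m = len(pattern)
    col = list(range(m + 1))           # column j = 0: d_{i,0} = i
    ends = []
    for j, ch in enumerate(text, 1):
        new = [0] * (m + 1)            # d_{0,j} = 0: an occurrence may start anywhere
        for i in range(1, m + 1):
            sub = 0 if pattern[i - 1] == ch else 1
            new[i] = min(col[i] + 1, new[i - 1] + 1, col[i - 1] + sub)
        if new[m] <= k:
            ends.append(j)
        col = new
    return ends


print(approx_search("abc", "xxabcxx", 0))  # [5]
print(approx_search("abc", "xxabcxx", 1))  # [4, 5, 6]
```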

15 Time Complexity
Short run in B with length r ≤ m: O(m’r + m)
Long run: O(m’m + m + m)
Total time complexity is O(n’m’m + R), where R is the number of occurrences

16 Part4: Improving a Greedy Algorithm for LCS
Basic idea: Fill only the corners of the boxes
Different letter box (figure labels: ←x→, +s, +t)

17 Equal letter box: Recursively tracing an optimal path
Time complexity of tracing a path is O(m’+n’)
The algorithm takes O(m’n’(m’+n’))

18 Analysis of Time Complexity
Observation: each cell in the borders of the boxes can be visited only once
This also achieves the O(m’n+n’m) bound
Time complexity is O(min(m’n’(m’+n’), m’n+n’m))
Space complexity is O(m’n’)

