An Extension of the String-to-String Correction Problem. Roy Lowrance and Robert A. Wagner. Journal of the ACM, vol. 22, no. 2, April 1975, pp. 177-183.
Edit Distance
Three edit operations:
– Substitution: abcd -> aacd (change b to a)
– Insertion: abcd -> abacd (insert an a)
– Deletion: abcd -> abd (delete c)
Given two strings T and P, the problem is to determine the minimum number of edit operations needed to transform T into P.
Note: For clarity, we assume that all edit operations have the same cost.
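Example with the strings "saturday" (columns) and "sunday" (rows):
        s  a  t  u  r  d  a  y
     0  1  2  3  4  5  6  7  8
  s  1  0  1  2  3  4  5  6  7
  u  2  1  1  2  2  3  4  5  6
  n  3  2  2  2  3  3  4  5  6
  d  4  3  3  3  3  4  3  4  5
  a  5  4  3  4  4  4  4  3  4
  y  6  5  4  4  5  5  5  4  3
d[i, j] = min( d[i-1, j] + 1, d[i, j-1] + 1, d[i-1, j-1] + cost(A[i]->B[j]) )
This example is copied from Wikipedia.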
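The recurrence above can be sketched in a few lines of Python. This is a minimal illustration assuming unit costs for all three operations; the function name `levenshtein` is chosen here for the example and does not come from the slides.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost substitution, insertion and deletion."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]

    # Border cells: turning a prefix into the empty string (or back).
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j

    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # delete a[i-1]
                d[i][j - 1] + 1,         # insert b[j-1]
                d[i - 1][j - 1] + cost,  # match or substitute
            )
    return d[rows - 1][cols - 1]


if __name__ == "__main__":
    print(levenshtein("sunday", "saturday"))
```

For "sunday" and "saturday" the function returns the bottom-right entry of the table, 3.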
The Problem
This paper extends the set of edit operations to include the interchange of two adjacent characters.
– Swap
Example:
T: a b c d
P: c d a
a b c d -> a c d -> c a d -> c d a (delete b, swap a and c, swap a and d)
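To make the swap operation concrete, the example sequence can be replayed in Python. The helper names `delete`, `swap` and `replay_example` are hypothetical and only serve this illustration; positions are 0-based.

```python
def delete(s: str, i: int) -> str:
    """Remove the character at position i."""
    return s[:i] + s[i + 1:]


def swap(s: str, i: int) -> str:
    """Interchange the adjacent characters at positions i and i+1."""
    return s[:i] + s[i + 1] + s[i] + s[i + 2:]


def replay_example(t: str = "abcd") -> list[str]:
    """Replay the slide's edit sequence and record every intermediate string."""
    steps = [t]
    t = delete(t, 1)  # abcd -> acd (delete b)
    steps.append(t)
    t = swap(t, 0)    # acd -> cad (swap a and c)
    steps.append(t)
    t = swap(t, 1)    # cad -> cda (swap a and d)
    steps.append(t)
    return steps


if __name__ == "__main__":
    print(" -> ".join(replay_example()))
```

One deletion and two swaps turn T into P, so this sequence uses three operations.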
Trace
A trace is a graphical specification of how the edit operations apply to each character of the two strings.
Example:
T: a b c d
P: c d a
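One way to picture a trace in code is as a set of lines (i, j), each joining position i of T to position j of P. The sketch below rests on assumptions that are not stated on the slide: positions are 0-based, each position is touched by at most one line, untouched characters of T count as deletions, untouched characters of P count as insertions, a line joining different characters counts as a substitution, and every pair of crossing lines counts as one swap. The name `trace_cost` is hypothetical.

```python
def trace_cost(t: str, p: str, lines: set[tuple[int, int]]) -> int:
    """Unit-cost total of a trace given as (index in t, index in p) pairs."""
    touched_t = {i for i, _ in lines}
    touched_p = {j for _, j in lines}

    deletions = len(t) - len(touched_t)   # characters of t with no line
    insertions = len(p) - len(touched_p)  # characters of p with no line
    substitutions = sum(1 for i, j in lines if t[i] != p[j])

    # With the lines sorted by their position in t, two lines cross
    # when their positions in p appear in the opposite order.
    ordered = sorted(lines)
    crossings = sum(
        1
        for x in range(len(ordered))
        for y in range(x + 1, len(ordered))
        if ordered[x][1] > ordered[y][1]
    )
    return deletions + insertions + substitutions + crossings


if __name__ == "__main__":
    # a -> a, c -> c, d -> d; b is untouched (deleted).
    print(trace_cost("abcd", "cda", {(0, 2), (2, 0), (3, 1)}))
```

For the example, the line from a crosses the lines from c and d, giving two swaps plus one deletion, which agrees with the three-step edit sequence for T: a b c d and P: c d a.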
Important Properties
In the following cases, the edit operations can be replaced by other edit operations.
[Figure: trace diagrams comparing alternative edit sequences, each pointing to a trace with lower cost. Labels: 2 swaps; insertion + deletion; deletion + substitution; 2 substitutions; swap + substitution; swap + K deletions + L insertions.]
Summary
With simple preprocessing of T and P, the problem can be solved by dynamic programming in time O(|T| |P|).
If the edit operations are allowed to have different costs:
– Insertion (cost W_I)
– Deletion (cost W_D)
– Swap (cost W_S)
– Substitution (cost W_C)
then the algorithm works if 2 W_S ≥ W_I + W_D.