# Approximate On-line Palindrome Recognition, and Applications Amihood Amir Benny Porat.


Approximate On-line Palindrome Recognition, and Applications Amihood Amir Benny Porat

Moskva River

Confluence of 4 Streams: Palindrome Recognition, Approximate Matching, Interchange Matching, Online Algorithms. CPM 2014

Palindrome Recognition. "Voz'mi-ka slovo ropot," govoril Cincinnatu ego shurin, ostriak, "i prochti obratno. A? Smeshno poluchaetsia?" (Vladimir Nabokov, Invitation to a Beheading). In translation: "Take the word ropot [murmur]," Cincinnatus' brother-in-law, the wit, was saying to him, "and read it backwards. Eh? Comes out funny, doesn't it?" [ropot read backwards is topor: the axe]. A palindrome is a string that is the same whether read from right to left or from left to right. Examples: доход (Russian for "income"); A man, a plan, a cat, a ham, a yak, a yam, a hat, a canal-Panama!

Palindrome Example Ibn Ezra: Medieval Jewish philosopher, poet, Biblical commentator, and mathematician. Was asked: " אבי אל חי שמך למה מלך משיח לא יבא " [ My Father, the Living God, why does the king messiah not arrive?] His response: "דעו מאביכם כי לא בוש אבוש, שוב אשוב אליכם כי בא מועד" [ Know you from your Father that I will not be delayed. I will return to you when the time will come ]

Palindromes in Computer Science Great programming exercise in CS 101. Example of a problem that can be solved by a RAM in linear time, but not by a 1-tape Turing machine. (Can be done in linear time by a 2-tape TM)

Palindrome Concatenation. We may be interested in finding out whether a string is a concatenation of palindromes of length > 1. Example: ABCCBABBCCBCAACB = ABCCBA · BB · CC · BCAACB. Why would we be interested in such a funny problem? We'll soon see. Exercise: do this in linear time.
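A minimal sketch of the decision problem on this slide, assuming a plain dynamic program rather than the linear-time solution the exercise asks for. The function name `is_palindrome_concatenation` is hypothetical; it runs in quadratic time and space.

```python
def is_palindrome_concatenation(s: str) -> bool:
    """True if s splits into palindromes that each have length > 1."""
    n = len(s)
    # pal[i][j] is True when s[i..j] (inclusive) is a palindrome.
    pal = [[False] * n for _ in range(n)]
    for i in range(n - 1, -1, -1):
        for j in range(i, n):
            pal[i][j] = s[i] == s[j] and (j - i < 2 or pal[i + 1][j - 1])
    # ok[e] is True when the prefix s[:e] splits into palindromes of length > 1.
    ok = [False] * (n + 1)
    ok[0] = True
    for e in range(2, n + 1):
        # b ranges up to e - 2, so every piece s[b:e] has length at least 2.
        ok[e] = any(ok[b] and pal[b][e - 1] for b in range(e - 1))
    return ok[n]


print(is_palindrome_concatenation("ABCCBABBCCBCAACB"))
```

The slide's example string is accepted because it splits as ABCCBA, BB, CC, BCAACB.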

Stream 2 - Approximations. As in exact matching, there may be errors. Find the minimum number of errors that, if fixed, will give a string that is a concatenation of palindromes of length > 1. Example: ABCCBCBBCCBCABCB, which becomes ABCCBABBCCBCAACB after fixing two errors. For Hamming distance: Amir & Porat [ISAAC 13]: algorithm of time $O(n^2)$.
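The minimization can be sketched with a straightforward dynamic program over split points. This is an illustrative quadratic-time sketch under Hamming distance, not necessarily the algorithm of the cited ISAAC 13 paper; the function name is hypothetical.

```python
def min_errors_to_palindrome_concatenation(s: str):
    """Fewest character substitutions that turn s into a concatenation of
    palindromes of length > 1. Returns None when no split exists (len(s) == 1)."""
    n = len(s)
    # cost[i][j]: number of mismatched symmetric pairs inside s[i..j] (inclusive),
    # i.e. the substitutions needed to make that piece a palindrome.
    cost = [[0] * n for _ in range(n)]
    for i in range(n - 1, -1, -1):
        for j in range(i + 1, n):
            cost[i][j] = cost[i + 1][j - 1] + (s[i] != s[j])
    inf = float("inf")
    # best[e]: minimum substitutions for the prefix s[:e].
    best = [inf] * (n + 1)
    best[0] = 0
    for e in range(2, n + 1):
        best[e] = min(best[b] + cost[b][e - 1] for b in range(e - 1))
    return None if best[n] == inf else best[n]


print(min_errors_to_palindrome_concatenation("ABCCBCBBCCBCABCB"))
```

Each piece is charged one substitution per mismatched symmetric pair, which is exactly the Hamming cost of turning that piece into a palindrome.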

Stream 3 - Reversals Why is this funny problem interesting? Sorting by reversals: In the evolutionary process a substring may “detach” and “reconnect” in reverse: ABCABCDAABCBADCBAADCB ABCABCDAABCBAD

Sorting by Reversals. What is the minimum number of reversals that, when applied to string A, result in string B? History: introduced by Bafna & Pevzner [95]; NP-hard: Caprara [97]; approximations: Christie [98], Berman, Hannenhalli, Karpinski [02], Hartman [03].

Sorting by Reversals: Polynomial-time Relaxations. Signed reversals: Hannenhalli & Pevzner [99], Kaplan, Shamir, Tarjan [00], Tannier & Sagot [04], ... Disjointness: Swap Matching, Muthu [96]. Two constraints: 1. The length of the reversed substring is limited to 2. 2. All swaps are disjoint.

Pattern Matching with Disjoint Reversals. Reversal Distance (RD): the RD between $s_1$ and $s_2$ is the minimum number k such that there exists $s_2'$ where $\mathrm{HAM}(s_1, s_2') = k$ and $s_2'$ reversal-matches $s_2$. Example: $S_1$ = ABDEABCDA, $S_2$ = ECBABAADA, $RD(S_1, S_2) = 2$.

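Connection between Reversal Matching and Palindrome Matching. Interleave the strings: $S_1$ = ABDEABCDA, $S_2$ = EDBABAADC. Taking one character from $S_1$ and one from $S_2$ in turn gives A E B D D B E A A B B A C A D D A C (the slide displays this sequence read from the last position to the first: A C D D C A B A A B E A D B B D A E). The interleaved string is a concatenation of even-length palindromes: AEBDDBEA · ABBA · CADDAC.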
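The connection can be checked on the slide's example with a short sketch. The helper names are hypothetical, and the palindrome test is a simple cubic-time check chosen for clarity, not the fingerprint method of the talk.

```python
def interleave(s1: str, s2: str) -> str:
    """Alternate characters: s1[0], s2[0], s1[1], s2[1], ..."""
    return "".join(a + b for a, b in zip(s1, s2))


def splits_into_even_palindromes(t: str) -> bool:
    """True if t is a concatenation of even-length palindromes."""
    n = len(t)
    ok = [False] * (n + 1)
    ok[0] = True
    for e in range(2, n + 1, 2):
        # Pieces start and end at even offsets, so every piece has even length.
        ok[e] = any(ok[b] and t[b:e] == t[b:e][::-1] for b in range(0, e, 2))
    return ok[n]


def reversal_matches(s1: str, s2: str) -> bool:
    """True if s2 turns into s1 by reversing disjoint substrings of s2."""
    return len(s1) == len(s2) and splits_into_even_palindromes(interleave(s1, s2))


print(interleave("ABDEABCDA", "EDBABAADC"))
print(reversal_matches("ABDEABCDA", "EDBABAADC"))
```

A reversed block of length L in $S_2$ that equals the same block of $S_1$ shows up as one palindrome of length 2L in the interleaved string, which is why only even-length palindromes are considered.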

On-line Input Suppose that we get the input a byte at a time: For the palindrome problem: ACDACABBAAEBB A A A EADDD

On-line Input Suppose that we get the input a byte at a time: For the reversal problem: ACAC CACA BABA ABAB EAEA BDBD A A A AEAED DBDB

Main Idea: Palindrome Fingerprint. Let $S = s_0, s_1, s_2, \ldots, s_{m-1}$. The Rabin-Karp fingerprint: $\Phi(S) = r^{1} s_0 + r^{2} s_1 + \cdots + r^{m} s_{m-1} \bmod p$. The reversal fingerprint: $\Phi^R(S) = r^{-1} s_0 + r^{-2} s_1 + \cdots + r^{-m} s_{m-1} \bmod p$. If $r^{m+1}\,\Phi^R(S) = \Phi(S)$, then S is a palindrome w.h.p.

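Palindrome Fingerprint. With $\Phi(S) = r^{1} s_0 + r^{2} s_1 + \cdots + r^{m} s_{m-1} \bmod p$ and $\Phi^R(S) = r^{-1} s_0 + r^{-2} s_1 + \cdots + r^{-m} s_{m-1} \bmod p$: if $r^{m+1}\,\Phi^R(S) = \Phi(S)$, then S is a palindrome (w.h.p.). Example: S = A B C B A. $r^{6}\,\Phi^R(S) = r^{6}\left(\tfrac{1}{r}A + \tfrac{1}{r^{2}}B + \tfrac{1}{r^{3}}C + \tfrac{1}{r^{4}}B + \tfrac{1}{r^{5}}A\right) = r^{5}A + r^{4}B + r^{3}C + r^{2}B + rA = \Phi(S)$.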
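The fingerprint test can be sketched as follows. The modulus `P`, the base `r = 256`, and the use of `ord` to map characters to numbers are assumptions made for illustration (in practice r would be chosen at random); the function names are hypothetical. The modular inverse uses Python's three-argument `pow` with exponent -1, which needs Python 3.8 or later.

```python
P = (1 << 61) - 1  # assumed prime modulus (a Mersenne prime)


def fingerprints(s: str, r: int, p: int = P):
    """Return (Phi(S), Phi^R(S)) as defined on the slide."""
    r_inv = pow(r, -1, p)          # modular inverse of r
    phi = phi_r = 0
    power = inv_power = 1
    for ch in s:
        power = power * r % p              # r^(i+1)
        inv_power = inv_power * r_inv % p  # r^-(i+1)
        phi = (phi + power * ord(ch)) % p
        phi_r = (phi_r + inv_power * ord(ch)) % p
    return phi, phi_r


def looks_like_palindrome(s: str, r: int = 256, p: int = P) -> bool:
    """Check r^(m+1) * Phi^R(S) == Phi(S) (mod p); false positives are possible."""
    phi, phi_r = fingerprints(s, r, p)
    return pow(r, len(s) + 1, p) * phi_r % p == phi


print(looks_like_palindrome("ABCBA"), looks_like_palindrome("ABCBB"))
```

A true palindrome always passes the test; a non-palindrome passes only if the polynomial difference happens to vanish modulo p, which is the "w.h.p." caveat.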

Simple Online Algorithm for Finding a Palindrome in a Text. Text: $t_1, t_2, t_3, \ldots, t_i, t_{i+1}, t_{i+2}, \ldots, t_{i+m-1}, t_{i+m}, \ldots, t_n$. Maintain $\Phi = r^{1} t_i + r^{2} t_{i+1} + \cdots + r^{m} t_{i+m-1} \bmod p$ and $\Phi^R = r^{-1} t_i + r^{-2} t_{i+1} + \cdots + r^{-m} t_{i+m-1} \bmod p$. If $r^{m+1}\,\Phi^R = \Phi$, there is a palindrome starting in the i-th position (w.h.p.). If not, then for the next position: $\Phi = \Phi + r^{m+1} t_{i+m} \bmod p$ and $\Phi^R = \Phi^R + r^{-(m+1)} t_{i+m} \bmod p$. Note: This algorithm finds online whether the prefix of a text is a palindrome. For finding online whether the text is a concatenation of palindromes, assume even-length palindromes; otherwise, every text is a concatenation of length-1 palindromes.
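The incremental update can be sketched as a generator that consumes one character at a time and reports every prefix length whose fingerprints agree. The modulus, the base `r = 256`, and the function name are illustrative assumptions, and `pow(r, -1, p)` needs Python 3.8 or later.

```python
P = (1 << 61) - 1  # assumed prime modulus


def palindromic_prefix_lengths(stream, r: int = 256, p: int = P):
    """Yield each m such that the first m characters pass the fingerprint test."""
    r_inv = pow(r, -1, p)
    phi = phi_r = 0
    power = inv_power = 1
    m = 0
    for ch in stream:
        m += 1
        power = power * r % p              # r^m
        inv_power = inv_power * r_inv % p  # r^-m
        # Constant work per character: add one term to each fingerprint.
        phi = (phi + power * ord(ch)) % p
        phi_r = (phi_r + inv_power * ord(ch)) % p
        # Compare r^(m+1) * Phi^R with Phi.
        if power * r % p * phi_r % p == phi:
            yield m


print(list(palindromic_prefix_lengths("ABBA")))
```

Only the two fingerprints and the two running powers are stored, so the space used does not grow with the length of the text read so far.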

Palindrome with Mismatches. Start with the 1-mismatch case.

1-Mismatch. $S = s_0, s_1, s_2, \ldots, s_{m-1}$. Choose l prime numbers $q_1, \ldots, q_l < m$ such that $\prod_{i=1}^{l} q_i > m$.

1-Mismatch. $S = s_0, s_1, s_2, \ldots, s_{m-1}$. For each $q_i$ construct $q_i$ subsequences of S as follows: subsequence $S_{q_i,j}$ is all elements of S whose index is j mod $q_i$. Examples: $q_1 = 2$, $q_2 = 3$. Mod 2: $S_{2,0} = s_0, s_2, s_4, \ldots$; $S_{2,1} = s_1, s_3, s_5, \ldots$. Mod 3: $S_{3,0} = s_0, s_3, s_6, \ldots$; $S_{3,1} = s_1, s_4, s_7, \ldots$; $S_{3,2} = s_2, s_5, s_8, \ldots$.

Example. $S = s_0, s_1, s_2, s_3, s_4, s_5$. Mod 2: $S_{2,0} = s_0, s_2, s_4$; $S_{2,1} = s_1, s_3, s_5$. Mod 3: $S_{3,0} = s_0, s_3$; $S_{3,1} = s_1, s_4$; $S_{3,2} = s_2, s_5$.
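The partition into residue classes is a one-liner with extended slicing; the following sketch reproduces the example above on a six-character string. The function name is hypothetical.

```python
def residue_subsequences(s: str, q: int):
    """Return [S_{q,0}, ..., S_{q,q-1}]: S_{q,j} keeps the characters of s
    whose index is congruent to j modulo q."""
    return [s[j::q] for j in range(q)]


print(residue_subsequences("abcdef", 2))
print(residue_subsequences("abcdef", 3))
```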

1-Mismatch. We need to compare $s_0, s_1, s_2, \ldots, s_{m-2}, s_{m-1}$ with $s_{m-1}, s_{m-2}, s_{m-3}, \ldots, s_1, s_0$. We prove that in the partition strings: $S_{q,j} = S^R_{q,(m-1-j) \bmod q}$.

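Example. $S = s_0, s_1, s_2, s_3, s_4, s_5$ and $S^R = s_5, s_4, s_3, s_2, s_1, s_0$. The partitions are $S_{2,0} = s_0, s_2, s_4$; $S_{2,1} = s_1, s_3, s_5$; $S_{3,0} = s_0, s_3$; $S_{3,1} = s_1, s_4$; $S_{3,2} = s_2, s_5$. The comparisons are: $S_{2,0} = s_0, s_2, s_4$ against $S^R_{2,1} = s_5, s_3, s_1$; $S_{3,0} = s_0, s_3$ against $S^R_{3,2} = s_5, s_2$; $S_{3,1} = s_1, s_4$ against $S^R_{3,1} = s_4, s_1$.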

Exact Matching Lemma: $S = S^R \iff S_{q,j} = S^R_{q,(m-1-j) \bmod q}$ for all q and all $0 \le j < q$.
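The lemma can be checked directly on small strings. In this sketch $S^R_{q,j}$ is taken to be the reversal of the subsequence $S_{q,j}$, which is the reading consistent with the six-character example; the function name is hypothetical and the comparison is done on the strings themselves rather than on fingerprints.

```python
def reversal_pair_matches(s: str, q: int):
    """For each j in 0..q-1, report whether S_{q,j} equals the reversal of
    S_{q,(m-1-j) mod q}."""
    m = len(s)
    parts = [s[j::q] for j in range(q)]
    return [parts[j] == parts[(m - 1 - j) % q][::-1] for j in range(q)]


print(reversal_pair_matches("ABCCBA", 2), reversal_pair_matches("ABCCBA", 3))
print(reversal_pair_matches("ABCCBB", 3))
```

For the palindrome ABCCBA every comparison succeeds; changing the last character makes the comparisons involving residues 0 and 2 modulo 3 fail, while the one for residue 1 still succeeds.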

1-Mismatch Lemma: S is a palindrome with 1 mismatch $\iff$ for each q, there is exactly one j such that $\Phi(S_{q,j}) \ne r^{|S_{q,j}|}\,\Phi^R\!\left(S^R_{q,(m-1-j) \bmod q}\right)$.

1-Mismatch Lemma: There is exactly one mismatch $\iff$ there is exactly one subpattern in each group that does not match (by the C.R.T.).

Chinese Remainder Theorem. Let n and m be two positive integers. In our case: if two indices, i and j, have an error, and only one subsequence is erroneous for every q, then, since the product of all q's > m, it means that i = j.
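The separating property used here can be verified by brute force for small m. This sketch assumes the primes are simply the smallest primes whose product exceeds m, which is one way (not necessarily the talk's way) to meet the product condition; both function names are hypothetical.

```python
from itertools import combinations


def first_primes_with_product_above(m: int):
    """Smallest primes 2, 3, 5, ... whose product is strictly greater than m."""
    primes, product, candidate = [], 1, 2
    while product <= m:
        # Every smaller prime is already in the list, so trial division suffices.
        if all(candidate % q for q in primes):
            primes.append(candidate)
            product *= candidate
        candidate += 1
    return primes


def every_pair_separated(m: int, primes) -> bool:
    """True if any two distinct indices below m differ modulo some prime."""
    return all(
        any(i % q != j % q for q in primes)
        for i, j in combinations(range(m), 2)
    )


primes = first_primes_with_product_above(30)
print(primes, every_pair_separated(30, primes))
```

If two indices agreed modulo every chosen prime, their difference would be a multiple of the product of the primes, which is impossible for a nonzero difference smaller than m.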

Complexity. There exists a constant c such that, for any x …

Complexity. For each $q_i$ we compute $2q_i$ different fingerprints. Overall space: Each character participates in exactly two fingerprints (the regular and the reverse). Overall time:

Online. All fingerprint calculations can be done online. We know m at every input character, so the comparisons can be computed as each character arrives. Conclusion: our algorithm is online.

k-Mismatches Use Group testing…

k-Mismatches Group Testing Given n items with some positive ones, identify all positive ones by a small number of tests. Each test is on a subset of items. Test outcome is positive iff there is a positive item in the subset.

k-Mismatch. Group: partition of the text. Test: distinguish (using the 1-mismatch algorithm) between: match; 1 mismatch; more than 1 mismatch.

k-Mismatches. $S = s_0, s_1, s_2, \ldots, s_{m-1}$. Mod 2: $S_{2,0} = s_0, s_2, s_4, \ldots$; $S_{2,1} = s_1, s_3, s_5, \ldots$. Mod 3: $S_{3,0} = s_0, s_3, s_6, \ldots$; $S_{3,1} = s_1, s_4, s_7, \ldots$; $S_{3,2} = s_2, s_5, s_8, \ldots$. Similar to the 1-mismatch algorithm, just with more prime numbers. Each $S_{q,j}$ is a group in our group testing.

Our Tests. We define the reversal pair of $S_{q,j}$ to be $S^R_{q,(m-1-j) \bmod q}$. Each partition is "tested against" its reversal pair.

Correctness. $S = s_0, s_1, s_2, \ldots, s_j, \ldots, s_{m-1}$. For any group of k characters $i_1, i_2, \ldots, i_k$, there exists a partition where $s_j$ appears alone (by the C.R.T.).
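The isolation claim can be illustrated with a small search: given a set of marked positions and a list of primes, look for a prime under which a target position shares its residue class with no other marked position. The function name, the positions, and the prime list are hypothetical examples, not values from the talk.

```python
def isolating_prime(target: int, positions, primes):
    """Return the first prime q for which no other marked position falls in
    the same residue class as target, or None if the given primes do not suffice."""
    for q in primes:
        if all(i % q != target % q for i in positions if i != target):
            return q
    return None


print(isolating_prime(5, [2, 5, 7, 9], [2, 3, 5, 7]))
```

Here position 5 collides with 7 modulo 2 and with 2 modulo 3, but it is alone in its class modulo 5, so the group $S_{5,0}$ sees only that one marked position.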

Correctness. $S = s_0, s_1, s_2, \ldots, s_j, \ldots, s_{m-1}$. If $s_j$ invokes a mismatch, we will catch it.

Complexity Overall space: Overall time:

Approximate Reversal Distance Using the palindrome up to k-mismatches algorithm, can be solved in time, and space.

Спасибо (Thank you)
