Approximate On-line Palindrome Recognition, and Applications Amihood Amir Benny Porat.

Moskva River

Confluence of 4 Streams: Palindrome Recognition, Approximate Matching, Interchange Matching, Online Algorithms. CPM 2014

Palindrome Recognition - Voz'mi-ka slovo ropot, - govoril Cincinnatu ego shurin, ostriak, - i prochti obratno. A? Smeshno poluchaetsia? (Vladimir Nabokov, Invitation to a Beheading.) Translation: "Take the word ropot [murmur]," Cincinnatus' brother-in-law, the wit, was saying to him, "and read it backwards. Eh? Comes out funny, doesn't it?" [ropot read backwards is topor: the axe]. A palindrome is a string that is the same whether read from right to left or from left to right. Examples: доход (Russian for "income"); A man, a plan, a cat, a ham, a yak, a yam, a hat, a canal-Panama!

Palindrome Example Ibn Ezra: Medieval Jewish philosopher, poet, Biblical commentator, and mathematician. Was asked: " אבי אל חי שמך למה מלך משיח לא יבא " [ My Father, the Living God, why does the king messiah not arrive?] His response: "דעו מאביכם כי לא בוש אבוש, שוב אשוב אליכם כי בא מועד" [ Know you from your Father that I will not be delayed. I will return to you when the time will come ]

Palindromes in Computer Science Great programming exercise in CS 101. Example of a problem that can be solved by a RAM in linear time, but not by a 1-tape Turing machine. (Can be done in linear time by a 2-tape TM)
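The CS 101 exercise mentioned on this slide can be sketched as follows. This is a minimal illustration on a random-access model, with a two-pointer scan that takes linear time; the function name `is_palindrome` is chosen here and does not come from the talk.

```python
def is_palindrome(s: str) -> bool:
    """Return True if s reads the same in both directions."""
    left, right = 0, len(s) - 1
    while left < right:
        # Compare the outermost pair that has not been checked yet.
        if s[left] != s[right]:
            return False
        left += 1
        right -= 1
    return True
```

For example, `is_palindrome("доход")` holds, while `is_palindrome("ropot")` does not.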

Palindrome Concatenation We may be interested in finding out whether a string is a concatenation of palindromes of length > 1. Example: ABCCBABBCCBCAACB, which splits as ABCCBA | BB | CC | BCAACB. Why would we be interested in such a funny problem? We'll soon see. Exercise: do this in linear time.
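The decision problem on this slide can be sketched with a straightforward dynamic program. This is not the linear-time solution the exercise asks for: it is a quadratic-time and quadratic-space illustration, and the name `is_palindrome_concatenation` is hypothetical.

```python
def is_palindrome_concatenation(s: str) -> bool:
    """Decide whether s splits into palindromes that each have length > 1."""
    n = len(s)
    # pal[i][j] is True when s[i..j] (inclusive) is a palindrome.
    pal = [[False] * n for _ in range(n)]
    for i in range(n - 1, -1, -1):
        for j in range(i, n):
            pal[i][j] = s[i] == s[j] and (j - i < 2 or pal[i + 1][j - 1])
    # ok[j] is True when the prefix s[:j] splits into palindromes of length > 1.
    ok = [False] * (n + 1)
    ok[0] = True
    for j in range(2, n + 1):
        # The last piece is s[i..j-1]; i <= j - 2 keeps its length above 1.
        ok[j] = any(ok[i] and pal[i][j - 1] for i in range(j - 1))
    return ok[n]
```

On the slide's example, `is_palindrome_concatenation("ABCCBABBCCBCAACB")` is true.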

Stream 2 - Approximations As in exact matching, there may be errors. Find the minimum number of errors that, if fixed, will give a string that is a concatenation of palindromes of length > 1. Example: ABCCBCBBCCBCABCB becomes ABCCBABBCCBCAACB after fixing two characters. For Hamming distance: Amir-Porat [ISAAC 13], an algorithm running in time $O(n^2)$.
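One way to see why quadratic time is enough for the Hamming-distance version is the textbook dynamic program below. It is a sketch under the assumption that an error is a single-character substitution, and it is not claimed to be the algorithm of the cited paper; the function name is hypothetical.

```python
def min_errors_to_palindrome_concatenation(s: str):
    """Fewest substitutions that turn s into a concatenation of palindromes of length > 1.

    Returns float('inf') when no such split exists (a string of length 1).
    """
    n = len(s)
    # cost[i][j]: substitutions needed to make s[i..j] (inclusive) a palindrome,
    # i.e. the number of mismatched symmetric pairs inside s[i..j].
    cost = [[0] * n for _ in range(n)]
    for i in range(n - 1, -1, -1):
        for j in range(i + 1, n):
            inner = cost[i + 1][j - 1] if j - i > 2 else 0
            cost[i][j] = inner + (s[i] != s[j])
    # best[j]: cheapest way to split the prefix s[:j].
    best = [float("inf")] * (n + 1)
    best[0] = 0
    for j in range(2, n + 1):
        best[j] = min(best[i] + cost[i][j - 1] for i in range(j - 1))
    return best[n]
```

Both tables have $O(n^2)$ entries and each entry of `cost` is filled in constant time, so the sketch runs in $O(n^2)$ time and space.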

Stream 3 - Reversals Why is this funny problem interesting? Sorting by reversals: In the evolutionary process a substring may “detach” and “reconnect” in reverse: ABCABCDAABCBADCBAADCB ABCABCDAABCBAD

Sorting by Reversals What is the minimum number of reversals that, when applied to string A, result in string B? History: Introduced: Bafna & Pevzner [95]. NP-hard: Caprara [97]. Approximations: Christie [98]; Berman, Hannenhalli, Karpinski [02]; Hartman [03].

Sorting by Reversals – Polynomial time Relaxations Signed reversals: Hannenhalli & Pevzner [99]; Kaplan, Shamir, Tarjan [00]; Tannier & Sagot [04]; ... Disjointness: Swap Matching, Muthu [96]. Two constraints: 1. The length of the reversed substring is limited to 2. 2. All swaps are disjoint.

Pattern Matching with Disjoint Reversals. Reversal Distance (RD): the RD between $s_1$ and $s_2$ is the minimum number $k$ such that there exists $s_2'$ with $HAM(s_2, s_2') = k$ and $s_1$ reversal-matches $s_2'$. Example: $S_1$ = ABDEABCDA, $S_2$ = ECBABAADA, $RD(S_1, S_2) = 2$.

Connection between Reversal Matching and Palindrome Matching. Interleave the strings: $S_1$ = ABDEABCDA, $S_2$ = EDBABAADC. Alternating the characters of $S_2$ and $S_1$ gives EADBBDAE | BAAB | ACDDCA, a concatenation of even-length palindromes.
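The interleaving on this slide can be tried out with the sketch below. It assumes the reading that two equal-length strings match under disjoint reversals exactly when their interleaving is a concatenation of even-length palindromes; the helper names `interleave` and `is_even_palindrome_concatenation` are hypothetical, and the check is a simple quadratic dynamic program rather than the online algorithm of the talk.

```python
def interleave(a: str, b: str) -> str:
    """Alternate the characters of two equal-length strings: a0 b0 a1 b1 ..."""
    return "".join(x + y for x, y in zip(a, b))


def is_even_palindrome_concatenation(s: str) -> bool:
    """Decide whether s splits into palindromes of even length."""
    n = len(s)
    # pal[i][j] is True when s[i..j] (inclusive) is a palindrome.
    pal = [[False] * n for _ in range(n)]
    for i in range(n - 1, -1, -1):
        for j in range(i, n):
            pal[i][j] = s[i] == s[j] and (j - i < 2 or pal[i + 1][j - 1])
    # Even-length pieces can only start and end at even offsets.
    ok = [False] * (n + 1)
    ok[0] = True
    for j in range(2, n + 1, 2):
        ok[j] = any(ok[i] and pal[i][j - 1] for i in range(0, j, 2))
    return ok[n]
```

With the slide's strings, `interleave("ABDEABCDA", "EDBABAADC")` is `"AEBDDBEAABBACADDAC"`, which splits as AEBDDBEA, ABBA, CADDAC. Swapping the two arguments gives the same answer, because reversing each two-character block of a palindrome concatenation yields another one.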

On-line Input Suppose that we get the input a byte at a time. For the palindrome problem: [Animation: the characters of the string arriving one at a time.]

On-line Input Suppose that we get the input a byte at a time. For the reversal problem: [Animation: the characters of the two strings arriving.]

Main Idea – Palindrome Fingerprint. Let $S = s_0, s_1, s_2, \ldots, s_{m-1}$. The Rabin-Karp fingerprint: $\Phi(S) = r^{1} s_0 + r^{2} s_1 + \cdots + r^{m} s_{m-1} \bmod p$. The reversal fingerprint: $\Phi^R(S) = r^{-1} s_0 + r^{-2} s_1 + \cdots + r^{-m} s_{m-1} \bmod p$. If $r^{m+1}\,\Phi^R(S) = \Phi(S)$, then S is a palindrome w.h.p.

Palindrome Fingerprint. Recall $\Phi(S) = r^{1} s_0 + r^{2} s_1 + \cdots + r^{m} s_{m-1} \bmod p$ and $\Phi^R(S) = r^{-1} s_0 + r^{-2} s_1 + \cdots + r^{-m} s_{m-1} \bmod p$. If $r^{m+1}\,\Phi^R(S) = \Phi(S)$, then S is a palindrome. Example: S = ABCBA (m = 5): $r^{6}\,\Phi^R(S) = r^{6}\,(r^{-1}A + r^{-2}B + r^{-3}C + r^{-4}B + r^{-5}A) = r^{5}A + r^{4}B + r^{3}C + r^{2}B + rA = \Phi(S)$.
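The two fingerprints and the test $r^{m+1}\Phi^R(S) = \Phi(S)$ can be sketched in a few lines. The modulus `P`, the base `R`, and the function names are assumptions of this sketch: the slides do not fix them, and in a real setting r would be drawn at random so that false positives are unlikely.

```python
P = 2_147_483_647  # prime modulus 2**31 - 1 (arbitrary choice for this sketch)
R = 1_000_003      # fixed base for reproducibility; normally chosen at random


def fingerprints(s: str, r: int = R, p: int = P):
    """Return (phi, phi_r): the Rabin-Karp and the reversal fingerprint of s."""
    r_inv = pow(r, p - 2, p)  # modular inverse of r, valid because p is prime
    phi = phi_r = 0
    power, inv_power = r, r_inv  # r**(i+1) and r**-(i+1) for the current index i
    for ch in s:
        phi = (phi + power * ord(ch)) % p
        phi_r = (phi_r + inv_power * ord(ch)) % p
        power = power * r % p
        inv_power = inv_power * r_inv % p
    return phi, phi_r


def looks_like_palindrome(s: str, r: int = R, p: int = P) -> bool:
    """Check r**(m+1) * phi_r == phi (mod p); true for every palindrome."""
    phi, phi_r = fingerprints(s, r, p)
    return pow(r, len(s) + 1, p) * phi_r % p == phi
```

A palindrome always passes the test; a non-palindrome can pass only if the chosen r happens to be a root of a low-degree polynomial modulo p, which is the "w.h.p." on the slide.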

Simple Online Algorithm for Finding a Palindrome in a Text. Text: $t_1, t_2, t_3, \ldots, t_i, t_{i+1}, \ldots, t_{i+m-1}, t_{i+m}, \ldots, t_n$. Maintain $\Phi = r^{1} t_i + r^{2} t_{i+1} + \cdots + r^{m} t_{i+m-1} \bmod p$ and $\Phi^R = r^{-1} t_i + r^{-2} t_{i+1} + \cdots + r^{-m} t_{i+m-1} \bmod p$. If $r^{m+1}\,\Phi^R = \Phi$, then there is a palindrome starting in the i-th position. If not, then for the next position: $\Phi = \Phi + r^{m+1}\, t_{i+m} \bmod p$ and $\Phi^R = \Phi^R + r^{-(m+1)}\, t_{i+m} \bmod p$. Note: This algorithm finds online whether a prefix of the text is a palindrome. For finding online whether the text is a concatenation of palindromes, assume even-length palindromes; otherwise, every text is a concatenation of length-1 palindromes.
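The update rule on this slide can be sketched as a small streaming class. It assumes the window starts at the first character fed in; the class name, the default modulus, and the default base are hypothetical choices, and the test is probabilistic in the same sense as the fingerprint itself.

```python
class OnlinePalindromePrefix:
    """Feed characters one at a time; report whether the text so far is a palindrome."""

    def __init__(self, r: int = 1_000_003, p: int = 2_147_483_647):
        self.r, self.p = r, p
        self.r_inv = pow(r, p - 2, p)  # inverse of r modulo the prime p
        self.power = 1      # r**m mod p, where m = number of characters seen
        self.inv_power = 1  # r**-m mod p
        self.phi = 0        # Rabin-Karp fingerprint of the text so far
        self.phi_r = 0      # reversal fingerprint of the text so far

    def feed(self, ch: str) -> bool:
        # Extend both fingerprints by the new character (constant work per character).
        self.power = self.power * self.r % self.p
        self.inv_power = self.inv_power * self.r_inv % self.p
        self.phi = (self.phi + self.power * ord(ch)) % self.p
        self.phi_r = (self.phi_r + self.inv_power * ord(ch)) % self.p
        # Palindrome test: r**(m+1) * phi_r == phi (mod p).
        return self.power * self.r % self.p * self.phi_r % self.p == self.phi
```

Feeding the characters of `"ABBA"` in order yields `True, False, False, True`: the single character and the full string are palindromes, the two prefixes in between are not.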

Palindrome with mismatches Start with the 1-mismatch case.

1-Mismatch $S = s_0, s_1, s_2, \ldots, s_{m-1}$. Choose $l$ prime numbers $q_1, \ldots, q_l < m$ such that $q_1 q_2 \cdots q_l > m$.

1-Mismatch $S = s_0, s_1, s_2, \ldots, s_{m-1}$. For each $q_i$ construct $q_i$ subsequences of S as follows: subsequence $S_{q_i,j}$ is all elements of S whose index is $j \bmod q_i$. Examples: $q_1 = 2$, $q_2 = 3$. mod 2: $S_{2,0} = s_0, s_2, s_4, \ldots$; $S_{2,1} = s_1, s_3, s_5, \ldots$. mod 3: $S_{3,0} = s_0, s_3, s_6, \ldots$; $S_{3,1} = s_1, s_4, s_7, \ldots$; $S_{3,2} = s_2, s_5, s_8, \ldots$

Example $S = s_0, s_1, s_2, s_3, s_4, s_5$. mod 2: $S_{2,0} = s_0, s_2, s_4$; $S_{2,1} = s_1, s_3, s_5$. mod 3: $S_{3,0} = s_0, s_3$; $S_{3,1} = s_1, s_4$; $S_{3,2} = s_2, s_5$.
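The partition into subsequences by index modulo q maps directly onto extended slicing. A minimal sketch, with the hypothetical helper name `subsequences`:

```python
def subsequences(s: str, q: int):
    """Return [S_{q,0}, ..., S_{q,q-1}]; S_{q,j} holds the elements whose index is j mod q."""
    return [s[j::q] for j in range(q)]
```

For a six-character string such as `"abcdef"`, `subsequences("abcdef", 2)` gives `["ace", "bdf"]` and `subsequences("abcdef", 3)` gives `["ad", "be", "cf"]`, matching the example on the slide.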

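1-Mismatch We need to compare $S = s_0, s_1, s_2, \ldots, s_{m-2}, s_{m-1}$ with $S^R = s_{m-1}, s_{m-2}, s_{m-3}, \ldots, s_1, s_0$. We prove that this can be done on the partition strings: $S_{q,j} = S^R_{q,(m-1-j) \bmod q}$, where $S^R_{q,j'}$ denotes the subsequence $S_{q,j'}$ read in reverse.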

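Example $S = s_0, s_1, s_2, s_3, s_4, s_5$, $S^R = s_5, s_4, s_3, s_2, s_1, s_0$. Partitions of S: $S_{2,0} = s_0, s_2, s_4$; $S_{2,1} = s_1, s_3, s_5$; $S_{3,0} = s_0, s_3$; $S_{3,1} = s_1, s_4$; $S_{3,2} = s_2, s_5$. Comparisons: $S_{2,0} = s_0, s_2, s_4$ against $S^R_{2,1} = s_5, s_3, s_1$; $S_{3,0} = s_0, s_3$ against $S^R_{3,2} = s_5, s_2$; $S_{3,1} = s_1, s_4$ against $S^R_{3,1} = s_4, s_1$.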

Exact Matching Lemma: $S = S^R \Leftrightarrow S_{q,j} = S^R_{q,(m-1-j) \bmod q}$ for all q and all $0 \le j < q$.
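The lemma can be checked on small strings with the sketch below. It compares the subsequences directly instead of through fingerprints, and it assumes that the reversal pair $S^R_{q,j'}$ means the subsequence $S_{q,j'}$ read backwards, as in the six-character example; the function names are hypothetical.

```python
def subsequence(s: str, q: int, j: int) -> str:
    """S_{q,j}: the elements of s whose index is j mod q."""
    return s[j::q]


def reversal_pair(s: str, q: int, j: int) -> str:
    """The subsequence with residue (m - 1 - j) mod q, read backwards."""
    m = len(s)
    return s[(m - 1 - j) % q :: q][::-1]


def all_pairs_match(s: str, q: int) -> bool:
    """True when every S_{q,j} equals its reversal pair."""
    return all(subsequence(s, q, j) == reversal_pair(s, q, j) for j in range(q))
```

Position t of `subsequence(s, q, j)` is $s_{j+tq}$ and position t of its reversal pair is $s_{m-1-(j+tq)}$, so the pairs together compare every $s_i$ with $s_{m-1-i}$; this is why `all_pairs_match` agrees with the palindrome property for any q.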

1-Mismatch Lemma: S is a palindrome with 1 mismatch $\Leftrightarrow$ for each q, there is exactly one j such that $\Phi(S_{q,j}) \ne r^{|S_{q,j}|+1}\,\Phi^R(S_{q,(m-1-j) \bmod q})$.

1-Mismatch Lemma: There is exactly one mismatch $\Leftrightarrow$ there is exactly one subpattern in each group that does not match (by the C.R.T.).
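The counting behind this lemma can be illustrated with direct comparisons in place of fingerprints. One convention here is an assumption of the sketch, not something stated on the slides: a mismatch is a symmetric pair $(i, m-1-i)$ with $s_i \ne s_{m-1-i}$, and it is attributed to the group of its left index $i < m/2$, so each pair is counted once. The function name is hypothetical.

```python
def mismatching_groups(s: str, q: int):
    """Residues j (mod q) whose group contains a left-half index i with s[i] != s[m-1-i]."""
    m = len(s)
    bad = set()
    for i in range(m // 2):
        if s[i] != s[m - 1 - i]:
            bad.add(i % q)
    return bad
```

With one mismatched pair, as in `"ABCXEFGGFEDCBA"`, every prime reports exactly one bad group. With two mismatched pairs, as in `"AYCXEFGGFEDCBA"` (indices 1 and 3), the prime 2 still reports a single group because both indices are odd, but 3 and 5 report two, which is what the Chinese Remainder argument guarantees once the product of the primes exceeds m.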

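Chinese Remainder Theorem Let n and m be two coprime positive integers. Then for any a and b there is exactly one x with $0 \le x < nm$ such that $x \equiv a \pmod n$ and $x \equiv b \pmod m$. In our case: if two indices i and j both have an error and, for every q, only one subsequence is erroneous, then $i \equiv j \pmod q$ for every q; since the product of all q's is > m, it means that i = j.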
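The consequence of the theorem used here is that the vector of residues identifies an index uniquely once the product of the primes exceeds m. A brute-force sketch, with hypothetical helper names:

```python
def residues(i: int, primes):
    """The residue of index i modulo each prime, as a tuple."""
    return tuple(i % q for q in primes)


def residues_are_unique(m: int, primes) -> bool:
    """True when no two indices in range(m) share all their residues."""
    seen = {residues(i, primes) for i in range(m)}
    return len(seen) == m
```

For m = 14 the primes 2, 3, 5 (product 30) separate all indices, while 2 and 3 alone (product 6) do not: indices 0 and 6 have the same residues.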

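Complexity There exists a constant c such that, for any x < m, there are at least x/log x prime numbers between x and cx. Therefore, choose the prime numbers between log m and c log m: there are at least log m / log log m of them, each at least log m, so their product is at least $(\log m)^{\log m / \log\log m} = m$.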
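A simple way to pick such primes in practice is sketched below. It departs slightly from the slide: instead of taking every prime in the interval between log m and c log m, it walks upward from $\lceil \log_2 m \rceil$ and stops as soon as the product exceeds m. The helper names and the trial-division primality test are choices of this sketch.

```python
import math


def is_prime(x: int) -> bool:
    """Trial division; adequate for candidates of size about log m."""
    if x < 2:
        return False
    return all(x % d for d in range(2, math.isqrt(x) + 1))


def choose_primes(m: int):
    """Primes starting near log2(m) whose product is larger than m (m >= 1)."""
    primes, product = [], 1
    candidate = max(2, math.ceil(math.log2(m)))
    while product <= m:
        if is_prime(candidate):
            primes.append(candidate)
            product *= candidate
        candidate += 1
    return primes
```

For m = 1000 the walk starts at 10 and returns `[11, 13, 17]`, whose product 2431 exceeds 1000.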

Complexity For each q i we compute 2q i different fingerprints: Overall space: Each character participates in exactly two fingerprints (the regular and the reverse). Overall time:

Online All fingerprint calculations can be done online. At every input character we know the current length m, so the comparisons can be computed. Conclusion: our algorithm is online.

k-Mismatches Use Group testing…

k-Mismatches Group Testing Given n items with some positive ones, identify all positive ones by a small number of tests. Each test is on a subset of items. Test outcome is positive iff there is a positive item in the subset.

k-Mismatch Group: partition of the text. Test: distinguish (using the 1-mismatch algorithm) between: match, 1 mismatch, more than 1 mismatch.

k-Mismatches Similar to the 1-mismatch algorithm, just with more prime numbers. $S = s_0, s_1, s_2, \ldots, s_{m-1}$. mod 2: $S_{2,0} = s_0, s_2, s_4, \ldots$; $S_{2,1} = s_1, s_3, s_5, \ldots$. mod 3: $S_{3,0} = s_0, s_3, s_6, \ldots$; $S_{3,1} = s_1, s_4, s_7, \ldots$; $S_{3,2} = s_2, s_5, s_8, \ldots$ Each $S_{q,j}$ is a group in our group testing.

Our tests We define the reversal pair of $S_{q,j}$ to be $S^R_{q,(m-1-j) \bmod q}$. Each partition is "tested against" its reversal pair.

Correctness $S = s_0, s_1, s_2, \ldots, s_j, \ldots, s_{m-1}$. For any group of k characters $i_1, i_2, \ldots, i_k$, there exists a partition where $s_j$ appears alone (by the C.R.T.).

Correctness $S = s_0, s_1, s_2, \ldots, s_j, \ldots, s_{m-1}$. If $s_j$ invokes a mismatch, we will catch it.
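The isolation property behind this correctness argument can be sketched as a search for a prime that puts a given mismatch position in a group of its own. The function name and the convention of returning `None` when no listed prime works are assumptions of this sketch.

```python
def isolating_prime(j: int, others, primes):
    """First prime q for which index j shares its residue with none of the other indices."""
    for q in primes:
        if all(i % q != j % q for i in others):
            return q
    return None  # the given primes are not enough to isolate j
```

For instance, index 3 among the other positions 1, 5, 9 collides with 1 modulo 2 and with 9 modulo 3, but is alone modulo 5. With too few primes the search can fail: indices 0 and 6 agree modulo both 2 and 3.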

Complexity Overall space: Overall time:

Approximate Reversal Distance Using the palindrome up to k-mismatches algorithm, can be solved in time, and space.

Thank you