Sorting suffixes of two-pattern strings F. Franek & W.F. Smyth Algorithms Research Group Computing and Software McMaster University Hamilton, Ontario Canada.


Sorting suffixes of two-pattern strings F. Franek & W.F. Smyth Algorithms Research Group Computing and Software McMaster University Hamilton, Ontario Canada PSC04, Praha, Czech Republic, August-September 2004 Slide 1

Slide 2 In 2003, several very different linear-time (recursive) algorithms for sorting the suffixes of a string appeared. All work in four basic steps:
1. Split all suffixes into two sets.
2. Sort the first set of suffixes by recursion (recursive reduction of the problem).
3. Sort the second set of suffixes based on the order of the first set.
4. Merge the two sorted sets.

Slide 3 Our question: will two-pattern strings exhibit a natural tendency to reduce the problem in a recursive fashion? Two-pattern strings were introduced by us as a generalization of Sturmian (and hence Fibonacci) strings. Let p, q be binary strings. σ = [p,q,i,j]_λ is an expansion of scope λ if |p|, |q| ≤ λ and i < j are non-negative integers. We require p and q to be dissimilar enough to be efficiently recognizable (see the paper for the details).

Slide 4 σ(a) = p^i q, σ(b) = p^j q, σ(x[1..n]) = σ(x[1]) σ(x[2..n]), σ_1σ_2(x) = σ_1(σ_2(x)).
x is a two-pattern string of scope λ iff there is a sequence σ_1, σ_2, ..., σ_n of expansions of scope λ such that x = σ_1σ_2…σ_n(a).
The "nice" properties of two-pattern strings (see a series of papers by Franek, Smyth and others):
- they can be recognized in linear time

Slide 5
- when recognized, the canonical expansion sequence is computed
- repetitions and near repetitions can be effectively computed in linear time using a recursive approach
- they generalize finite fragments of the Fibonacci string and Sturmian strings
- they can easily be generated and represented in recursive fashion
- they exhibit rich yet comprehensible recursive structure

Slide 6
- they occur relatively frequently among binary strings
An illustration of a very simple two-pattern string, used later to illustrate the workings of the algorithm:
[a,b,2,3] applied to a: a → aab
[ba,ab,1,2] applied to aab: aab → baab baab babaab
baabbaabbabaab is a two-pattern string of scope 2.
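The expansion rules above can be tried out directly. The following is a minimal sketch, assuming strings over the alphabet {a, b}; the names `Expansion`, `apply` and `expand` are illustrative and do not come from the slides.

```python
from typing import NamedTuple


class Expansion(NamedTuple):
    """An expansion [p, q, i, j]: a -> p^i q and b -> p^j q."""
    p: str
    q: str
    i: int
    j: int

    def apply(self, x: str) -> str:
        # sigma(x[1..n]) = sigma(x[1]) sigma(x[2..n]), letter by letter.
        images = {"a": self.p * self.i + self.q,
                  "b": self.p * self.j + self.q}
        return "".join(images[c] for c in x)


def expand(expansions, seed="a"):
    """Compute sigma_1 sigma_2 ... sigma_n (seed); the last expansion acts first."""
    result = seed
    for sigma in reversed(expansions):
        result = sigma.apply(result)
    return result


sigma_outer = Expansion("ba", "ab", 1, 2)
sigma_inner = Expansion("a", "b", 2, 3)
print(sigma_inner.apply("a"))               # aab
print(expand([sigma_outer, sigma_inner]))   # baabbaabbabaab
```

The composition order matters: the expansion written last in the sequence is applied to the seed first, matching σ_1σ_2(x) = σ_1(σ_2(x)).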

Slide 7 Now we can rephrase our question: given an expansion σ and knowing the order of the suffixes of a two-pattern string x, can we efficiently determine the order of the suffixes of σ(x)? The answer is yes, and in the following we describe the algorithm. Let x be a two-pattern string of scope λ, let σ = [p,q,i,j]_λ be an expansion, and let y = σ(x). Let ρ_1 < ρ_2 < … < ρ_|x| be the sorted suffixes of x. We assume that q ≺ p (since then x_1 < x_2 iff σ(x_1) < σ(x_2)); otherwise we work with complements and reverse the resulting order of suffixes while taking complements of the suffixes.

Slide 8 First we assign all suffixes of y to various buckets:
A_{δ,k} = {δ p^k q σ(ρ) : ρ is a proper suffix of x or ρ = ε}, where δ is a suffix of p and 0 < k < i.

Slide 9 A_{δ,i} = {δ p^i q σ(ρ) : ρ is a proper suffix of x or ρ = ε}, where δ is a suffix of p and also a suffix of q,
or
A_{δ,i} = {δ p^i q σ(ρ) : bρ is a proper suffix of x, ρ can be ε}, where δ is a suffix of p and not a suffix of q.

Slide 10 A_{δ,k} = {δ p^k q σ(ρ) : bρ is a proper suffix of x, ρ can be ε}, where δ is a suffix of p and i < k < j.
B_δ = {δ q σ(ρ) : ρ is a proper nontrivial suffix of x}, where δ is a suffix of p and i < k < j.

Slide 11 C_δ = {δ p^i q σ(ρ) : aρ is a proper suffix of x, ρ can be ε}, where δ is a suffix of q but not of p.
D_δ = {δ p^j q σ(ρ) : bρ is a proper suffix of x, ρ can be ε}, where δ is a suffix of q.

Slide 12 E = {δ : δ is a nontrivial suffix of p or q}.
All suffixes are covered by A-E!
The order of suffixes within buckets A-D is determined by ρ!
The A-D buckets are order invariant!
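A small experiment can make the order-invariance claim more tangible. The sketch below assumes the example expansion [ba,ab,1,2] (where q precedes p) and checks by brute force that σ preserves lexicographic order on short binary strings, and then that the members of one bucket appear in the order of their ρ. The helper names and the length bound of 6 are illustrative choices, not part of the slides.

```python
from itertools import product


def sigma(x, p="ba", q="ab", i=1, j=2):
    """Apply the expansion [p, q, i, j] letter by letter (a -> p^i q, b -> p^j q)."""
    return "".join(p * (i if c == "a" else j) + q for c in x)


def binary_strings(max_len):
    """All non-empty strings over {a, b} of length at most max_len."""
    for n in range(1, max_len + 1):
        for letters in product("ab", repeat=n):
            yield "".join(letters)


def order_preserved(max_len=6):
    """Check x1 < x2 iff sigma(x1) < sigma(x2) for all pairs of short strings."""
    words = list(binary_strings(max_len))
    return all((u < v) == (sigma(u) < sigma(v)) for u in words for v in words)


# Bucket B_ba for x = aab: elements are "baab" + sigma(rho), rho a proper suffix of x.
x = "aab"
sorted_rho = sorted(x[k:] for k in range(1, len(x)))      # ['ab', 'b']
bucket_B_ba = ["baab" + sigma(rho) for rho in sorted_rho]

print(order_preserved())                      # True
print(bucket_B_ba == sorted(bucket_B_ba))     # True
```

The brute-force check is only evidence for this one expansion, not a proof of the general statement.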

Slide 13 So, if we can determine the order of the buckets, we can determine the order of all suffixes in buckets A-D. Merging in the suffixes from E is easy (brute force requires only 4λ^2|y| steps). The main result is based on the fact that the order of buckets A-D can be efficiently determined using 5 cases:
(C1) δ_1 ≺ δ_2
(C2) δ_2 ≺ δ_1
(C3) δ_1 is a proper prefix of δ_2
(C4) δ_2 is a proper prefix of δ_1
(C5) δ_1 = δ_2 = δ

Slide 14
(C1) A_{δ_1,k_1} ≺ A_{δ_2,k_2}
(C2) A_{δ_2,k_2} ≺ A_{δ_1,k_1}
(C3) δ_2 = δ_1 μ
  (a) if μ ≺ p, then A_{δ_2,k_2} ≺ A_{δ_1,k_1}
  (b) otherwise A_{δ_1,k_1} ≺ A_{δ_2,k_2}
(C4) δ_1 = δ_2 μ
  (a) if μ ≺ p, then A_{δ_1,k_1} ≺ A_{δ_2,k_2}
  (b) otherwise A_{δ_2,k_2} ≺ A_{δ_1,k_1}
(C5) (a) if k_1 > k_2, then A_{δ,k_2} ≺ A_{δ,k_1}

Slide 15
(C1) A_{δ_1,k} ≺ B_{δ_2}
(C2) B_{δ_2} ≺ A_{δ_1,k}
(C3) δ_2 = δ_1 μ
  (a) if μ ≺ p, then B_{δ_2} ≺ A_{δ_1,k}
  (b) otherwise A_{δ_1,k} ≺ B_{δ_2}
(C4) δ_1 = δ_2 μ
  (a) if μ p^k q ≺ pq, then A_{δ_1,k} ≺ B_{δ_2}
  (b) otherwise B_{δ_2} ≺ A_{δ_1,k}
(C5) B_δ ≺ A_{δ,k}
No bucket comparison requires more than 3λ steps.

Slide 16 Similarly for A~C, A~D, B~B, B~C, B~D, C~C, and C~D. One more example:
(C1) B_{δ_1} ≺ B_{δ_2}
(C2) B_{δ_2} ≺ B_{δ_1}
(C3) δ_2 = δ_1 μ
  (a) if μqp ≺ qp, then B_{δ_2} ≺ B_{δ_1}
  (b) otherwise B_{δ_1} ≺ B_{δ_2}
(C4) δ_1 = δ_2 μ
  (a) if μqp ≺ qp, then B_{δ_1} ≺ B_{δ_2}
  (b) otherwise B_{δ_2} ≺ B_{δ_1}
(C5) B_{δ_1} = B_{δ_2}

Slide 17 The high-level logic of the algorithm:
1. Create names (A,δ) for every suffix δ of p. (This requires at most λ steps. Each name will eventually be replaced by a sequence of buckets.)
2. Sort the names according to the comparisons of the four A buckets (according to (C1)-(C4)). (This requires at most 2λ^3 steps, as we are sorting λ names and each comparison requires at most 2λ steps.)
3. Replace every name (A,δ) by a sequence of names (A,δ,k), 0 < k < j. Let us call the resulting sequence BUCKETS. (Now we have the names of the A buckets in the proper order. This requires at most |y| steps, as the size of BUCKETS is |y|.)

Slide 18
4. Create names (B,δ) for every suffix δ of p. (This requires at most λ steps.)
5. Merge into BUCKETS all names (B,δ) according to comparisons. (This requires at most 3λ^2|BUCKETS| steps, as we are merging in λ names and each comparison requires 3λ steps.)

Slide 19
6. Create names (C,δ) for every suffix δ of q that is not a suffix of p. (This requires at most λ^2 steps.)
7. Merge into BUCKETS all names (C,δ) according to comparisons. (This requires at most 3λ^2|BUCKETS| steps.)
8. Create names (D,δ) for every suffix δ of q. (This requires at most λ steps.)
9. Merge into BUCKETS all names (D,δ) according to comparisons. (Now we have all required bucket names, except E, in the proper order. This requires at most 3λ^2|BUCKETS| steps.)

Slide 20
10. Traverse BUCKETS and replace each name by a sequence of suffixes according to the sequence of suffixes of x. Let us call this sequence SUFFIXES. (Now we have all suffixes from buckets A-D in the proper order. This requires at most |y| steps.)
11. Merge into SUFFIXES the suffixes from the bucket E. (This requires at most 4λ^2|y| steps.)
Done in less than (2λ^3+14λ^2+3λ+2)|y| steps!
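The total can be recovered by adding the per-step bounds quoted in steps 1 to 11. The following is a sketch of that arithmetic, assuming |BUCKETS| ≤ |y| and |y| ≥ 1 so that the terms without a factor |y| can be absorbed into it:

```latex
\begin{align*}
T &\le \underbrace{\lambda}_{1} + \underbrace{2\lambda^{3}}_{2} + \underbrace{|y|}_{3}
     + \underbrace{\lambda}_{4} + \underbrace{3\lambda^{2}|y|}_{5}
     + \underbrace{\lambda^{2}}_{6} + \underbrace{3\lambda^{2}|y|}_{7}
     + \underbrace{\lambda}_{8} + \underbrace{3\lambda^{2}|y|}_{9}
     + \underbrace{|y|}_{10} + \underbrace{4\lambda^{2}|y|}_{11} \\
  &= 2\lambda^{3} + \lambda^{2} + 3\lambda + \left(13\lambda^{2} + 2\right)|y| \\
  &\le \left(2\lambda^{3} + 14\lambda^{2} + 3\lambda + 2\right)|y| .
\end{align*}
```

For a fixed scope λ the bracketed factor is a constant, so one expansion step is linear in |y|.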

Slide 21 The algorithm works in 2(2λ^3+14λ^2+3λ+2)n steps, where n is the size of the input string.
An example: x = aab$, y = baabbaabbabaab$, σ = [ba,ab,1,2].
[Figure: ordered suffixes of x; ordered suffixes of y]

Slide 22
A_{ba,1} = {babaab σ(ρ) : bρ is a proper suffix of x, ρ can be ε} = {babaab} = {9}
A_{a,1} = {abaab σ(ρ) : bρ is a proper suffix of x, ρ can be ε} = {abaab} = {10}
B_{ba} = {baab σ(ρ) : ρ is a proper suffix of x} = {baab σ(ab), baab σ(b)} = {baabbaabbabaab, baabbabaab} = {1,5}
B_{a} = {aab σ(ρ) : ρ is a proper suffix of x} = {aab σ(ab), aab σ(b)} = {aabbaabbabaab, aabbabaab} = {2,6}
C_{ab} = {abbaab σ(ρ) : aρ is a proper suffix of x} = {abbaab σ(b)} = {abbaabbabaab} = {3}
C_{b} = {bbaab σ(ρ) : aρ is a proper suffix of x} = {bbaab σ(b)} = {bbaabbabaab} = {4}

Slide 23
D_{ab} = {abbabaab σ(ρ) : bρ is a proper suffix of x, ρ can be ε} = {abbabaab} = {7}
D_{b} = {bbabaab σ(ρ) : bρ is a proper suffix of x, ρ can be ε} = {bbabaab} = {8}
E = {baab, aab, ab, b} = {11, 12, 13, 14}
A_{ba,1} ≻ A_{a,1} (by C2)
A_{ba,1} ≻ B_{ba} (by C5)
A_{ba,1} ≻ B_{a} (by C2)
A_{ba,1} ≻ C_{ab} (by C2)
A_{ba,1} ≺ C_{b} (by C4a)

Slide 24
A_{ba,1} ≻ D_{ab} (by C2)
A_{ba,1} ≺ D_{b} (by C4a)
A_{a,1} ≺ B_{ba} (by C1)
A_{a,1} ≻ B_{a} (by C5)
A_{a,1} ≺ C_{ab} (by C3b)
A_{a,1} ≺ C_{b} (by C1)
A_{a,1} ≺ D_{ab} (by C3b)
A_{a,1} ≺ D_{b} (by C1)

Slide 25
B_{ba} ≻ B_{a} (by C2)
B_{ba} ≻ C_{ab} (by C2)
B_{ba} ≺ C_{b} (by C4a)
B_{ba} ≻ D_{ab} (by C2)
B_{ba} ≺ D_{b} (by C4a)
B_{a} ≺ C_{ab} (by C3b)
B_{a} ≺ C_{b} (by C1)
B_{a} ≺ D_{ab} (by C3b)
B_{a} ≺ D_{b} (by C1)

Slide 26
C_{ab} ≺ C_{b} (by C1)
C_{ab} ≺ D_{ab} (by C5)
C_{ab} ≺ D_{b} (by C1)
C_{b} ≻ D_{ab} (by C1)
C_{b} ≺ D_{b} (by C5)
D_{ab} ≺ D_{b} (by C1)
Resulting bucket order: B_{a} ≺ A_{a,1} ≺ C_{ab} ≺ D_{ab} ≺ B_{ba} ≺ A_{ba,1} ≺ C_{b} ≺ D_{b}
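The final bucket order of the example can be cross-checked against a naive suffix sort. The sketch below is only a verification aid, not the algorithm from the slides: it sorts the suffixes of y directly (without the $ sentinel, an assumption made for simplicity), drops the positions in bucket E, and compares the result with the buckets listed in the order derived above. Positions are 1-based, as in the example.

```python
y = "baabbaabbabaab"


def suffix_order(s):
    """Return the 1-based starting positions of the suffixes of s in sorted order."""
    return sorted(range(1, len(s) + 1), key=lambda pos: s[pos - 1:])


# Buckets with their member positions, listed in the order derived in the example.
buckets_in_order = [
    ("B_a", [2, 6]), ("A_a,1", [10]), ("C_ab", [3]), ("D_ab", [7]),
    ("B_ba", [1, 5]), ("A_ba,1", [9]), ("C_b", [4]), ("D_b", [8]),
]
bucket_E = {11, 12, 13, 14}

order = suffix_order(y)
without_E = [pos for pos in order if pos not in bucket_E]
from_buckets = [pos for _, members in buckets_in_order for pos in members]

print(order)                      # [12, 2, 6, 13, 10, 3, 7, 14, 11, 1, 5, 9, 4, 8]
print(without_E == from_buckets)  # True
```

The suffixes from E (positions 11 to 14) are interleaved with the others in the full order, which is why the algorithm merges them in as a separate last step.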
