1 JASS04 - Sequential Pattern Matching, Tobias Reichl
Joint Advanced Student School 2004: Complexity Analysis of String Algorithms
Sequential Pattern Matching: Analysis of Knuth-Morris-Pratt type algorithms using the Subadditive Ergodic Theorem
22 August 2015

2 Overview
1. Pattern Matching
   - Sequential Algorithms
   - Knuth-Morris-Pratt Algorithm
2. Probabilistic tools
   - Subadditive Ergodic Theorem
   - Martingales and Azuma's Inequality
3. Analysis of KMP Algorithms
   - Properties of KMP
   - Establishing subadditivity
   - Analysis

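3 Pattern Matching
Text $t = t_1^n$, pattern $p = p_1^m$.
Comparison: $M(l,k) = 1$ if $t_l$ is compared with $p_k$, and $M(l,k) = 0$ otherwise.
Alignment position: a text position AP such that $M(AP + (k-1), k) = 1$ for some $k$.
Figure: pattern p = abcde placed against text t = xxxxxabxxxabcxxxabcde, marking one pattern-text comparison M(l,k)=1 and the alignment position AP.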

4 Sequential Algorithms - Definition
i. Semi-sequential: APs are non-decreasing.
ii. Strongly semi-sequential: (i), and the comparisons define non-decreasing text positions.
iii. Sequential: (i), and $M(l,k) = 1 \Rightarrow t_{l-(k-1)}^{l-1} = p_1^{k-1}$ for any $k > 1$. That is, text is compared only if it follows a prefix of the pattern.
iv. Strongly sequential: (i), (ii) and (iii).
Example: pattern abcde against text xxxxxabxxxabcxxxabcde.

5 Example: Naive / brute force algorithm
Every text position is an alignment position. The text is scanned until
- the pattern is found: then done;
- a mismatch occurs: then shift by one and retry.
This is a sequential algorithm.
Figure: abcde aligned against xxxxxabxxxabcxxxabcde, then shifted by +1.
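The brute force scan described on this slide can be sketched as follows. This is a minimal illustration, not code from the presentation: the function name `naive_search` and the 0-based indexing are assumptions, and the counter simply tallies every text-pattern comparison.

```python
def naive_search(text: str, pattern: str) -> tuple[int, int]:
    """Return (position of the first match or -1, number of comparisons)."""
    n, m = len(text), len(pattern)
    comparisons = 0
    for ap in range(n - m + 1):          # every text position is an alignment position
        k = 0
        while k < m:
            comparisons += 1             # one text-pattern comparison
            if text[ap + k] != pattern[k]:
                break                    # mismatch: shift by one and retry
            k += 1
        if k == m:
            return ap, comparisons       # pattern found: done
    return -1, comparisons


pos, cost = naive_search("xxxxxabxxxabcxxxabcde", "abcde")
print(pos, cost)
```

Partial matches such as "ab" and "abc" in the text are re-scanned from scratch after each shift, which is the waste the Morris-Pratt idea removes.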

6 Knuth-Morris-Pratt type algorithms (1)
Idea (Morris-Pratt): disregard APs already known not to be followed by a prefix of p.
Knowledge used:
- the already processed part of the pattern;
- pre-processing of p.
This is a strongly sequential algorithm.
Figure: abcde aligned against xxxxxabxxxabcxxxabcde, then shifted by +S.

7 Knuth-Morris-Pratt type algorithms (2)
Figure: two panels, Morris-Pratt and Knuth-Morris-Pratt, each showing abcde shifted along the text xxxxxabxxxabcxxxabcde. (KMP also skips mismatching letters.)
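The difference between the two variants can be sketched with the textbook failure tables. This is an illustration under assumptions: the names `mp_table`, `kmp_table` and `search` are hypothetical, positions are 0-based, and only text-pattern comparisons are counted (pre-processing is not).

```python
def mp_table(p: str) -> list[int]:
    """Morris-Pratt: fail[k] = length of the longest proper border of p[:k]."""
    m = len(p)
    fail = [-1] * (m + 1)
    j = -1
    for k in range(m):
        while j >= 0 and p[k] != p[j]:
            j = fail[j]
        j += 1
        fail[k + 1] = j
    return fail


def kmp_table(p: str) -> list[int]:
    """KMP: additionally skip borders that would repeat the mismatching letter."""
    m = len(p)
    mp = mp_table(p)
    fail = [-1] * (m + 1)
    for k in range(1, m):
        j = mp[k]
        fail[k] = fail[j] if p[k] == p[j] else j
    fail[m] = mp[m]                      # shift used after a full match
    return fail


def search(text: str, p: str, fail: list[int]) -> tuple[list[int], int]:
    """Return (all match positions, number of text-pattern comparisons)."""
    m = len(p)
    matches, comparisons, k = [], 0, 0
    for l, c in enumerate(text):
        while k >= 0:
            comparisons += 1
            if p[k] == c:
                break
            k = fail[k]                  # shift the pattern, keep the text position
        k += 1
        if k == m:
            matches.append(l - m + 1)
            k = fail[m]
    return matches, comparisons


text, pattern = "abacababab", "abab"
mp_result = search(text, pattern, mp_table(pattern))
kmp_result = search(text, pattern, kmp_table(pattern))
print(mp_result, kmp_result)
```

Both runs report the same matches; on this input the KMP table saves one comparison at the mismatch on "c", because it does not retry a border that ends in the same letter that just failed.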

8 Pattern Matching - Complexity
Overall complexity: $c_n$, the number of text-pattern comparisons on a text of length $n$.
If the pattern or the text is a realization of a random sequence, the complexity is a random variable $C_n$.
Question: what is the complexity of KMP?

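9 Subadditivity - Deterministic Sequence
Fekete (1923):
- Subadditivity: if $x_{m+n} \le x_m + x_n$ for all $m, n$, then $\lim_{n\to\infty} x_n/n = \inf_{m \ge 1} x_m/m$.
- Superadditivity: if $x_{m+n} \ge x_m + x_n$ for all $m, n$, then $\lim_{n\to\infty} x_n/n = \sup_{m \ge 1} x_m/m$.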
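A small numeric illustration of Fekete's lemma, using a hypothetical sequence that is not from the slides: $x_n = \alpha n + \sqrt{n}$ is subadditive because $\sqrt{m+n} \le \sqrt{m} + \sqrt{n}$, and $x_n/n$ decreases towards $\alpha$.

```python
import math


def x(n: int, alpha: float = 0.5) -> float:
    """Hypothetical subadditive sequence x_n = alpha*n + sqrt(n)."""
    return alpha * n + math.sqrt(n)


N = 200
# Check x_{m+n} <= x_m + x_n on a finite range (small tolerance for rounding).
subadditive = all(
    x(m + n) <= x(m) + x(n) + 1e-12 for m in range(1, N) for n in range(1, N)
)
ratios = [x(n) / n for n in (1, 10, 100, 10_000, 1_000_000)]
print(subadditive, ratios)
```

The ratios approach 0.5 from above, matching the infimum in the lemma; the limit exists even though the infimum is never attained.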

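10 Example: Longest Common Subsequence
Let $L_{1,n}$ be the length of the longest common subsequence of two strings of length $n$. It is superadditive: $L_{1,n} \ge L_{1,m} + L_{m+1,n}$.
Hence the limit $\alpha = \lim_{n\to\infty} E[L_{1,n}]/n = \sup_{m \ge 1} E[L_{1,m}]/m$ exists. (A value for $\alpha$ was conjectured by Steele in 1982.)
Example: for abcdeabcdfabcab and ababcafbcdabcde the LCS is "abcabcdabc" (10). Splitting the strings into abcdeabc | dfabcab and ababcafb | cdabcde gives LCS "abcab" (5) and "dabc" (4), and 10 >= 5 + 4.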
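The superadditivity in this example can be checked with the standard dynamic program for LCS length. A minimal sketch: the function name `lcs_length` is an assumption, and the split point after 8 characters is read off the example strings on the slide.

```python
def lcs_length(x: str, y: str) -> int:
    """Length of the longest common subsequence, row-by-row dynamic program."""
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0]
        for j, b in enumerate(y, start=1):
            if a == b:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


s, t = "abcdeabcdfabcab", "ababcafbcdabcde"
whole = lcs_length(s, t)
left = lcs_length(s[:8], t[:8])      # abcdeabc vs ababcafb
right = lcs_length(s[8:], t[8:])     # dfabcab vs cdabcde
print(whole, left, right)
```

Concatenating a common subsequence of the left parts with one of the right parts always gives a common subsequence of the whole strings, which is why the whole can only be at least the sum of the parts.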

11 Subadditivity - "Almost subadditive"
de Bruijn and Erdős (1952): let $c_n$ be a positive and non-decreasing sequence with $\sum_{k \ge 1} c_k/k^2 < \infty$.
"Almost subadditive": if $x_{m+n} \le x_m + x_n + c_{m+n}$, then $\lim_{n\to\infty} x_n/n$ exists.

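12 Subadditive Ergodic Theorem
Kingman (1976), Liggett (1985): let $X_{m,n}$ ($m < n$) be non-negative random variables such that
i. $X_{0,n} \le X_{0,m} + X_{m,n}$;
ii. $\{X_{nk,(n+1)k},\ n \ge 1\}$ is a stationary sequence for each $k$;
iii. the distribution of $\{X_{m,m+k},\ k \ge 1\}$ does not depend on $m$;
iv. $E[X_{0,1}] < \infty$.
Then $\lim_{n\to\infty} E[X_{0,n}]/n = \inf_{m \ge 1} E[X_{0,m}]/m = \alpha$, and $X_{0,n}/n$ converges almost surely.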

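13 Almost Subadditive Ergodic Theorem
Derriennic (1983): subadditivity can be relaxed to $X_{0,n} \le X_{0,m} + X_{m,n} + A_n$ with $\lim_{n\to\infty} E[A_n]/n = 0$.
Then, too: $\lim_{n\to\infty} E[X_{0,n}]/n = \alpha$ and $\lim_{n\to\infty} X_{0,n}/n = \alpha$ almost surely.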

14 Martingales
A sequence $Y_n = f(X_1, \dots, X_n)$ is a martingale with respect to the filtration $\{\mathcal{F}_n\}$ if for all $n \ge 0$: $E[Y_{n+1} \mid \mathcal{F}_n] = Y_n$.
$E[Y_{n+1} \mid \mathcal{F}_n]$ defines a random variable depending on the knowledge contained in $\mathcal{F}_n$.

15 Martingale Differences
The martingale difference is defined as $D_n = Y_n - Y_{n-1}$, so that $Y_n = Y_0 + \sum_{i=1}^{n} D_i$.
Observe: $E[D_{n+1} \mid \mathcal{F}_n] = 0$.

16 Azuma's Inequality (1)
Let $Y_i = E[Y_n \mid \mathcal{F}_i]$ be a martingale.
Define the martingale difference as $D_i = E[Y_n \mid \mathcal{F}_i] - E[Y_n \mid \mathcal{F}_{i-1}]$. (The mean of the same element, but depending on different knowledge.)
Observe: $\sum_{i=1}^{n} D_i = Y_n - E[Y_n]$. (Deviation from the mean.)

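17 Hoeffding's Inequality
Let $\{Y_n\}$ be a martingale, and let there exist constants $c_i$ such that $|Y_i - Y_{i-1}| = |D_i| \le c_i$.
Then: $\Pr\{|Y_n - Y_0| \ge x\} = \Pr\{|\sum_{i=1}^{n} D_i| \ge x\} \le 2\exp\left(-\frac{x^2}{2\sum_{i=1}^{n} c_i^2}\right)$.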

18 Azuma's Inequality (2)
Summary:
- If $D_i$ is bounded, we know how to assess the deviation from the mean.
- So now we need a bound on $D_i$.
Trick: let $\hat X_i$ be an independent copy of $X_i$. Then: $E[f(X_1, \dots, X_i, \dots, X_n) \mid \mathcal{F}_{i-1}] = E[f(X_1, \dots, \hat X_i, \dots, X_n) \mid \mathcal{F}_i]$.

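19 Azuma's Inequality (3)
Hence: $D_i = E[f(X_1, \dots, X_i, \dots, X_n) - f(X_1, \dots, \hat X_i, \dots, X_n) \mid \mathcal{F}_i]$.
And we can postulate: $|f(X_1, \dots, X_i, \dots, X_n) - f(X_1, \dots, \hat X_i, \dots, X_n)| \le c_i$.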

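20 Azuma's Inequality (4)
Let $Y_i = E[f(X_1, \dots, X_n) \mid \mathcal{F}_i]$ be a martingale with $Y_n = f(X_1, \dots, X_n)$.
If there exist constants $c_i$ such that $|f(X_1, \dots, X_i, \dots, X_n) - f(X_1, \dots, \hat X_i, \dots, X_n)| \le c_i$, where $\hat X_i$ is an independent copy of $X_i$,
then: $\Pr\{|Y_n - E[Y_n]| \ge x\} \le 2\exp\left(-\frac{x^2}{2\sum_{i=1}^{n} c_i^2}\right)$.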
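To get a feeling for how loose or tight such a bound is, the sketch below compares the standard Azuma-Hoeffding form $2\exp(-x^2 / (2\sum c_i^2))$ with a simulated tail. Everything here is an illustrative assumption rather than material from the slides: the function is a sum of fair $\pm 1$ steps, so changing one coordinate changes it by at most 2 while each martingale difference is bounded by $c_i = 1$; the names `azuma_bound` and `empirical_tail` are hypothetical.

```python
import math
import random


def azuma_bound(x: float, c: list[float]) -> float:
    """Bound 2*exp(-x^2 / (2*sum c_i^2)) on Pr{|Y_n - E[Y_n]| >= x}, capped at 1."""
    return min(1.0, 2.0 * math.exp(-x * x / (2.0 * sum(ci * ci for ci in c))))


def empirical_tail(n: int, x: float, trials: int, seed: int = 0) -> float:
    """Fraction of runs in which a sum of n fair +/-1 steps deviates by >= x from 0."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        total = sum(rng.choice((-1, 1)) for _ in range(n))
        if abs(total) >= x:
            hits += 1
    return hits / trials


n, x = 100, 30
bound = azuma_bound(x, [1.0] * n)        # martingale differences bounded by c_i = 1
observed = empirical_tail(n, x, trials=2000)
print(bound, observed)
```

The observed frequency stays below the bound, which is expected: the inequality uses only the boundedness of the differences, not their actual distribution.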

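21 KMP: Unavoidable alignment positions
A position $i$ in the text is called an unavoidable AP if, for any $r, l$ with $r \le i$ and $l \ge i + m$, it is an AP when the algorithm is run on $t_r^l$.
KMP-like algorithms have the same set of unavoidable alignment positions $U = \bigcup_{l=1}^{n} \{U_l\}$, where $U_l = \min\{\min_{1 \le k \le l}\{k : t_k^l \preceq p\},\ l+1\}$ and $t_k^l \preceq p$ means that $t_k^l$ is a prefix of $p$.
Example: pattern abcde against text xxxxxabxxxabcxxxabcde.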

22 Pattern Matching: l-convergence
An algorithm is $\ell$-convergent if there exists an increasing sequence $\{U_i\}$ of unavoidable alignment positions satisfying $U_{i+1} - U_i \le \ell$.
$\ell$-convergence indicates the maximum size of the "jumps" for an algorithm.

23 KMP: Establishing m-convergence
Let AP be an alignment position.
Define: $l = AP + m$.
Hence: $l - (m-1) \le U_l \le l$, and so $U_l - AP \le m$.
KMP-like algorithms are m-convergent.

24 KMP: Establishing subadditivity (1)
If $c_n$ (the number of comparisons) is subadditive, we can prove linear complexity of KMP-like algorithms.
We have to show that $c_{1,n}$ is (almost) subadditive: $c_{1,n} \le c_{1,r} + c_{r,n} + a$ for a term $a$ that does not depend on $n$.
Approach: an $\ell$-convergent sequential algorithm satisfies $|c_{1,n} - (c_{1,r} + c_{r,n})| \le m^2 + \ell m$.

25 KMP: Establishing subadditivity (2)
Proof:
- $U_r$: the smallest unavoidable AP greater than r.
- We split $c_{1,n} - (c_{1,r} + c_{r,n})$ into $c_{1,n} - (c_{1,r} + c_{U_r,n})$ and $c_{r,n} - c_{U_r,n}$.

26 KMP: Establishing subadditivity (3)
- S1: comparisons done after r with AP before r.
- S2: comparisons with AP between r and $U_r$. No more than m comparisons can be saved at any such alignment position.
Figure: text positions r and $U_r$ with the comparison groups S1 and S2, each annotated with the counts it contributes to.

27 KMP: Establishing subadditivity (4)
- S3: comparisons with AP between r and $U_r$. No more than m comparisons can be saved at any such alignment position.
Figure: the comparison group S3, annotated with the counts it contributes to.

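28 KMP: Establishing subadditivity (5)
So we are able to bound, for an $\ell$-convergent sequential algorithm: $|c_{1,n} - (c_{1,r} + c_{r,n})| \le m^2 + \ell m$.
We have shown that $c_{1,n}$ is (almost) subadditive: $c_{1,n} \le c_{1,r} + c_{r,n} + m^2 + \ell m$.
Now we are able to apply the Subadditive Ergodic Theorem.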
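The almost-subadditivity claim can be probed empirically. The sketch below is an experiment under assumptions, not the proof from the slides: it uses a plain Morris-Pratt scan, splits the text into the disjoint parts `t[:r]` and `t[r:]` (the slides' $c_{1,r}$ and $c_{r,n}$ may share position r), draws texts uniformly over {a, b}, and compares the worst observed gap with $2m^2$, that is $m^2 + \ell m$ with $\ell = m$. The names `mp_comparisons` and `max_gap` are hypothetical.

```python
import random


def mp_comparisons(text: str, p: str) -> int:
    """Count text-pattern comparisons made by a Morris-Pratt scan of text."""
    m = len(p)
    fail = [-1] * (m + 1)                # fail[k]: longest proper border of p[:k]
    j = -1
    for k in range(m):
        while j >= 0 and p[k] != p[j]:
            j = fail[j]
        j += 1
        fail[k + 1] = j
    comparisons, k = 0, 0
    for c in text:
        while k >= 0:
            comparisons += 1
            if p[k] == c:
                break
            k = fail[k]
        k += 1
        if k == m:                       # full match: continue with the border
            k = fail[m]
    return comparisons


def max_gap(p: str, n: int, trials: int, seed: int = 0) -> int:
    """Largest |c(t) - (c(t[:r]) + c(t[r:]))| seen over random texts and splits."""
    rng = random.Random(seed)
    worst = 0
    for _ in range(trials):
        t = "".join(rng.choice("ab") for _ in range(n))
        r = rng.randrange(1, n)
        whole = mp_comparisons(t, p)
        parts = mp_comparisons(t[:r], p) + mp_comparisons(t[r:], p)
        worst = max(worst, abs(whole - parts))
    return worst


pattern = "abab"
print(max_gap(pattern, n=200, trials=500), 2 * len(pattern) ** 2)
```

The gap stays small and independent of n because the scan restarted at r agrees with the uninterrupted scan once both have read m characters past r; only comparisons inside that window can differ.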

29 KMP: Different Modeling Assumptions
- Deterministic model: text and pattern are non-random.
- Semi-random model: the text is a realization of a stationary and ergodic sequence, the pattern is given.
- Stationary model: both text and pattern are realizations of a stationary and ergodic sequence.

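30 KMP: Applying the Subadditive Ergodic Theorem
We have shown that $c_{1,n}$ is (almost) subadditive. Writing $c_n = c_{1,n}$:
- Deterministic model: $\lim_{n\to\infty} \max_t c_n(t,p)/n = \alpha_1(p)$.
- Semi-random model: $\lim_{n\to\infty} c_n(t,p)/n = \alpha_2(p)$ almost surely.
- Stationary model: $\lim_{n\to\infty} c_n(t,p)/n = \alpha_3$ almost surely.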
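The theorem gives existence of the linearity constant but no formula for it, so a simulation is one way to see it. The sketch below estimates $c_n/n$ in a semi-random setting under assumptions that are not in the slides: a fixed pattern, an i.i.d. uniform text over {a, b} (a special case of a stationary ergodic source), and a Morris-Pratt scan. The names `mp_comparisons` and `ratio` are hypothetical.

```python
import random


def mp_comparisons(text: str, p: str) -> int:
    """Count text-pattern comparisons made by a Morris-Pratt scan of text."""
    m = len(p)
    fail = [-1] * (m + 1)                # fail[k]: longest proper border of p[:k]
    j = -1
    for k in range(m):
        while j >= 0 and p[k] != p[j]:
            j = fail[j]
        j += 1
        fail[k + 1] = j
    comparisons, k = 0, 0
    for c in text:
        while k >= 0:
            comparisons += 1
            if p[k] == c:
                break
            k = fail[k]
        k += 1
        if k == m:
            k = fail[m]
    return comparisons


def ratio(p: str, n: int, seed: int) -> float:
    """c_n / n for one random text of length n."""
    rng = random.Random(seed)
    t = "".join(rng.choice("ab") for _ in range(n))
    return mp_comparisons(t, p) / n


estimates = {n: ratio("abab", n, seed=n) for n in (100, 1_000, 10_000, 100_000)}
print(estimates)
```

For this scan every text character costs at least one comparison and the total never exceeds 2n, so the estimates always lie between 1 and 2; as n grows they settle around a single value, which is the behaviour the semi-random statement describes.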

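31 KMP: Applying Azuma's Inequality
$c_n$ satisfies $|c_n(t_1, \dots, t_i, \dots, t_n) - c_n(t_1, \dots, \hat t_i, \dots, t_n)| \le d$ for a constant $d$ depending only on $m$, where $\hat t_i$ is an independent copy of $t_i$.
So, using Azuma's Inequality, $c_n$ is concentrated around its mean: $\Pr\{|c_n - E[c_n]| \ge \varepsilon n\} \le 2\exp\left(-\frac{\varepsilon^2 n}{2 d^2}\right)$.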
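Concentration around the mean can also be observed directly. The following sketch is an experiment under assumptions rather than a verification of the slide's bound: texts are i.i.d. uniform over {a, b}, the scan is Morris-Pratt, the sample mean stands in for $E[c_n]$, and the names `mp_comparisons` and `tail_fraction` are hypothetical.

```python
import random


def mp_comparisons(text: str, p: str) -> int:
    """Count text-pattern comparisons made by a Morris-Pratt scan of text."""
    m = len(p)
    fail = [-1] * (m + 1)                # fail[k]: longest proper border of p[:k]
    j = -1
    for k in range(m):
        while j >= 0 and p[k] != p[j]:
            j = fail[j]
        j += 1
        fail[k + 1] = j
    comparisons, k = 0, 0
    for c in text:
        while k >= 0:
            comparisons += 1
            if p[k] == c:
                break
            k = fail[k]
        k += 1
        if k == m:
            k = fail[m]
    return comparisons


def tail_fraction(p: str, n: int, eps: float, trials: int, seed: int = 0) -> float:
    """Fraction of random texts whose count deviates from the sample mean by >= eps*n."""
    rng = random.Random(seed)
    counts = []
    for _ in range(trials):
        t = "".join(rng.choice("ab") for _ in range(n))
        counts.append(mp_comparisons(t, p))
    mean = sum(counts) / trials
    return sum(abs(c - mean) >= eps * n for c in counts) / trials


fraction = tail_fraction("abab", n=2000, eps=0.1, trials=100)
print(fraction)
```

Deviations of order $\varepsilon n$ are rare because typical fluctuations of the count grow only like $\sqrt{n}$, which is the qualitative content of the exponential bound.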

32 Conclusion
- Using the Subadditive Ergodic Theorem we can show that there exists a linearity constant for the worst case and the average case, respectively. KMP has linear complexity.
- The Subadditive Ergodic Theorem proves the existence of this constant but says nothing about how to compute it.
- Using Azuma's Inequality we can show that the number of comparisons is well concentrated around its mean.

