# 1 Average Case Analysis of an Exact String Matching Algorithm Advisor: Professor R. C. T. Lee Speaker: S. C. Chen.

1 Average Case Analysis of an Exact String Matching Algorithm. Advisor: Professor R. C. T. Lee. Speaker: S. C. Chen.

2 Problem Definition: We are given a text $T = t_1 t_2 \dots t_n$ of length $n$ and a pattern $P = p_1 p_2 \dots p_m$ of length $m$, and we are asked to find all occurrences of $P$ in $T$. Example: the figure on the slide shows a text $T$ that contains two occurrences of $P$.
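To make the problem statement concrete, here is a minimal brute-force sketch that reports every occurrence. The function name `find_all` and the sample strings are illustrative assumptions, not taken from the slides; positions are reported 1-based to match the notation $t_1 \dots t_n$.

```python
def find_all(text: str, pattern: str) -> list[int]:
    """Return the 1-based starting positions of every occurrence of pattern in text."""
    positions = []
    start = text.find(pattern)
    while start != -1:
        positions.append(start + 1)  # convert to the slides' 1-based indexing
        start = text.find(pattern, start + 1)  # advance by one so overlaps are found
    return positions


print(find_all("abababa", "aba"))  # [1, 3, 5]
```

Overlapping occurrences count separately, which is why the search resumes one position after each hit rather than after the end of the match.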

3 Exact string matching algorithms are built on a number of rules, for example the Suffix to Prefix Rule and the Substring Matching Rule.

4 This algorithm uses the idea behind the Substring Matching Rule.

5 The Substring Matching Rule: For any substring S in T, find the nearest occurrence of S in P that lies to the left of it. If such an S exists in P, move P so that the two occurrences of S are aligned; otherwise, we may define a new partial window.

7 In this algorithm, we first check whether S = T[i-r+1…i] is a substring of P. If S does not occur in P, no occurrence of P can start in T[i-m+1…i-r+1], so we shift P to the right by m-r steps.

8 If S occurs in P, then according to the Substring Matching Rule we should slide P so that the two occurrences of S are aligned, as illustrated on the slide.

9 Our algorithm, however, is not that smart. Instead of sliding P so that the two occurrences of S are aligned, we simply examine the entire window T[i-m+1…i+m-r] to see whether P occurs in it.

10 Note that this not-so-smart algorithm still covers the case of sliding P to align the two occurrences of S, because every such alignment places P inside the examined window.

11 Algorithm

    Algorithm fast-on-average;
    i = m;
    while i ≤ n do
    begin
        if T[i-r+1…i] is a substring of P then
            compute all occurrences of P whose starting positions are in T[i-m+1…i-r+1], applying the KMP algorithm
        else { P does not start in T[i-m+1…i-r+1] }
        i = i + m - r
    end
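The pseudocode can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the speaker's implementation: a set of all length-r substrings of P stands in for the suffix tree, the names `kmp_search` and `fast_on_average` are hypothetical, r is passed in by the caller, and a repeat check is added because consecutive windows share the start position i-r+1.

```python
def kmp_search(text: str, pattern: str) -> list[int]:
    """Return 0-based start indices of pattern in text using Knuth-Morris-Pratt."""
    m = len(pattern)
    # failure[j] = length of the longest proper border of pattern[:j + 1]
    failure = [0] * m
    k = 0
    for j in range(1, m):
        while k > 0 and pattern[j] != pattern[k]:
            k = failure[k - 1]
        if pattern[j] == pattern[k]:
            k += 1
        failure[j] = k
    hits = []
    k = 0
    for j, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = failure[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == m:
            hits.append(j - m + 1)
            k = failure[k - 1]
    return hits


def fast_on_average(text: str, pattern: str, r: int) -> list[int]:
    """Return 1-based start positions of pattern in text (requires 1 <= r < m)."""
    n, m = len(text), len(pattern)
    if not 1 <= r < m:
        raise ValueError("r must satisfy 1 <= r < len(pattern)")
    # Assumption: a set of length-r substrings replaces the suffix tree of P.
    grams = {pattern[j:j + r] for j in range(m - r + 1)}
    found = []
    i = m  # 1-based index of the last character of the current alignment
    while i <= n:
        if text[i - r:i] in grams:  # S = T[i-r+1..i] in 1-based notation
            lo = i - m  # 0-based index of T[i-m+1]
            hi = min(n, i + m - r)  # exclusive end: T[i+m-r], clipped to n
            for h in kmp_search(text[lo:hi], pattern):
                pos = lo + h + 1
                # Consecutive windows share start position i-r+1; skip repeats.
                if not found or pos > found[-1]:
                    found.append(pos)
        i += m - r
    return found


print(fast_on_average("abababa", "aba", r=2))  # [1, 3, 5]
```

When S is not a substring of P the loop body does only the set lookup, which is the cheap case that the average-case analysis relies on.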

12 Analysis: First, note that the above algorithm has to determine whether the suffix S occurs in P. This is itself an exact string matching problem. Assume that a suffix tree of P is constructed in a preprocessing step. Whether S occurs in P can then be determined by feeding S into the suffix tree of P. Because the length of S is r, this check takes O(r) time.
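The O(r) lookup can be illustrated with a simplified structure. The sketch below uses an uncompressed suffix trie built from nested dictionaries rather than a true suffix tree, so its construction costs O(m²) time and space instead of O(m); only the lookup behaviour is meant to match the slide. The function names are hypothetical.

```python
def build_suffix_trie(pattern: str) -> dict:
    """Build a nested-dict trie that contains every suffix of pattern."""
    root: dict = {}
    for start in range(len(pattern)):
        node = root
        for ch in pattern[start:]:
            node = node.setdefault(ch, {})
    return root


def occurs_in_pattern(trie: dict, s: str) -> bool:
    """Walk s from the root: one dictionary lookup per character of s."""
    node = trie
    for ch in s:
        if ch not in node:
            return False
        node = node[ch]
    return True


trie = build_suffix_trie("abcab")
print(occurs_in_pattern(trie, "ca"))  # True
print(occurs_in_pattern(trie, "ac"))  # False
```

Every substring of P is a prefix of some suffix of P, so following S from the root succeeds exactly when S occurs in P, and the walk stops after at most r steps.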

13 For reasons that will become clear later, we assume that $r = 2\log_\alpha m$.

14 We assume that the text is a random string and that the size of the alphabet is α.

15 There are $\alpha^r$ possible strings of length r over an alphabet of α distinct characters, while P, whose length is m, contains at most $m-r+1$ distinct substrings of length r. Thus, the probability that S is a substring of P is not greater than $\frac{m-r+1}{\alpha^r}$.
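A quick numeric check shows how fast this bound shrinks. The helper below is a hypothetical illustration: it counts the m-r+1 starting positions of a length-r substring in P and divides by the number of possible strings of length r. The sample values m = 16, r = 4, α = 4 are assumptions chosen so that α^r = m².

```python
def substring_probability_bound(m: int, r: int, alpha: int) -> float:
    """Upper bound on the probability that a random length-r string occurs in P."""
    windows = m - r + 1  # starting positions of a length-r substring in P
    return min(1.0, windows / alpha ** r)


print(substring_probability_bound(m=16, r=4, alpha=4))  # 0.05078125
```

Here the bound is 13/256, which is below 1/m = 1/16, so the expensive KMP scan is triggered in only a small fraction of the steps.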

16 If S is a substring of P, we find all occurrences of P in T[i-m+1…i+m-r] using the KMP algorithm.

17 Because the length of T[i-m+1…i+m-r] is 2m-r, the time complexity of Step i using the KMP algorithm is O(m).

18 (1) The probability that S occurs in P is at most $\frac{m-r+1}{\alpha^r}$, which is at most $\frac{1}{m}$ when $r = 2\log_\alpha m$, since then $\alpha^r = m^2$. (2) When S occurs in P, the time complexity of using the KMP algorithm to find all occurrences of P in T[i-m+1…i+m-r] is O(m). Combining (1) and (2), the average time complexity of applying the KMP algorithm is $\frac{1}{m} \cdot O(m) = O(1)$. In addition, checking whether S occurs in P takes O(r) time. Thus, the average time complexity of one step, including the check, is O(r).

19 Thus, if S does not occur in P, the time complexity of Step i is only the checking time, which is O(r). If S does occur in P, Step i costs O(m), but this happens rarely enough that the average time complexity of Step i is still O(r).

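20 Because P is shifted by m-r positions at each step, there are about $\frac{n}{m-r}$ windows to examine in T, so the average time complexity of this algorithm is $O\left(\frac{n r}{m-r}\right)$. With $r = 2\log_\alpha m \le m/2$, this is $O\left(\frac{n \log_\alpha m}{m}\right)$.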
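The steps of the analysis can be chained into one derivation. This is a sketch under two assumptions that should be checked against the original slides: $r = 2\log_\alpha m$, and $r \le m/2$ so that $m - r \ge m/2$.

```latex
\begin{align*}
\Pr[S \text{ occurs in } P] &\le \frac{m-r+1}{\alpha^{r}} \le \frac{m}{m^{2}} = \frac{1}{m}, \\
\mathbb{E}[\text{cost of Step } i] &\le O(r) + \frac{1}{m}\cdot O(m) = O(r), \\
\text{number of steps} &\le \left\lceil \frac{n}{m-r} \right\rceil, \\
\mathbb{E}[\text{total cost}] &= O\!\left(\frac{n\,r}{m-r}\right) = O\!\left(\frac{n}{m}\log_\alpha m\right).
\end{align*}
```

The first line uses $\alpha^r = m^2$, and the last line uses $m - r \ge m/2$ to replace the denominator by $m$ at the cost of a constant factor.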

21 References

[KMP77] D. E. Knuth, J. H. Morris, and V. R. Pratt, Fast Pattern Matching in Strings, SIAM Journal on Computing 6 (2), 1977, pp. 323–350.

[CR2002] M. Crochemore and W. Rytter, Section 2.2: Boyer-Moore algorithm and its variations, Jewels of Stringology, 2002, pp. 30–31.

22 Thank you
