Published by Megan McNamara. Modified over 4 years ago.

1
1 Average Case Analysis of an Exact String Matching Algorithm
Advisor: Professor R. C. T. Lee
Speaker: S. C. Chen

2
2 Problem Definition
We are given a text $T = t_1 t_2 \ldots t_n$ of length n and a pattern $P = p_1 p_2 \ldots p_m$ of length m, and we are asked to find all occurrences of P in T. Example: in the slide's figure, P occurs twice in T.
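
As a small illustration of the problem statement, a brute-force search can report every starting position of P in T. The strings below are made up for this sketch and are not the example from the original slide; `find_occurrences` is a name chosen here for illustration.

```python
from typing import List


def find_occurrences(text: str, pattern: str) -> List[int]:
    """Return the 1-based starting positions of every occurrence of pattern in text."""
    n, m = len(text), len(pattern)
    positions = []
    for start in range(n - m + 1):
        # Compare the length-m window beginning at `start` with the pattern.
        if text[start:start + m] == pattern:
            positions.append(start + 1)  # 1-based, matching t_1 ... t_n
    return positions


print(find_occurrences("aabcabcd", "abc"))  # [2, 5]
```

This takes O(nm) time in the worst case, which is the baseline the algorithm in these slides tries to beat on average.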

3
3 Exact string matching algorithms rely on a number of rules, for example the Suffix to Prefix Rule and the Substring Matching Rule.

4
4 This algorithm uses the idea behind the Substring Matching Rule.

5
5 The Substring Matching Rule
For any substring S in T, find the nearest occurrence of S in P that lies to the left of it. If such an S exists in P, move P so that the two occurrences of S align; otherwise, we may define a new partial window.

6
6

7
7 In this algorithm, we first check whether S = T[i-r+1…i] is a substring of P. If S does not occur in P, we shift P to the right by m-r steps.

8
8 If S occurs in P, then according to the Substring Matching Rule we should slide P so that the two occurrences of S align, as shown below.

9
9 Our algorithm, however, is not that smart. Instead of sliding P so that the two occurrences of S align, we simply examine the entire window T[i-m+1…i+m-r], which has length 2m-r, to see whether P occurs in it, as shown below.

10
10 Note that this simpler algorithm still covers the case of sliding P so that the two occurrences of S align: any such alignment places P entirely inside the examined window.

11
11 Algorithm
Algorithm fast-on-average;
i = m;
while i ≤ n do
begin
    if T[i-r+1…i] is a substring of P then
        compute all occurrences of P whose starting positions are in T[i-m+1…i-r+1], applying the KMP algorithm
    else
        { P does not start in T[i-m+1…i-r+1]; nothing to do }
    i = i+m-r    { executed in every iteration }
end
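
The pseudocode above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names are chosen here for illustration, Python's `in` operator stands in for the suffix-tree lookup discussed on the next slide, and r must satisfy 1 ≤ r < m so that the window advances. Consecutive windows share one admissible starting position, so the sketch collects results in a set to avoid reporting an occurrence twice.

```python
def kmp_failure(pattern):
    """fail[j] = length of the longest proper border of pattern[:j + 1]."""
    fail = [0] * len(pattern)
    k = 0
    for j in range(1, len(pattern)):
        while k > 0 and pattern[j] != pattern[k]:
            k = fail[k - 1]
        if pattern[j] == pattern[k]:
            k += 1
        fail[j] = k
    return fail


def kmp_search(text, pattern, fail, lo, hi):
    """Yield 0-based starts of pattern lying entirely inside text[lo:hi]."""
    k, m = 0, len(pattern)
    for j in range(lo, hi):
        while k > 0 and text[j] != pattern[k]:
            k = fail[k - 1]
        if text[j] == pattern[k]:
            k += 1
        if k == m:
            yield j - m + 1
            k = fail[k - 1]


def fast_on_average(text, pattern, r):
    """Return sorted 1-based starts of all occurrences of pattern in text."""
    n, m = len(text), len(pattern)
    if not 1 <= r < m:
        raise ValueError("r must satisfy 1 <= r < m")
    fail = kmp_failure(pattern)
    found = set()
    i = m  # 1-based right end of the current length-m window
    while i <= n:
        s = text[i - r:i]  # S = T[i-r+1 .. i] in the slides' 1-based notation
        if s in pattern:  # stand-in for the O(r) suffix-tree check
            # Occurrences starting in T[i-m+1 .. i-r+1] end no later than
            # T[i+m-r], so KMP only scans T[i-m+1 .. i+m-r].
            lo, hi = i - m, min(n, i + m - r)
            found.update(start + 1 for start in kmp_search(text, pattern, fail, lo, hi))
        i += m - r  # advance in every iteration, match or not
    return sorted(found)


print(fast_on_average("abababab", "abab", 2))  # [1, 3, 5]
```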

12
12 Analysis
First of all, note that the algorithm above has to determine whether the suffix S of the current window occurs in P. This is itself an exact string matching problem. Assume that a preprocessing step constructs a suffix tree of P. Whether S occurs in P can then be determined by feeding S into the suffix tree of P. Because the length of S is r, this check takes O(r) time.
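
To make the O(r) lookup concrete, here is a small sketch that uses an uncompacted suffix trie built from nested dictionaries instead of a true suffix tree. That substitution is an assumption made for brevity: the trie needs O(m²) time and space to build, whereas a suffix tree can be built in linear time, but the query walks one edge per character of S in both structures. The function names are illustrative.

```python
def build_suffix_trie(pattern):
    """Return a nested-dict trie containing every suffix of pattern."""
    root = {}
    for start in range(len(pattern)):
        node = root
        for ch in pattern[start:]:
            # Follow the existing child for ch, creating it if needed.
            node = node.setdefault(ch, {})
    return root


def occurs_in_pattern(trie, s):
    """Check whether s is a substring of the pattern in O(len(s)) dict lookups."""
    node = trie
    for ch in s:
        if ch not in node:
            return False
        node = node[ch]
    return True


trie = build_suffix_trie("abcab")
print(occurs_in_pattern(trie, "bca"))  # True
print(occurs_in_pattern(trie, "ac"))   # False
```

Every substring of P is a prefix of some suffix of P, which is why following S from the root answers the question.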

13
13 For reasons which will become clear later, we assume that

14
14 We assume that the text is a random string and that the alphabet has size α.

15
15 There are $\alpha^r$ possible strings of length r over an alphabet of α distinct characters. P, whose length is m, contains at most m-r+1 distinct substrings of length r. Thus, the probability that S is a substring of P is no greater than $\frac{m-r+1}{\alpha^r} \le \frac{m}{\alpha^r}$.
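
The counting argument can be checked on a small case. The sketch below assumes S is drawn uniformly from all strings of length r over the alphabet and that every character of the pattern belongs to that alphabet; it compares the exact probability with the counting bound, using the fact that a string of length m has m-r+1 positions where a substring of length r can start. The function name and the sample pattern are illustrative.

```python
from fractions import Fraction


def substring_probability(pattern, r, alphabet):
    """Return (exact probability, counting bound) that a random length-r string occurs in pattern."""
    m, alpha = len(pattern), len(alphabet)
    # Distinct length-r substrings of the pattern; there are at most m - r + 1.
    distinct = {pattern[k:k + r] for k in range(m - r + 1)}
    exact = Fraction(len(distinct), alpha ** r)
    bound = Fraction(m - r + 1, alpha ** r)
    return exact, bound


exact, bound = substring_probability("abababab", 4, "ab")
print(exact, bound)  # 1/8 5/16
```

For a highly repetitive pattern the exact probability is well below the bound, because many of the m-r+1 positions hold the same substring.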

16
16 If S is a substring of P, we find all occurrences of P in T[i-m+1…i+m-r] using the KMP algorithm.

17
17 Because the length of T[i-m+1…i+m-r] is 2m-r, the time complexity of Step i using the KMP algorithm is O(m).

18
18 (1) The probability that S occurs in P is at most $\frac{m}{\alpha^r}$.
(2) When S occurs in P, the time complexity of using the KMP algorithm to find all occurrences of P in T[i-m+1…i+m-r] is O(m).
Combining (1) and (2), the average time complexity of applying the KMP algorithm is at most $O(m) \cdot \frac{m}{\alpha^r}$. In the above, the time complexity of checking whether S occurs in P is O(r). With r chosen as assumed earlier, the KMP term does not exceed this checking cost. Thus, the average time complexity of applying the KMP algorithm once is O(r).

19
19 Thus, if S does not occur in P, the time complexity of Step i is only the checking time, which is O(r). If S does occur in P, the average time complexity of Step i is also O(r).

20
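20 Because the window advances by m-r positions in each step, there are about $\frac{n}{m-r}$ windows of length m in T, so the average time complexity of this algorithm is $O\left(\frac{nr}{m-r}\right)$.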
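
The number of windows can be worked out from the loop of the algorithm, in which i takes the values m, m+(m-r), m+2(m-r), … up to n. The following is a sketch of that arithmetic for n ≥ m, taking the O(r) average cost per step from the previous slides as given. The last line additionally assumes r ≤ m/2 and, purely as an illustration, the hypothetical choice r = 2⌈log_α m⌉ for sufficiently large m; neither is stated in the surviving text of the slides.

```latex
\begin{align*}
\#\text{steps} &= \left\lfloor \frac{n-m}{m-r} \right\rfloor + 1
   \;\le\; \frac{n-m}{m-r} + 1 \;=\; \frac{n-r}{m-r} \;\le\; \frac{n}{m-r}, \\[4pt]
\text{average total time} &= O(r) \cdot \frac{n}{m-r}
   \;=\; O\!\left(\frac{nr}{m-r}\right), \\[4pt]
r \le \tfrac{m}{2} \;&\Longrightarrow\; \frac{nr}{m-r} \le \frac{2nr}{m},
   \quad\text{so } r = 2\lceil \log_\alpha m \rceil
   \text{ would give } O\!\left(\frac{n \log_\alpha m}{m}\right).
\end{align*}
```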

21
21 References
[KMP77] D. E. Knuth, J. H. Morris, and V. R. Pratt, Fast Pattern Matching in Strings, SIAM Journal on Computing 6 (2), 1977, pp. 323–350.
[CR2002] M. Crochemore and W. Rytter, Section 2.2: Boyer-Moore algorithm and its variations, Jewels of Stringology, 2002, pp. 30–31.

22
22 Thank you
