# Advisor: Prof. R. C. T. Lee Speaker: C. W. Lu

Simon Algorithm
String matching algorithms and automata, SIMON I., 1st American Workshop on String Processing (1993)

String matching problem
Given a text string T of length n and a pattern string P of length m, find all occurrences of P in T. The Simon algorithm is an algorithm which solves the string matching problem. It shifts the pattern by using Rule 2.
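As a baseline for the problem statement, a brute-force matcher can be sketched as follows. The function name `naive_search` is a hypothetical choice for illustration; this is not the Simon algorithm, only a direct reading of "find all occurrences of P in T".

```python
def naive_search(text: str, pattern: str) -> list[int]:
    """Return every start index where pattern occurs in text (brute force)."""
    n, m = len(text), len(pattern)
    # Try every window position and compare the whole window with the pattern.
    return [s for s in range(n - m + 1) if text[s:s + m] == pattern]


# Text and pattern taken from the search example later in the slides.
print(naive_search("BCBCBDBCBABCBAB", "BCBABCBA"))  # [6]
```

This takes O(mn) time in the worst case, which is the cost that the shifting rules of KMP and Simon are meant to avoid.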

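In the KMP algorithm, a Prefix Function is used to determine the window shift. Suppose a prefix t of P matches the text, and the next text character y differs from the next pattern character x.
Case 1: Find the longest suffix u of t which is also a prefix of P and is followed in P by a character z with z ≠ x. Move P to align this prefix u with the suffix u of t.
Case 2: There is no such u where z ≠ x. Move P past the mismatched position.
(Figure: T contains t y; P contains t x and starts with u z.)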

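The Simon algorithm improves the KMP algorithm by using a better function. Suppose a prefix t of P matches the text, and the next text character y differs from the next pattern character x.
Case 1: Find the longest suffix u of t which is also a prefix of P and is followed in P by a character z with z = y. Move P to align this prefix u z with the text.
Case 2: There is no such u where z = y. Move P past the mismatched position.
(Figure: T contains t y; P contains t x and starts with u z.)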

Example:
T: ATCACATCACCAGTCATACCA
P: ATCACATCATCA
∵ P(0,8) = T(0,8) = ATCACATCA and P(9) ≠ T(9) = C. The string P(5,8) = ATCA is the longest suffix of ATCACATCA which is equal to a prefix of P, namely P(0,3), and P(4) = T(9); that is, T(5,9) = P(0,4).
∴ We can slide the window to align P(4) with T(9), a shift of 5 positions.

Example:
T: ATCACATCAACAGTCATACCACAC
P: ATCACATCATCA
∵ The mismatch is P(9) ≠ T(9) = A. The only suffix of T(0,9) = ATCACATCAA which is equal to a prefix of P is the single character A = P(0).
∴ We slide the window to align P(0) with T(9). In this case, the Simon algorithm is better than the KMP algorithm, because the KMP algorithm would align P(0) with T(5) and then mismatch again at T(9).
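The two shifting rules can be compared on these examples with a small brute-force sketch. The helper names `borders`, `kmp_shift` and `simon_shift` are hypothetical, and the code enumerates borders directly instead of using a precomputed table, so it only illustrates the conditions z ≠ x (KMP) and z = y (Simon).

```python
def borders(s: str) -> list[int]:
    """Lengths of all proper prefixes of s that are also suffixes, longest first."""
    return [k for k in range(len(s) - 1, -1, -1) if s[:k] == s[len(s) - k:]]


def kmp_shift(p: str, i: int) -> int:
    """Shift after a mismatch at P(i), i < len(p): longest border u with P(|u|) != P(i)."""
    for k in borders(p[:i]):
        if p[k] != p[i]:
            return i - k
    return i + 1  # no suitable border: move past the mismatched position


def simon_shift(p: str, i: int, y: str) -> int:
    """Shift after a mismatch at P(i) against text character y: longest border u with P(|u|) == y."""
    for k in borders(p[:i]):
        if p[k] == y:
            return i - k
    return i + 1  # no suitable border: move past the mismatched position


P = "ATCACATCATCA"
print(simon_shift(P, 9, "C"))  # 5: first example, P(4) is aligned with T(9)
print(simon_shift(P, 9, "A"))  # 9: second example, P(0) is aligned with T(9)
print(kmp_shift(P, 9))         # 5: KMP aligns P(0) with T(5) in both examples
```

KMP cannot distinguish the two examples because its shift ignores the text character y, whereas the Simon shift depends on it.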

Simon Table
Let u be the longest suffix of (P(0, i-1) + y) which is equal to a prefix of P, where y ≠ P(i). If u is not empty, then (i, y, |u|) ∈ SimonTable.
(Figure: T contains t y, which ends with u; P contains t x and starts with u.)

Note that, in the Simon algorithm, when a mismatch occurs at location i of P against the text character y, if (i, y, |u|) ∈ SimonTable, we move P by (i-|u|+1) steps; otherwise, we move P by i+1 steps.
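The shift rule above amounts to a single table lookup. The sketch below assumes the table is stored as a Python dictionary from (i, y) to |u|, which is a representation chosen here for illustration; the entries are the SimonTable for P = BCBABCBA derived in the worked example of these slides.

```python
# SimonTable for P = BCBABCBA, keyed by (mismatch location i, text character y).
SIMON_TABLE = {
    (1, "B"): 1, (3, "C"): 2, (3, "B"): 1, (5, "B"): 1,
    (7, "C"): 2, (7, "B"): 1, (8, "B"): 5,
}


def shift(table: dict, i: int, y: str) -> int:
    """Number of steps to move P after a mismatch at location i against y."""
    u_len = table.get((i, y))
    if u_len is None:
        return i + 1          # (i, y, *) is not in the table
    return i - u_len + 1      # (i, y, |u|) is in the table


print(shift(SIMON_TABLE, 3, "C"))  # 2
print(shift(SIMON_TABLE, 3, "D"))  # 4
print(shift(SIMON_TABLE, 8, "B"))  # 4
```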

The Simon Table can be constructed recursively by using the table which is used in the MP algorithm, called the Prefix Table.
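For reference, the Prefix Table can be computed with the standard MP preprocessing loop. This is a minimal sketch under the assumption that prefix(i) is the length of the longest proper prefix of P(0, i) that is also a suffix of it, which agrees with the values used in the worked example; the name `prefix_table` is hypothetical.

```python
def prefix_table(p: str) -> list[int]:
    """prefix[i] = length of the longest proper prefix of p[:i+1] that is also its suffix."""
    m = len(p)
    prefix = [0] * m
    t = 0  # length of the border currently being extended
    for i in range(1, m):
        # Fall back to shorter borders until p[i] can extend one.
        while t > 0 and p[i] != p[t]:
            t = prefix[t - 1]
        if p[i] == p[t]:
            t += 1
        prefix[i] = t
    return prefix


print(prefix_table("BCBABCBA"))  # [0, 0, 1, 0, 1, 2, 3, 4]
```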

The Simon Table
for ( i=1 ; i<=m ; i++ ) {
  t = prefix(i-1)
  while ( t > 0 ) {
    if ( P(i) ≠ P(t) & (i, P(t), *) ∉ SimonTable )
      add (i, P(t), t+1) to SimonTable
    t = prefix(t-1) /*recursive*/
  }
  if ( t = 0 & P(i) ≠ P(t) & (i, P(t), *) ∉ SimonTable )
    add (i, P(t), 1) to SimonTable
}
For i = m, P(m) lies past the end of P and is treated as different from every character.
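The construction above can be sketched in Python as follows. The dictionary representation (keys (i, y), values |u|) and the function names are assumptions made for this illustration, and the two checks of the pseudocode (t > 0 and t = 0) are folded into one loop.

```python
def prefix_table(p: str) -> list[int]:
    """MP Prefix Table: longest proper border length of each prefix of p."""
    prefix, t = [0] * len(p), 0
    for i in range(1, len(p)):
        while t > 0 and p[i] != p[t]:
            t = prefix[t - 1]
        if p[i] == p[t]:
            t += 1
        prefix[i] = t
    return prefix


def simon_table(p: str) -> dict[tuple[int, str], int]:
    """Map (i, y) to |u| for every entry (i, y, |u|) of the SimonTable of p."""
    m = len(p)
    prefix = prefix_table(p)
    table = {}
    for i in range(1, m + 1):
        # P(m) does not exist, so it is treated as different from every character.
        x = p[i] if i < m else None
        t = prefix[i - 1]
        while True:
            y = p[t]
            # Borders are visited longest first, so the first entry for y is the longest u.
            if y != x and (i, y) not in table:
                table[(i, y)] = t + 1
            if t == 0:
                break
            t = prefix[t - 1]
    return table


print(simon_table("BCBABCBA"))
```

For P = BCBABCBA this yields the seven entries (1, B, 1), (3, C, 2), (3, B, 1), (5, B, 1), (7, C, 2), (7, B, 1) and (8, B, 5).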

Example
i:      0 1 2 3 4 5 6 7
P:      B C B A B C B A
Prefix: 0 0 1 0 1 2 3 4
i = 1. t = prefix(1-1) = prefix(0) = 0.
P(1) ≠ P(t) = P(0) = B and (1, B, *) ∉ SimonTable, so add (1, B, 1) to SimonTable.
SimonTable: {(1, B, 1)}

i = 2. t = prefix(2-1) = prefix(1) = 0.
P(2) = P(t) = P(0) = B, so nothing is added.
SimonTable: {(1, B, 1)}

i = 3. t = prefix(3-1) = prefix(2) = 1.
P(3) ≠ P(t) = P(1) = C and (3, C, *) ∉ SimonTable, so add (3, C, t+1) = (3, C, 2) to SimonTable.
t = prefix(t-1) = prefix(0) = 0.
P(3) ≠ P(0) = B and (3, B, *) ∉ SimonTable, so add (3, B, 1) to SimonTable.
SimonTable: {(1, B, 1), (3, C, 2), (3, B, 1)}

i = 4. t = prefix(4-1) = prefix(3) = 0.
P(4) = P(t) = P(0) = B, so nothing is added.
SimonTable: {(1, B, 1), (3, C, 2), (3, B, 1)}

i = 5. t = prefix(5-1) = prefix(4) = 1.
P(5) = P(t) = P(1) = C, so nothing is added.
t = prefix(t-1) = prefix(0) = 0.
P(5) ≠ P(0) = B and (5, B, *) ∉ SimonTable, so add (5, B, 1) to SimonTable.
SimonTable: {(1, B, 1), (3, C, 2), (3, B, 1), (5, B, 1)}

i = 6. t = prefix(6-1) = prefix(5) = 2.
P(6) = P(t) = P(2) = B, so nothing is added.
t = prefix(t-1) = prefix(1) = 0.
P(6) = P(0) = B, so nothing is added.
SimonTable: {(1, B, 1), (3, C, 2), (3, B, 1), (5, B, 1)}

i = 7. t = prefix(7-1) = prefix(6) = 3.
P(7) = P(t) = P(3) = A, so nothing is added.
t = prefix(t-1) = prefix(2) = 1.
P(7) ≠ P(t) = P(1) = C and (7, C, *) ∉ SimonTable, so add (7, C, t+1) = (7, C, 2) to SimonTable.
t = prefix(t-1) = prefix(0) = 0.
P(7) ≠ P(0) = B and (7, B, *) ∉ SimonTable, so add (7, B, 1) to SimonTable.
SimonTable: {(1, B, 1), (3, C, 2), (3, B, 1), (5, B, 1), (7, C, 2), (7, B, 1)}

i = 8. t = prefix(8-1) = prefix(7) = 4.
P(8) lies past the end of P, so P(8) ≠ P(t) = P(4) = B, and (8, B, *) ∉ SimonTable. Add (8, B, t+1) = (8, B, 5) to SimonTable.
t = prefix(t-1) = prefix(3) = 0.
P(8) ≠ P(0) = B, but (8, B, *) ∈ SimonTable, so nothing is added.
SimonTable: {(1, B, 1), (3, C, 2), (3, B, 1), (5, B, 1), (7, C, 2), (7, B, 1), (8, B, 5)}
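The finished table can be cross-checked against the definition of the Simon Table by brute force. The sketch below is only a sanity check, not an efficient construction; the function name and the explicit `alphabet` parameter are assumptions introduced for illustration.

```python
def simon_table_by_definition(p: str, alphabet: str) -> dict[tuple[int, str], int]:
    """For each i and each y != P(i), record |u| for the longest suffix u of
    P(0, i-1) + y that is equal to a prefix of P (entries with empty u are omitted)."""
    m = len(p)
    table = {}
    for i in range(1, m + 1):
        for y in alphabet:
            if i < m and y == p[i]:
                continue  # the definition requires y != P(i)
            s = p[:i] + y
            # Try candidate lengths from longest to shortest.
            for k in range(min(len(s), m), 0, -1):
                if s[len(s) - k:] == p[:k]:
                    table[(i, y)] = k
                    break
    return table


expected = {
    (1, "B"): 1, (3, "C"): 2, (3, "B"): 1, (5, "B"): 1,
    (7, "C"): 2, (7, "B"): 1, (8, "B"): 5,
}
print(simon_table_by_definition("BCBABCBA", "ABCD") == expected)  # True
```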

SimonTable: {(1, B, 1), (3, C, 2), (3, B, 1), (5, B, 1), (7, C, 2), (7, B, 1), (8, B, 5)}
Example (P(0) aligned with T(0)):
T: B C B C B D B C B A B C B A B
P: B C B A B C B A
∵ P(3) ≠ T(3) = C, and (3, C, 2) ∈ SimonTable.
∴ Move P by (3-2+1) = 2 steps.

Example (P(0) aligned with T(2)):
T: B C B C B D B C B A B C B A B
P: B C B A B C B A
∵ P(3) ≠ T(5) = D, and (3, D, *) ∉ SimonTable.
∴ Move P by (3+1) = 4 steps.

Example (P(0) aligned with T(6)):
T: B C B C B D B C B A B C B A B
P: B C B A B C B A
∵ P(0, 7) = T(6, 13), so an occurrence of P is found. The next text character is T(14) = B, and (8, B, 5) ∈ SimonTable.
∴ Move P by (8-5+1) = 4 steps.
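The three search steps above can be reproduced with the following sketch. It is a simplified reading of the searching phase under stated assumptions: the table is a dictionary from (i, y) to |u|, the function names are hypothetical, and each window is compared again from P(0), so this version does not achieve the linear-time bound of the real algorithm.

```python
def prefix_table(p: str) -> list[int]:
    """MP Prefix Table: longest proper border length of each prefix of p."""
    prefix, t = [0] * len(p), 0
    for i in range(1, len(p)):
        while t > 0 and p[i] != p[t]:
            t = prefix[t - 1]
        if p[i] == p[t]:
            t += 1
        prefix[i] = t
    return prefix


def simon_table(p: str) -> dict[tuple[int, str], int]:
    """Map (i, y) to |u| for every entry (i, y, |u|) of the SimonTable of p."""
    prefix, table = prefix_table(p), {}
    for i in range(1, len(p) + 1):
        x = p[i] if i < len(p) else None  # P(m) differs from every character
        t = prefix[i - 1]
        while True:
            if p[t] != x and (i, p[t]) not in table:
                table[(i, p[t])] = t + 1
            if t == 0:
                break
            t = prefix[t - 1]
    return table


def simon_search(text: str, p: str) -> tuple[list[int], list[int]]:
    """Return (match positions, list of shifts taken) for pattern p in text."""
    n, m = len(text), len(p)
    table = simon_table(p)
    matches, shifts = [], []
    s = 0  # current window start: P(0) is aligned with T(s)
    while s + m <= n:
        i = 0
        while i < m and text[s + i] == p[i]:
            i += 1
        if i == m:
            matches.append(s)
        # Text character at the mismatch, or just past a full match (None at end of text).
        y = text[s + i] if s + i < n else None
        u_len = table.get((i, y))
        step = i + 1 if u_len is None else i - u_len + 1
        shifts.append(step)
        s += step
    return matches, shifts


print(simon_search("BCBCBDBCBABCBAB", "BCBABCBA"))  # ([6], [2, 4, 4])
```

The shifts 2, 4 and 4 are the same steps taken in the three examples, and the occurrence is reported at T(6).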

Preprocessing phase in O(m) time and space complexity.
Searching phase in O(m+n) time complexity.

References
BEAUQUIER, D., BERSTEL, J., CHRÉTIENNE, P., 1992, Éléments d'algorithmique, Chapter 10, Masson, Paris.
CROCHEMORE, M., Off-line serial exact string searching, in Pattern Matching Algorithms, A. Apostolico and Z. Galil ed., Chapter 1, pp 1-53, Oxford University Press.
CROCHEMORE, M., HANCART, C., Automata for Matching Patterns, in Handbook of Formal Languages, Volume 2, Linear Modeling: Background and Application, G. Rozenberg and A. Salomaa ed., Chapter 9, Springer-Verlag, Berlin.
CROCHEMORE, M., RYTTER, W., 1994, Text Algorithms, Oxford University Press.
HANCART, C., 1992, Une analyse en moyenne de l'algorithme de Morris et Pratt et de ses raffinements, in Théorie des Automates et Applications, Actes des 2e Journées Franco-Belges, D. Krob ed., Rouen, France, 1991, PUR 176, Rouen, France.
HANCART, C., 1993, On Simon's string searching algorithm, Inf. Process. Lett. 47(2):95-99.
HANCART, C., Analyse exacte et en moyenne d'algorithmes de recherche d'un motif dans un texte, Ph. D. Thesis, University Paris 7, France.
SIMON, I., 1993, String matching algorithms and automata, in Proceedings of 1st American Workshop on String Processing, R.A. Baeza-Yates and N. Ziviani ed., Universidade Federal de Minas Gerais, Brazil.
SIMON, I., 1994, String matching algorithms and automata, in Results and Trends in Theoretical Computer Science, Graz, Austria, Karhumäki, Maurer and Rozenberg ed., Lecture Notes in Computer Science 814, Springer Verlag.

Thank you!