A new matching algorithm based on prime numbers N. D. Atreas and C. Karanikas Department of Informatics Aristotle University of Thessaloniki.


2 A new matching algorithm based on prime numbers N. D. Atreas and C. Karanikas Department of Informatics Aristotle University of Thessaloniki

3 Exact Matching: find all the occurrences of a pattern within a text. 1. The Brute Force algorithm: performs character-by-character comparison in O(NM) time, where M is the length of the pattern and N is the length of the text. 2. The Knuth-Morris-Pratt algorithm: runs in O(N+M) time, avoiding unnecessary re-examination of previously matched characters.
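As an illustration of how Knuth-Morris-Pratt avoids re-examining matched characters, here is a minimal Python sketch. The function name kmp_search and the failure table variable are illustrative choices, not taken from the slides.

```python
def kmp_search(text, pattern):
    """Return the start index of every occurrence of pattern in text."""
    m = len(pattern)
    if m == 0:
        return []
    # failure[i]: length of the longest proper prefix of pattern[:i + 1]
    # that is also a suffix of it.
    failure = [0] * m
    k = 0
    for i in range(1, m):
        while k > 0 and pattern[i] != pattern[k]:
            k = failure[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k
    matches = []
    k = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        # On a mismatch, fall back in the pattern; the text index never moves back.
        while k > 0 and ch != pattern[k]:
            k = failure[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == m:
            matches.append(i - m + 1)
            k = failure[k - 1]
    return matches
```

For example, kmp_search("abababa", "aba") reports the overlapping occurrences at positions 0, 2 and 4 (zero-based).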

4 3. The Boyer-Moore algorithm: involves character-by-character comparison by using backwards checking. Best case execution: O(N/M), worst time: O(N). 4. The Karp-Rabin algorithm: a randomised algorithm that seeks a pattern within a text by using hashing. Expected running time O(N+M).
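The backwards checking can be sketched with Horspool's simplification of Boyer-Moore, which keeps only a bad-character shift table. This is a simplified variant, not the full Boyer-Moore algorithm, and the name horspool_search is an illustrative choice.

```python
def horspool_search(text, pattern):
    """Return the start index of every occurrence of pattern in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    # shift[c]: distance from the rightmost occurrence of c in pattern[:-1]
    # to the last position of the pattern. Characters not in the table shift by m.
    shift = {c: m - 1 - i for i, c in enumerate(pattern[:-1])}
    matches = []
    pos = 0
    while pos <= n - m:
        # Compare the window against the pattern from right to left.
        j = m - 1
        while j >= 0 and text[pos + j] == pattern[j]:
            j -= 1
        if j < 0:
            matches.append(pos)
        # Slide the window using the text character under the pattern's last position.
        pos += shift.get(text[pos + m - 1], m)
    return matches
```

When the character under the last pattern position does not occur in the pattern, the window jumps by M positions at once, which is where the O(N/M) best case comes from.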

5 A hash function must be: –efficiently computable; –highly discriminating for strings; –hash(x(j+1 … j+M)) must be easily computable from hash(x(j … j+M-1)) and x(j+M). –not injective, i.e. the equality of two hash values suggests, but does not guarantee, equality of the inputs.
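The rolling requirement can be illustrated with a standard polynomial hash of the Karp-Rabin kind. This is a minimal sketch: the base and modulus are arbitrary fixed values assumed for the example (a randomised version would draw the modulus at random), and the name karp_rabin_search is illustrative.

```python
def karp_rabin_search(text, pattern, base=256, modulus=1_000_000_007):
    """Return the start index of every occurrence of pattern in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    high = pow(base, m - 1, modulus)  # weight of the window's leading character

    def full_hash(s):
        h = 0
        for ch in s:
            h = (h * base + ord(ch)) % modulus
        return h

    target = full_hash(pattern)
    window = full_hash(text[:m])
    matches = []
    for j in range(n - m + 1):
        # Equal hashes only suggest a match, so verify by direct comparison.
        if window == target and text[j:j + m] == pattern:
            matches.append(j)
        if j + m < n:
            # Roll the hash: drop text[j], append text[j + m], in O(1).
            window = ((window - ord(text[j]) * high) * base
                      + ord(text[j + m])) % modulus
    return matches
```

The explicit comparison after a hash hit is what the non-injectivity remark calls for: two different windows can share a hash value.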

6 Let x = {x(1),…,x(N)} be a set of positive integers and p(1) […] Max{x(i): i=1,…,N}; we define the transform:

7 Properties of T(x(1)…x(N)) T(x(1),…,x(N)) is one to one. x(1),…,x(N) can be recovered from T(x) as the unique solution of a system of N linear Diophantine equations defined recursively: (p(i+1)…p(N))x(i)+p(i)c(i+1) = c(i) where c(1)=T(x)p(1)…p(N).
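The recursion can be tried out numerically. The sketch below rests on assumptions that the transcript does not state: it takes T(x) to be the sum of x(i)/p(i), which is the form obtained by unrolling the recursion with c(N+1) = 0, and it assumes the p(i) are distinct primes larger than every x(i). The names transform and recover are illustrative.

```python
from fractions import Fraction
from math import prod


def transform(x, primes):
    """Assumed form of T: the exact rational sum of x(i) / p(i)."""
    return sum(Fraction(xi, pi) for xi, pi in zip(x, primes))


def recover(t, primes):
    """Recover x(1..N) from T(x) by solving the recursive equations in order."""
    c = t * prod(primes)  # c(1) = T(x) p(1)...p(N), an integer
    assert c.denominator == 1
    c = c.numerator
    x = []
    for i, p in enumerate(primes):
        q = prod(primes[i + 1:])  # p(i+1)...p(N); the empty product is 1
        # Reducing c(i) = q x(i) + p(i) c(i+1) modulo p(i) isolates x(i),
        # because q is invertible modulo p(i) and 0 < x(i) < p(i).
        xi = (c * pow(q, -1, p)) % p
        x.append(xi)
        c = (c - q * xi) // p  # this is c(i+1)
    return x
```

Under these assumptions, recover(transform(x, primes), primes) returns x for any admissible x, which is one way to see why the transform is one to one. The modular inverse via pow(q, -1, p) requires Python 3.8 or later.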

8 Properties of T(x(1)…x(N)) T(x) can be used as a measure of similarity between two strings, since it can be used to count the elements in which they differ. It provides a necessary and sufficient condition to detect whether a binding operation on strings can be implemented. It is not a hash function.
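One way the counting could work is sketched below. It is an illustration under assumptions not stated in the transcript: T(x) is taken to be the sum of x(i)/p(i) with distinct primes p(i) larger than every element, and the names transform and differing_positions are hypothetical. Position i differs exactly when the integer (T(x) - T(y)) p(1)…p(N) is not divisible by p(i).

```python
from fractions import Fraction
from math import prod


def transform(x, primes):
    """Assumed form of T: the exact rational sum of x(i) / p(i)."""
    return sum(Fraction(xi, pi) for xi, pi in zip(x, primes))


def differing_positions(x, y, primes):
    """Return the zero-based positions where x and y differ, using only T(x) and T(y)."""
    # d is the sum over j of (x(j) - y(j)) * P / p(j), an integer.
    d = int((transform(x, primes) - transform(y, primes)) * prod(primes))
    # Modulo p(i) every term except the i-th vanishes, and the i-th term is
    # nonzero exactly when x(i) != y(i), because |x(i) - y(i)| < p(i).
    return [i for i, p in enumerate(primes) if d % p != 0]
```

The length of the returned list is the number of differing elements, so two equal-length strings are identical exactly when the list is empty.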

9 Modelling a hash function approximating T.

10 Definition of the hash function We prove:

11 Final form of hash function Theorem

12 Software implementation Let X={x(1),…,x(N)} be the text and Y={y(1),…,y(M)} be the pattern. Compute T(y(1),…,y(M)) and T(x(1),…,x(M)) in O(M) time. Compute the hash values in O(N-M) time:

13 Software implementation If […] for some i, then x(i+1),…,x(i+M-1) is a candidate for string matching. For all candidates perform at most p character comparisons (p is the size of the alphabet) to throw out false matches. The algorithm executes in O(N) time.

14 Conclusions We introduce the idea of a hash function approximation in order to reduce the computational complexity of an algorithm. Although its time bounds are the same as, or in some cases inferior to, those of the Boyer-Moore algorithm, our algorithm is superior for multiple matching problems.

