# Approximate string matching using factor automata. J. Holub and B. Melichar, Theoretical Computer Science vol. 249, p. 305-311. Speaker: L. C. Chen. Advisor: R. C. T. Lee

1 Approximate string matching using factor automata. J. Holub and B. Melichar, Theoretical Computer Science, vol. 249, p. 305-311. Speaker: L. C. Chen. Advisor: R. C. T. Lee.

2 Problem: The edit distance D_L(P, X) between strings P and X is the minimum number of edit operations (substitution, insertion, and deletion) needed to convert P into X. Given a text T of length n, a pattern P of length m, and an integer k with k ≤ m ≤ n, approximate string matching asks whether some string X occurs in T such that D_L(P, X) ≤ k.
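As a baseline for the problem statement, the existence question can be answered with a standard dynamic-programming scan over the text (often attributed to Sellers). This is not the factor-automaton method of the paper, only a reference point; the function name `occurs_within_k` is an assumption of this sketch.

```python
def occurs_within_k(text: str, pattern: str, k: int) -> bool:
    """Return True if some substring X of text satisfies D_L(pattern, X) <= k."""
    m = len(pattern)
    # col[i] = smallest edit distance between pattern[:i] and any substring
    # of the text that ends at the current text position (possibly empty).
    col = list(range(m + 1))
    if col[m] <= k:
        return True
    for c in text:
        prev_diag = col[0]  # old col[i-1], needed for the diagonal move
        col[0] = 0          # an occurrence may start at any text position
        for i in range(1, m + 1):
            old = col[i]
            col[i] = min(
                prev_diag + (pattern[i - 1] != c),  # match or substitution
                old + 1,                            # extra symbol c in the text
                col[i - 1] + 1,                     # pattern symbol missing from the text
            )
            prev_diag = old
        if col[m] <= k:
            return True
    return False


print(occurs_within_k("aabbabd", "bab", 0))
print(occurs_within_k("aabbabd", "ddd", 1))
```

The scan takes time proportional to the product of the text and pattern lengths and keeps only one column of the table in memory.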

3 An example of edit distance. To convert P = abcde into T = bcfeg: delete a (P1 = bcde), substitute d with f (P2 = bcfe), insert g (bcfeg). These three operations give D_L(P, T) = 3.
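The distance in this example can be checked with the textbook dynamic-programming recurrence for edit distance. The sketch below is a generic implementation, not code from the presentation, and the name `levenshtein` is chosen here for illustration.

```python
def levenshtein(p: str, x: str) -> int:
    """Minimum number of substitutions, insertions and deletions turning p into x."""
    # prev[j] = distance between the processed prefix of p and x[:j]
    prev = list(range(len(x) + 1))
    for i, pc in enumerate(p, start=1):
        cur = [i]  # distance from p[:i] to the empty string: i deletions
        for j, xc in enumerate(x, start=1):
            cur.append(min(
                prev[j - 1] + (pc != xc),  # match or substitution
                prev[j] + 1,               # delete pc from p
                cur[j - 1] + 1,            # insert xc into p
            ))
        prev = cur
    return prev[-1]


print(levenshtein("abcde", "bcfeg"))  # 3: delete a, substitute d with f, insert g
```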

4 Basic definitions. Fac(T): the set of all substrings (factors) of text T. A nondeterministic finite automaton (NFA) is a five-tuple M = (Q, Σ, δ, q0, F), where Q is a finite set of states, Σ is a finite input alphabet, δ is a mapping from Q × (Σ ∪ {ε}) into the set of subsets of Q, q0 ∈ Q is the initial state, and F ⊆ Q is a set of final states. M(Fac(T)): a factor automaton, which accepts Fac(T).
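The five-tuple can be mirrored directly in code. The following is a minimal sketch of an NFA with ε-transitions; the class name, the use of the empty string as the ε label, and the toy automaton at the end are all assumptions made for illustration.

```python
EPS = ""  # label used in this sketch for epsilon transitions


class NFA:
    def __init__(self, delta, start, finals):
        self.delta = delta    # dict: (state, symbol or EPS) -> set of states
        self.start = start    # q0
        self.finals = finals  # F

    def eps_closure(self, states):
        """All states reachable from `states` using epsilon transitions only."""
        closure = set(states)
        stack = list(states)
        while stack:
            q = stack.pop()
            for r in self.delta.get((q, EPS), ()):
                if r not in closure:
                    closure.add(r)
                    stack.append(r)
        return closure

    def accepts(self, word):
        current = self.eps_closure({self.start})
        for a in word:
            moved = set()
            for q in current:
                moved |= set(self.delta.get((q, a), ()))
            current = self.eps_closure(moved)
        return bool(current & self.finals)


# Hypothetical toy automaton accepting exactly "ab" and "b".
toy = NFA({(0, "a"): {1}, (0, EPS): {1}, (1, "b"): {2}}, start=0, finals={2})
print(toy.accepts("ab"), toy.accepts("b"), toy.accepts("a"))
```

Q and Σ are implicit here: they are whatever states and symbols appear as keys and values of `delta`.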

5 Factor automaton M(Fac(T)): a deterministic finite automaton (DFA) that accepts all substrings of the given text T. For T = aabbabd, Fac(T) = {a, b, d, aa, ab, bb, ba, bd, aab, abb, bba, bab, abd, aabb, abba, bbab, babd, aabba, abbab, bbabd, aabbab, abbabd, aabbabd}.
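One common way to obtain a deterministic factor automaton is a subset construction over text positions: a DFA state is the set of positions at which the factor read so far can end. The sketch below follows that idea; it is an illustration under that assumption rather than the construction used on the slide, and the function names are hypothetical.

```python
from collections import deque


def build_factor_dfa(text):
    """Return (start, delta) for a DFA in which every reachable state is final."""
    n = len(text)
    alphabet = sorted(set(text))
    start = frozenset(range(n + 1))  # a factor may start at any position
    delta = {}
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for c in alphabet:
            # Advance every position whose next text symbol is c.
            target = frozenset(i + 1 for i in state if i < n and text[i] == c)
            if not target:
                continue  # no transition: the word is not a factor
            delta[(state, c)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return start, delta


def is_factor(dfa, word):
    state, delta = dfa
    for c in word:
        state = delta.get((state, c))
        if state is None:
            return False
    return True  # also True for the empty word, which the slide's Fac(T) omits


dfa = build_factor_dfa("aabbabd")
print(is_factor(dfa, "bbab"), is_factor(dfa, "dab"))
```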

6 A suffix tree can also be used to recognize all substrings of T = aabbabd: Fac(T) = {a, b, d, aa, ab, bb, ba, bd, aab, abb, bba, bab, abd, aabb, abba, bbab, babd, aabba, abbab, bbabd, aabbab, abbabd, aabbabd}.
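To make the idea concrete, here is a minimal sketch using an uncompressed suffix trie built from nested dictionaries. A real suffix tree compacts unary paths and is built in linear time; this simplified version takes quadratic time and space and is meant only to show the recognition step. All names are assumptions of the sketch.

```python
def build_suffix_trie(text):
    """Insert every suffix of text into a trie of nested dicts."""
    root = {}
    for i in range(len(text)):
        node = root
        for c in text[i:]:
            node = node.setdefault(c, {})
    return root


def is_factor(trie, word):
    """A word is a substring of the text iff it labels a path from the root."""
    node = trie
    for c in word:
        if c not in node:
            return False
        node = node[c]
    return True


def count_factors(trie):
    """Each non-root trie node corresponds to one distinct non-empty substring."""
    return sum(1 + count_factors(child) for child in trie.values())


trie = build_suffix_trie("aabbabd")
print(is_factor(trie, "abba"), is_factor(trie, "bad"))
print(count_factors(trie))  # 23, the size of Fac(T) for T = aabbabd
```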

7 P = bab, k = 1. The finite automaton M(L_k(P)) accepts L_k(P) = {ab, bb, ba, aab, bab, dab, bbb, bdb, baa, bad, bbab, bdab, baab, badb}. State labels in the figure: one matched, 0 errors; one matched, one error; three matched, 0 errors.
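The listed language can be reproduced by simulating an NFA whose states are pairs (symbols matched, errors). The transition rules below are an assumption inferred from the listed set, not taken from the slide: substitutions and insertions read only symbols that differ from the next pattern symbol, insertions are allowed only between two pattern symbols, and deletions are ε-moves. Under these rules a string such as babb, although within edit distance 1 of bab, is not generated, which agrees with the list.

```python
from itertools import product


def closure(states, m, k):
    """Add states reachable by deletions (epsilon moves that skip a pattern symbol)."""
    result = set(states)
    stack = list(states)
    while stack:
        j, e = stack.pop()
        if j < m and e < k and (j + 1, e + 1) not in result:
            result.add((j + 1, e + 1))
            stack.append((j + 1, e + 1))
    return result


def step(states, c, pattern, k):
    m = len(pattern)
    nxt = set()
    for j, e in states:
        if j < m and pattern[j] == c:
            nxt.add((j + 1, e))          # match
        if j < m and pattern[j] != c and e < k:
            nxt.add((j + 1, e + 1))      # substitution
            if j > 0:
                nxt.add((j, e + 1))      # insertion between two pattern symbols
    return closure(nxt, m, k)


def accepts(word, pattern, k):
    m = len(pattern)
    states = closure({(0, 0)}, m, k)
    for c in word:
        states = step(states, c, pattern, k)
    return any(j == m for j, _ in states)  # final states: whole pattern consumed


def language(pattern, k, alphabet):
    """Enumerate accepted words; none can be longer than len(pattern) + k."""
    words = set()
    for n in range(len(pattern) + k + 1):
        for w in product(alphabet, repeat=n):
            s = "".join(w)
            if accepts(s, pattern, k):
                words.add(s)
    return words


print(sorted(language("bab", 1, "abd"), key=lambda w: (len(w), w)))
```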

8 P = bab, k = 1. The finite automaton M(L_k(P)) accepts L_k(P) = {ab, bb, ba, aab, bab, dab, bbb, bdb, baa, bad, bbab, bdab, baab, badb}. Recognize ab.

9 P = bab, k = 1. The finite automaton M(L_k(P)) accepts L_k(P) = {ab, bb, ba, aab, bab, dab, bbb, bdb, baa, bad, bbab, bdab, baab, badb}. Recognize aab.

10 P = bab, k = 1. The finite automaton M(L_k(P)) accepts L_k(P) = {ab, bb, ba, aab, bab, dab, bbb, bdb, baa, bad, bbab, bdab, baab, badb}. Recognize bbab.

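11 Definition: Let M1 = (Q1, Σ, δ1, q01, F1) and M2 = (Q2, Σ, δ2, q02, F2) be finite automata. An automaton for the intersection of M1 and M2 is an automaton M that accepts L(M1) ∩ L(M2).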

12 T = aabbabd, P = bab, k = 1. Intersection of M(L_k(P)) and M(Fac(T)). Solutions: {ba, bab, bb, bbab, aab, ab}. (All end in state {3,0} or {3,1}.)
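The solution set can be reproduced by running both automata in lockstep, which amounts to exploring the product automaton on the fly. The sketch below is self-contained and rests on assumptions: the factor automaton is simulated as sets of text positions, and the L_k(P) automaton uses transition rules inferred from the listed language (substitutions and insertions read a symbol different from the next pattern symbol, insertions only between pattern symbols). Function names are hypothetical.

```python
from collections import deque


def lk_closure(states, m, k):
    """Deletions: epsilon moves from (j, e) to (j + 1, e + 1)."""
    result = set(states)
    stack = list(states)
    while stack:
        j, e = stack.pop()
        if j < m and e < k and (j + 1, e + 1) not in result:
            result.add((j + 1, e + 1))
            stack.append((j + 1, e + 1))
    return result


def lk_step(states, c, pattern, k):
    m = len(pattern)
    nxt = set()
    for j, e in states:
        if j < m and pattern[j] == c:
            nxt.add((j + 1, e))          # match
        if j < m and pattern[j] != c and e < k:
            nxt.add((j + 1, e + 1))      # substitution
            if j > 0:
                nxt.add((j, e + 1))      # insertion
    return lk_closure(nxt, m, k)


def intersection_language(text, pattern, k):
    """Words accepted by both the factor automaton of text and M(L_k(pattern))."""
    n, m = len(text), len(pattern)
    alphabet = sorted(set(text))
    start = (frozenset(range(n + 1)), frozenset(lk_closure({(0, 0)}, m, k)))
    found = set()
    queue = deque([(start, "")])
    while queue:
        (fac, lk), word = queue.popleft()
        # Every factor state is final, so only the L_k component decides.
        if any(j == m for j, _ in lk):
            found.add(word)
        for c in alphabet:
            fac2 = frozenset(i + 1 for i in fac if i < n and text[i] == c)
            lk2 = frozenset(lk_step(lk, c, pattern, k))
            if fac2 and lk2:  # both components must stay alive
                queue.append(((fac2, lk2), word + c))
    return found


print(sorted(intersection_language("aabbabd", "bab", 1)))
```

The search terminates because every explored word is a factor of the text, so its length is bounded by the text length.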

13 T = aabbabd, P = bab, k = 1. Intersection of M(L_k(P)) and M(Fac(T)).

14 Intersection: T = aabbabd, P = bab, match ba, D_L(P, ba) = 1.

15 Intersection: T = aabbabd, P = bab, match bab, D_L(P, bab) = 0.

16 Intersection: T = aabbabd, P = bab, match bb, D_L(P, bb) = 1.

17 Intersection: T = aabbabd, P = bab, match bbab, D_L(P, bbab) = 1.

18 Intersection: T = aabbabd, P = bab, match aab, D_L(P, aab) = 1.

19 Intersection: T = aabbabd, P = bab, match ab, D_L(P, ab) = 1.
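The positions of these six matches in T can be listed with a few lines of code. This is a plain substring scan added for illustration, with 0-based start positions; the helper name `occurrences` is an assumption.

```python
def occurrences(text, factor):
    """Return all 0-based start positions of factor in text (overlaps allowed)."""
    positions = []
    start = text.find(factor)
    while start != -1:
        positions.append(start)
        start = text.find(factor, start + 1)
    return positions


T = "aabbabd"
for x in ["ba", "bab", "bb", "bbab", "aab", "ab"]:
    print(x, occurrences(T, x))
```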

20 Lemma The number of automaton is always lower than.

21 T = aabbabd, P = bab, k = 1. The finite automaton M(L_k(P)) accepts L_k(P) = {ab, bb, ba, aab, bab, dab, bbb, bdb, baa, bad, bbab, bdab, baab, badb}.

22 Thank you!
