LING 438/538 Computational Linguistics Sandiway Fong Lecture 11: 10/3.


1 LING 438/538 Computational Linguistics Sandiway Fong Lecture 11: 10/3

2 Administrivia
homework 2
  – will be returned tomorrow (by email)
homework 3
  – will be out on Thursday

3 Last Tuesday
textbook
  – Chapter 2: Regular Expressions and Finite State Automata
regular expressions
  – Unix grep
  – wildcard search in Microsoft Word
implementing the FSA in Prolog
  – Method 1: two-line program fsa/2 + transition/3 (δ function) and final_state/1
  – Method 2: define each state, e.g. x, as a predicate, e.g. x/1, taking the input list as an argument
  – non-determinism handled by Prolog's computation rule

4 Today's Topic
more on FSA
  – expressive power
  – limits

5 Determinism
deterministic FSA (DFSA)
  – no ambiguity about where to go at any given state
non-deterministic FSA (NDFSA)
  – no restriction on ambiguity (surprisingly, no increase in formal power)
textbook
  – D-RECOGNIZE (FIGURE 2.13)
  – ND-RECOGNIZE (FIGURE 2.21)
    fsa(S,L) :- L = [C|M], transition(S,C,T), fsa(T,M).
    fsa(E,[]) :- end_state(E).
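
The two-clause fsa/2 idea can also be sketched outside Prolog. The Python below is a minimal illustration, not the textbook's ND-RECOGNIZE: the machine, its state names, and the table layout are assumptions made for this example. Trying each candidate next state in turn plays the role of Prolog's backtracking.

```python
# Hypothetical transition relation: (state, character) -> set of next states.
# The states s, x, y and these arcs are illustrative, not a textbook figure.
TRANSITIONS = {
    ("s", "a"): {"x"},
    ("x", "a"): {"x"},
    ("x", "b"): {"x", "y"},  # two choices on b make the machine non-deterministic
    ("y", "b"): {"y"},
}
END_STATES = {"y"}


def fsa(state, chars):
    """True if some path from `state` consumes all of `chars` and stops in an end state."""
    if not chars:
        return state in END_STATES
    head, rest = chars[0], chars[1:]
    # Try every possible next state; any() stops at the first success.
    return any(fsa(nxt, rest) for nxt in TRANSITIONS.get((state, head), ()))
```

Calling `fsa("s", "abab")` succeeds only by taking the x-to-x choice on the first b, which is exactly the kind of ambiguity a deterministic machine rules out.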

6 NDFSA → (D)FSA
[discussed at the end of section 2.2 in the textbook]
construct a new machine
  – each state of the new machine represents the set of possible states of the original machine when stepping through the input
Note:
  – new machine is equivalent to old one (but may have more states: up to 2ⁿ for an original with n states)
  – new machine is deterministic
example
[figure: an NDFSA with states s, x, y, z over {a, b}, and the equivalent DFSA with states s, {x,y}, {z}, {y,z}, {y}]
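
The set-of-states construction can be sketched as follows. This is a minimal Python illustration under assumptions: the NDFSA below is a made-up machine (not the one in the slide's figure), and only the reachable sets are built.

```python
from collections import deque


def determinize(start, transitions, end_states, alphabet):
    """Subset construction: each new state is a frozenset of original states."""
    start_set = frozenset([start])
    new_transitions, new_end_states = {}, set()
    seen, queue = {start_set}, deque([start_set])
    while queue:
        current = queue.popleft()
        if current & end_states:  # contains at least one original end state
            new_end_states.add(current)
        for ch in alphabet:
            target = frozenset(
                t for q in current for t in transitions.get((q, ch), ())
            )
            if not target:
                continue  # no move on ch: leave the transition undefined
            new_transitions[(current, ch)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return start_set, new_transitions, new_end_states


def accepts(machine, text):
    """Run a deterministic machine; an undefined transition means reject."""
    state, transitions, end_states = machine
    for ch in text:
        state = transitions.get((state, ch))
        if state is None:
            return False
    return state in end_states


# Hypothetical NDFSA: on b, state x may stay in x or move to y.
NFA = {("s", "a"): {"x"}, ("x", "a"): {"x"}, ("x", "b"): {"x", "y"}, ("y", "b"): {"y"}}
DFA = determinize("s", NFA, {"y"}, "ab")
```

Here the new machine has the states {s}, {x} and {x,y}, and exactly one target per (state, character) pair.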

7 ε-transitions
jump from one state to another on the empty string, i.e. without consuming a character
  – ε-transition (textbook) or λ-transition
  – no increase in expressive power
examples
[figures: three small automata built from a-, b-, and ε-transitions]
what's the equivalent without the ε-transition?
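
One way to see why ε-transitions add no power is to compute, for each state, everything reachable through ε-arcs alone (its ε-closure) and simulate with those sets. The sketch below is a hedged illustration: the `EPS` marker, the numbered states, and both machines are assumptions of this example, not the automata in the slide's figures.

```python
EPS = None  # marker used here for an ε-transition (an assumption of this sketch)


def epsilon_closure(states, transitions):
    """All states reachable from `states` using only ε-transitions."""
    closure, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for nxt in transitions.get((q, EPS), ()):
            if nxt not in closure:
                closure.add(nxt)
                stack.append(nxt)
    return closure


def accepts(start, transitions, end_states, text):
    """Track the set of possible states, following ε-arcs for free."""
    current = epsilon_closure({start}, transitions)
    for ch in text:
        moved = {t for q in current for t in transitions.get((q, ch), ())}
        current = epsilon_closure(moved, transitions)
    return bool(current & end_states)


# Hypothetical machine: 1 -a-> 2 -ε-> 3 -b-> 4, accepting exactly "ab".
WITH_EPS = {(1, "a"): {2}, (2, EPS): {3}, (3, "b"): {4}}
# An equivalent machine without the ε-arc: states 2 and 3 are merged.
WITHOUT_EPS = {(1, "a"): {2}, (2, "b"): {4}}
```

Both tables accept the same strings, which is the sense in which the ε-arc can be eliminated.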

8 Start State(s)
Finite State Automata (FSA)
  – (Q, s, f, Σ, δ)
1. set of states (Q): {s, x, y}; must be a finite set
2. start state (s): s
3. end state(s) (f): y
4. alphabet (Σ): {a, b}
5. transition function δ:
   signature: character × state → state
   δ(a,s) = x
   δ(a,x) = x
   δ(b,x) = y
   δ(b,y) = y
[figure: the FSA with states s, x, y and the transitions listed above; > marks the start state]
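
The five components map directly onto a small data structure. The following Python sketch encodes the machine defined on this slide; the class name `FSA`, its field names, and the `accepts` method are hypothetical choices for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FSA:
    states: frozenset      # Q
    start: str             # s
    end_states: frozenset  # f
    alphabet: frozenset    # Σ
    delta: dict            # δ, keyed by (character, state) as in the slide's signature

    def accepts(self, text):
        state = self.start
        for ch in text:
            if (ch, state) not in self.delta:
                return False  # δ is partial here: a missing entry means reject
            state = self.delta[(ch, state)]
        return state in self.end_states


# The machine from the slide: δ(a,s)=x, δ(a,x)=x, δ(b,x)=y, δ(b,y)=y.
machine = FSA(
    states=frozenset({"s", "x", "y"}),
    start="s",
    end_states=frozenset({"y"}),
    alphabet=frozenset({"a", "b"}),
    delta={("a", "s"): "x", ("a", "x"): "x", ("b", "x"): "y", ("b", "y"): "y"},
)
```

With this table, `machine.accepts("aabb")` holds and `machine.accepts("ba")` does not: the machine recognizes one or more a's followed by one or more b's.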

9 FSA Properties
FSAs (and thus regular languages) are preserved, i.e. maintain their FSA nature, under...
  – concatenation
  – union
  – intersection
  – complementation
  – and other operations...
  – [see section 2.3 of textbook]

10 concatenation
concatenate two FSAs, result is an FSA
  – trick: use ε-transitions to link the automata
example
  – [figure 2.24]

11 union
disjunction (union) of two FSAs, result is an FSA
  – trick: use ε-transitions to link the automata
example
  – [figure 2.26]
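
The ε-linking trick for union and for concatenation can be sketched in a few lines. This is a hedged Python illustration, not a reproduction of the textbook figures: the machine representation (start, transitions, end states), the `EPS` marker, the new start state name `s0`, and the two example machines are all assumptions, and the two machines must use disjoint state names.

```python
EPS = None  # marker used here for an ε-transition (an assumption of this sketch)


def union(m1, m2, new_start="s0"):
    """New start state with ε-arcs into both machines."""
    (s1, t1, f1), (s2, t2, f2) = m1, m2
    transitions = {**t1, **t2, (new_start, EPS): {s1, s2}}
    return new_start, transitions, f1 | f2


def concatenate(m1, m2):
    """ε-arcs from every end state of the first machine to the start of the second."""
    (s1, t1, f1), (s2, t2, f2) = m1, m2
    transitions = {**t1, **t2}
    for q in f1:
        transitions[(q, EPS)] = transitions.get((q, EPS), set()) | {s2}
    return s1, transitions, set(f2)


def accepts(machine, text):
    start, transitions, end_states = machine

    def closure(states):
        result, stack = set(states), list(states)
        while stack:
            for nxt in transitions.get((stack.pop(), EPS), ()):
                if nxt not in result:
                    result.add(nxt)
                    stack.append(nxt)
        return result

    current = closure({start})
    for ch in text:
        current = closure({t for q in current for t in transitions.get((q, ch), ())})
    return bool(current & end_states)


# Hypothetical machines: A accepts one or more a's, B accepts one or more b's.
A = ("p0", {("p0", "a"): {"p1"}, ("p1", "a"): {"p1"}}, {"p1"})
B = ("q0", {("q0", "b"): {"q1"}, ("q1", "b"): {"q1"}}, {"q1"})
```

Under these assumptions, `union(A, B)` accepts strings of a's or strings of b's, while `concatenate(A, B)` accepts a's followed by b's.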

12 intersection (conjunction)
intersect two FSAs, result is an FSA
  – trick: use (modified) set-of-states construction
example
[figure: machine 1 with states s1, x, y; machine 2 with states s2, z; combined machine with states {s1,s2}, {x,s2}, {y,z}]
look familiar? that's because a⁺b* ∩ a*b⁺ = a⁺b⁺
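
The modified set-of-states construction pairs one state from each machine and moves both in lockstep. The Python below is a sketch under assumptions: the transition tables are one reading of the flattened figure (a machine for a⁺b* and a machine for a*b⁺), and the function names are hypothetical.

```python
def intersect(m1, m2, alphabet):
    """Product construction over two deterministic machines."""
    (s1, d1, f1), (s2, d2, f2) = m1, m2
    start = (s1, s2)
    delta, end_states = {}, set()
    seen, todo = {start}, [start]
    while todo:
        pair = todo.pop()
        p, q = pair
        if p in f1 and q in f2:  # end state only if both components are
            end_states.add(pair)
        for ch in alphabet:
            if (p, ch) in d1 and (q, ch) in d2:
                target = (d1[(p, ch)], d2[(q, ch)])
                delta[(pair, ch)] = target
                if target not in seen:
                    seen.add(target)
                    todo.append(target)
    return start, delta, end_states


def accepts(machine, text):
    state, delta, end_states = machine
    for ch in text:
        if (state, ch) not in delta:
            return False
        state = delta[(state, ch)]
    return state in end_states


# Assumed reading of the figure: M1 accepts a+b*, M2 accepts a*b+.
M1 = ("s1", {("s1", "a"): "x", ("x", "a"): "x", ("x", "b"): "y", ("y", "b"): "y"}, {"x", "y"})
M2 = ("s2", {("s2", "a"): "s2", ("s2", "b"): "z", ("z", "b"): "z"}, {"z"})
BOTH = intersect(M1, M2, "ab")
```

The reachable pairs are (s1,s2), (x,s2) and (y,z), matching the state labels that survive from the slide, and the only end state is (y,z).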

13 complementation
the negation or opposite FSA
  – with respect to Σ*, the set of all possible strings from the alphabet
  – i.e. accepts everything original FSA rejects
  – and rejects everything original FSA accepts
  – result is still an FSA
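
For a deterministic machine, complementation amounts to making δ total and swapping end and non-end states. The sketch below is a minimal Python illustration; the added state name `sink` and the example machine (the a⁺b⁺ recognizer with states s, x, y) are assumptions of this example. The swap is only valid for a deterministic machine whose transition function is defined everywhere, which is why the sink state is added first.

```python
def complement(machine, states, alphabet, sink="sink"):
    """Complete δ with a sink state, then swap end and non-end states."""
    start, delta, end_states = machine
    total = dict(delta)
    all_states = set(states) | {sink}
    for q in all_states:
        for ch in alphabet:
            total.setdefault((q, ch), sink)  # undefined moves now go to the sink
    return start, total, all_states - set(end_states)


def accepts(machine, text):
    state, delta, end_states = machine
    for ch in text:
        if (state, ch) not in delta:
            return False
        state = delta[(state, ch)]
    return state in end_states


# Deterministic machine for a+b+, keyed here by (state, character).
AB = ("s", {("s", "a"): "x", ("x", "a"): "x", ("x", "b"): "y", ("y", "b"): "y"}, {"y"})
NOT_AB = complement(AB, {"s", "x", "y"}, "ab")
```

Every string over {a, b} is accepted by exactly one of `AB` and `NOT_AB`.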

14 Limits of Finite State Technology
Language = set of strings
case 1
  – suppose set is finite
  – e.g. L = {ba, abc, ccb, dd}
  easy to encode as an FSA (by closure under union)
case 2
  – set is infinite
  – ...
[figure: one linear FSA per string of L, joined by ε-transitions from a new start state s0]
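
Any finite set of strings can be turned into a machine mechanically. The sketch below uses a different technique from the slide's ε-union: it builds a deterministic trie-shaped FSA, sharing common prefixes where they exist. The function names and the integer state numbering are assumptions of this example.

```python
def build_trie_fsa(words):
    """One path per word from state 0; the last state of each path is an end state."""
    delta, end_states = {}, set()
    next_id = 1
    for word in words:
        state = 0
        for ch in word:
            if (state, ch) not in delta:
                delta[(state, ch)] = next_id  # allocate a fresh state
                next_id += 1
            state = delta[(state, ch)]
        end_states.add(state)
    return 0, delta, end_states


def accepts(machine, text):
    state, delta, end_states = machine
    for ch in text:
        if (state, ch) not in delta:
            return False
        state = delta[(state, ch)]
    return state in end_states


FINITE = build_trie_fsa(["ba", "abc", "ccb", "dd"])
```

The number of states is bounded by the total length of the words plus one, so a finite language always fits in a finite machine.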

15 Limits of Finite State Technology
Language = set of strings
case 2
  – set is infinite
  – e.g. L = a⁺b⁺ = { ab, aab, abb, aabb, aaab, abbb, … }
    "one or more a's followed by one or more b's"
    we know this set is regular
  – however, consider L = {aⁿbⁿ | n ≥ 1} = { ab, aabb, aaabbb, … }
    "same number of b's as a's …"
    this set is not regular. Why?
[figure: FSA for a⁺b⁺ with states s, x, y]

16 The Limits of Finite State Technology
[Formally, we can use the Pumping Lemma to prove this particular case.]
informally,
  – we can build FSA for …
  – ab
  – aabb
  – aaabbb
  – …
[figure: separate linear FSAs for ab, aabb, aaabbb, with end states marked]
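
For reference, the formal argument mentioned in brackets can be sketched as follows. This is the standard Pumping Lemma proof outline, written out here as a supplement; the symbols p, x, y, z, k are the usual names from that lemma and do not appear on the slide.

```latex
\begin{align*}
&\text{Assume } L = \{a^n b^n \mid n \ge 1\} \text{ is regular, with pumping length } p \ge 1.\\
&\text{Take } w = a^p b^p \in L, \text{ so } |w| \ge p.\\
&\text{Then } w = xyz \text{ with } |xy| \le p,\ |y| \ge 1,\ \text{and } x y^i z \in L \text{ for all } i \ge 0.\\
&|xy| \le p \implies y = a^k \text{ for some } k \ge 1.\\
&x y^2 z = a^{p+k} b^p \notin L \text{ since } p + k \ne p, \text{ a contradiction.}
\end{align*}
```

The pumped part y lies entirely within the a's, so repeating it changes the number of a's without changing the number of b's.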

17 The Limits of Finite State Technology
we can merge the individual FSA for …
  – ab
  – aabb
  – aaabbb
[figure: the merged FSA for ab, aabb, aaabbb]
such direct encoding would require an infinite number of states
  – and we're using Finite State Automata
quite different from the infinity obtained by looping
  – freely iterate (no counting)
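
The "no counting" point can be made concrete with a pigeonhole argument: a machine with finitely many states must end up in the same state after two different numbers of a's, and from then on it cannot tell them apart. The Python below is an illustrative sketch; the function names, the `dead` sink, and the choice of the three-state a⁺b⁺ machine as the test subject are assumptions of this example.

```python
def run(start, delta, text):
    """Follow a deterministic machine; undefined moves fall into a hypothetical 'dead' state."""
    state = start
    for ch in text:
        state = delta.get((state, ch), "dead")
    return state


def find_confusable_prefixes(start, delta, num_states):
    """Find i < j such that a^i and a^j leave the machine in the same state."""
    seen = {}
    # num_states + 2 prefixes cannot all land in distinct states
    # (the machine's own states plus 'dead').
    for count in range(num_states + 2):
        state = run(start, delta, "a" * count)
        if state in seen:
            return seen[state], count
        seen[state] = count
    return None  # not reached, by the pigeonhole principle


# The three-state machine for a+b+, keyed by (state, character).
DELTA = {("s", "a"): "x", ("x", "a"): "x", ("x", "b"): "y", ("y", "b"): "y"}
END_STATES = {"y"}
i, j = find_confusable_prefixes("s", DELTA, 3)
```

Whatever the machine decides about a^i followed by i b's, it must decide the same about a^j followed by i b's, so no finite machine can accept exactly the strings with matching counts.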

18 The Limits of Finite State Technology
example
  – L = a⁺b⁺ = { ab, abb, aab, aabb, aaab, abbb, … }
  – "one or more a's followed by one or more b's"
Note:
  – can be divided into two independent halves
  – each half can be replaced by iteration
[figure: separate linear FSAs for individual strings of a⁺b⁺]

19 The Limits of Finite State Technology
example
  – L = a⁺b⁺ = { ab, abb, aab, aabb, aaab, abbb, … }
  – "one or more a's followed by one or more b's"
Note:
  – can be divided into two independent halves
  – each half can be replaced by iteration
[figure: the individual FSAs from the previous slide, joined by ε-transitions from a new start state s0 and then merged]

