LING/C SC/PSYC 438/538 Computational Linguistics Sandiway Fong Lecture 10: 9/27.

Slides:



Advertisements
Similar presentations
LING/C SC/PSYC 438/538 Lecture 11 Sandiway Fong. Administrivia Homework 3 graded.
Advertisements

Theory Of Automata By Dr. MM Alam
LING/C SC/PSYC 438/538 Computational Linguistics Sandiway Fong Lecture 13: 10/9.
LING 388: Language and Computers Sandiway Fong Lecture 9: 9/27.
LING 388: Language and Computers Sandiway Fong 9/29 Lecture 11.
LING 438/538 Computational Linguistics Sandiway Fong Lecture 8: 9/29.
LING 388: Language and Computers Sandiway Fong Lecture 9: 9/22.
LING 438/538 Computational Linguistics Sandiway Fong Lecture 7: 9/12.
LING/C SC/PSYC 438/538 Computational Linguistics Sandiway Fong Lecture 12: 10/4.
LING 438/538 Computational Linguistics Sandiway Fong Lecture 9: 9/21.
LING 388: Language and Computers Sandiway Fong Lecture 15: 10/17.
LING 388: Language and Computers Sandiway Fong Lecture 21: 11/7.
LING 388: Language and Computers Sandiway Fong Lecture 12: 10/5.
LING/C SC/PSYC 438/538 Computational Linguistics Sandiway Fong Lecture 8: 9/18.
LING/C SC/PSYC 438/538 Computational Linguistics Sandiway Fong Lecture 7: 9/11.
LING 388 Language and Computers Lecture 8 9/25/03 Sandiway FONG.
LING/C SC/PSYC 438/538 Computational Linguistics Sandiway Fong Lecture 9: 9/25.
LING/C SC/PSYC 438/538 Computational Linguistics Sandiway Fong Lecture 6: 9/6.
LING 438/538 Computational Linguistics Sandiway Fong Lecture 11: 10/3.
LING 388: Language and Computers Sandiway Fong Lecture 6: 9/13.
LING 388: Language and Computers Sandiway Fong Lecture 11: 10/3.
LING 388 Language and Computers Take-Home Final Examination 12/9/03 Sandiway FONG.
LING 438/538 Computational Linguistics Sandiway Fong Lecture 12: 10/5.
CS5371 Theory of Computation Lecture 6: Automata Theory IV (Regular Expression = NFA = DFA)
79 Regular Expression Regular expressions over an alphabet  are defined recursively as follows. (1) Ø, which denotes the empty set, is a regular expression.
LING 388 Language and Computers Lecture 11 10/7/03 Sandiway FONG.
LING 388: Language and Computers Sandiway Fong Lecture 10: 9/26.
LING 388 Language and Computers Lecture 7 9/23/03 Sandiway FONG.
LING 388: Language and Computers Sandiway Fong Lecture 16: 10/19.
LING/C SC/PSYC 438/538 Computational Linguistics Sandiway Fong Lecture 11: 10/2.
LING 438/538 Computational Linguistics Sandiway Fong Lecture 5: 9/5.
LING 388: Language and Computers Sandiway Fong Lecture 17: 10/24.
LING 388 Language and Computers Lecture 12 10/9/03 Sandiway FONG.
LING 388 Language and Computers Lecture 6 9/18/03 Sandiway FONG.
LING/C SC/PSYC 438/538 Lecture 19 Sandiway Fong 1.
1 CD5560 FABER Formal Languages, Automata and Models of Computation Lecture 7 Mälardalen University 2010.
LING 388: Language and Computers Sandiway Fong Lecture 14 10/11.
Computational Linguistics Yoad Winter *General overview *Examples: Transducers; Stanford Parser; Google Translate; Word-Sense Disambiguation * Finite State.
LING/C SC/PSYC 438/538 Lecture 14 Sandiway Fong. Administrivia Midterm – This Wednesday – A bit like doing a homework in real time – Bring your laptop.
::ICS 804:: Theory of Computation - Ibrahim Otieno SCI/ICT Building Rm. G15.
Pushdown Automata (PDAs)
Design contex-free grammars that generate: L 1 = { u v : u ∈ {a,b}*, v ∈ {a, c}*, and |u| ≤ |v| ≤ 3 |u| }. L 2 = { a p b q c p a r b 2r : p, q, r ≥ 0 }
Grammars CPSC 5135.
LING/C SC/PSYC 438/538 Lecture 7 9/15 Sandiway Fong.
LING 388: Language and Computers Sandiway Fong Lecture 6: 9/15.
LING/C SC/PSYC 438/538 Lecture 12 10/4 Sandiway Fong.
Regular Grammars Chapter 7. Regular Grammars A regular grammar G is a quadruple (V, , R, S), where: ● V is the rule alphabet, which contains nonterminals.
Context Free Grammars.
2. Regular Expressions and Automata 2007 년 3 월 31 일 인공지능 연구실 이경택 Text: Speech and Language Processing Page.33 ~ 56.
LING 388: Language and Computers Sandiway Fong Lecture 11: 10/4.
LING 388: Language and Computers Sandiway Fong 9/27 Lecture 10.
LING/C SC/PSYC 438/538 Lecture 13 Sandiway Fong. Administrivia Reading Homework – Chapter 3 of JM: Words and Transducers.
Natural Language Processing Lecture 4 : Regular Expressions and Automata.
LING/C SC/PSYC 438/538 Lecture 15 Sandiway Fong. Did you install SWI Prolog?
CSC312 Automata Theory Lecture # 26 Chapter # 12 by Cohen Context Free Grammars.
Regular Grammars Reading: 3.3. What we know so far…  FSA = Regular Language  Regular Expression describes a Regular Language  Every Regular Language.
LING/C SC/PSYC 438/538 Lecture 16 Sandiway Fong. SWI Prolog Grammar rules are translated when the program is loaded into Prolog rules. Solves the mystery.
1 CD5560 FABER Formal Languages, Automata and Models of Computation Lecture 3 Mälardalen University 2007.
LING/C SC/PSYC 438/538 Lecture 19 Sandiway Fong 1.
LING/C SC/PSYC 438/538 Lecture 17 Sandiway Fong. Last Time Talked about: – 1. Declarative (logical) reading of grammar rules – 2. Prolog query: s(String,[]).
Lecture 11  2004 SDU Lecture7 Pushdown Automaton.
Lecture 17: Theory of Automata:2014 Context Free Grammars.
WELCOME TO A JOURNEY TO CS419 Dr. Hussien Sharaf Dr. Mohammad Nassef Department of Computer Science, Faculty of Computers and Information, Cairo University.
LING/C SC/PSYC 438/538 Lecture 11 Sandiway Fong.
LING/C SC/PSYC 438/538 Lecture 21 Sandiway Fong.
LING/C SC/PSYC 438/538 Lecture 17 Sandiway Fong.
Chapter 7 Regular Grammars
LING/C SC/PSYC 438/538 Lecture 21 Sandiway Fong.
LING/C SC/PSYC 438/538 Lecture 22 Sandiway Fong.
LING/C SC/PSYC 438/538 Lecture 17 Sandiway Fong.
Presentation transcript:

LING/C SC/PSYC 438/538 Computational Linguistics Sandiway Fong Lecture 10: 9/27

Today’s Topics Homework 2 review Recap: DCG system New topic: Finite State Automata (FSA)

Homework 2 Review Question 1 –Consider a language L 3or5 = {111, 11111, , , , , ,...} –each member of L 3or5 is a string containing only 1s –the number of 1s in each string is divisible by 3 or 5 Give a regular grammar that generates language L 3or5

Homework 2 Review Regular grammar: –divisible by 3 side 1.s --> [1], a. 2.a --> [1], b. 3.b --> [1]. 4.b --> [1], c. 5.c --> [1], d. 6.d --> [1], b. Regular grammar: –divisible by 5 side 7.s --> [1], one. 8.one --> [1], two. 9.two --> [1], three. 10.three --> [1], four. 11.four --> [1]. 12.four --> [1], five. 13.five --> [1], six. 14.six --> [1], seven. 15.seven --> [1], eight. 16.eight --> [1], four. take the disjunction of the two sub-grammars, i.e. merge them

Homework 2 Review Question 2: Language L = {a 2n b n+1 | n ≥1} is also non-regular but can be generated by a regular grammar extended to allow left and right recursive rules Given a Prolog grammar satisfying these rules for L Legit rules: X -> aY X -> a X -> Ya –where X, Y are non- terminals, a is some arbitirary terminal

Homework 2 Review L = {a 2n b n+1 | n≥ 1} DCG: 1.s --> [a], b. 2.b --> [a], c. 3.c --> [b], d. 4.c --> s, [b]. 5.d --> [b]. generated by rules generated by rules (center-embedded recursive tree) d b b b s c a a bs b s c a a

Homework 2 Review Question 3 (Optional 438) A right recursive regular grammar that generates a (rightmost) parse for the language {a n | n>=2}: s(s(a,B)) --> [a], b(B). b(b(a,B)) --> [a], b(B). b(b(a))--> [a]. Example ?- s(Tree,[a,a,a],[]). Tree = s(a,b(a,b(a)))? A corresponding left recursive regular grammar will not halt in all cases when an input string is supplied Modify the right recursive grammar to produce a left recursive parse, e.g. ?- s(Tree,[a,a,a],[]). Tree = s(b(b(a),a),a) Can you do this for any right recursive regular grammar? e.g. the one for sheeptalk

Homework 2 Review Original: s(s(a,B)) --> [a], b(B). b(b(a,B)) --> [a], b(B). b(b(a))--> [a]. New: 1.s(s(B,a)) --> [a], b(B). 2.b(b(B,a)) --> [a], b(B). 3.b(b(a))--> [a]. taking advantage of the fact that we know we’re generating only a’s example: in rule 1, we match a terminal a on the left side but place an a at the end for the tree

Extra Arguments: Recap some uses for extra arguments on non-terminals in the grammar: 1.to generate a parse tree recording the derivation history 2.feature agreement 3.implement counting

Extra Arguments: Recap Implement Determiner-Noun Number Agreement Data: –the man/men –a man/*a men Modified grammar: np(np(D,N)) --> det(D,Number), common_noun(N,Number). det(det(the),_) --> [the]. det(det(a),sg) --> [a]. common_noun(n(ball),sg) --> [ball]. common_noun(n(man),sg) --> [man]. common_noun(n(men),pl) --> [men]. np(np(D,N)) --> detsg(D), common_nounsg(N). np(np(D,N)) --> detpl(D), common_nounpl(N). detsg(det(a)) --> [a]. detsg(det(the)) --> [the]. detpl(det(the)) --> [the]. common_nounsg(n(man)) -->[man]. common_nounpl(n(men)) --> [men]. syntactic sugar could just encode feature values in nonterminal name

Extra Arguments: Recap Class Exercise: –Implement Case = {nom,acc} agreement system for the grammar –Examples: the man hit me vs. *the man hit I Modified grammar: s(Y,Z)) --> np(Y,Case), vp(Z), { Case = nom }. np(np(Y),Case) --> pronoun(Y,Case). pronoun(i,nom) --> [i]. pronoun(we,nom) --> [we]. pronoun(me,acc) --> [me]. np(np(D,N),_) --> det(D,Num), common_noun(N,Num). vp(vp(Y,Z)) --> transitive(Y), np(Z,Case), { Case = acc }. {... Prolog code... }

Extra Arguments: Recap Class Exercise: –Implement Case = {nom,acc} agreement system for the grammar –Examples: the man hit me vs. *the man hit I Modified grammar: s(Y,Z)) --> np(Y,nom), vp(Z). np(np(Y),Case) --> pronoun(Y,Case). pronoun(i,nom) --> [i]. pronoun(we,nom) --> [we]. pronoun(me,acc) --> [me]. np(np(D,N),_) --> det(D,Num), common_noun(N,Num). vp(vp(Y,Z)) --> transitive(Y), np(Z,acc). without {... Prolog code... }

Extra Arguments: Recap 1.s --> [a],b. 2.b --> [a],b. 3.b --> [b],c. 4.b --> [b]. 5.c --> [b],c. 6.c --> [b]. L = a + b + a regular grammar L ab = { a n b n | n ≥ 1 } regular grammar + extra argument 1.s(X) --> [a],b(a(X)). 2.b(X) --> [a],b(a(X)). 3.b(a(X)) --> [b],c(X). 4.b(a(0)) --> [b]. 5.c(a(X)) --> [b],c(X). 6.c(a(0)) --> [b]. Query: ?- s(0,L,[]).

Extra Arguments: Recap 1.s(X) --> [a],b(a(X)). 2.b(X) --> [a],b(a(X)). 3.b(a(X)) --> [b],c(X). 4.b(a(0)) --> [b]. 5.c(a(X)) --> [b],c(X). 6.c(a(0)) --> [b]. derivation tree: s(0) rule 1 logic: ?- s(0,L,[]). start with 0 wrap an a(_) for each “a” unwrap a(_) for each “b” match #a’s with #b’s = counting a b(a(0)) rule 2 a b(a(a(0))) rule 3 b b c(a(0)) rule 6

So far Equivalent formalisms regexp regular grammars FSA

example (sheeptalk) –baa! –baaa! … regexp –baa+! wx z a ! y a a > s b points to start state end state marked in red basic idea: “just follow the arrows” state state transition: in state s, if current input symbol is b go to state w accepting computation: in a final state, if there is no more input, accept string

FSA: Construction step-by-step regexp –baa+! = –baaa*! s >

FSA: Construction step-by-step regexp –baaa*! –b –from state s, –see a ‘b’, –move to state w sw b >

FSA: Construction step-by-step regexp –baaa*! –ba –from state w, –see an ‘a’, –move to state x sw b x a >

FSA: Construction step-by-step regexp –baaa*! –baa –from state x, –see an ‘a’, –move to state y sw b x a > y a

FSA: Construction step-by-step regexp –baaa*! –baaa* –baa –baaa –baaaa... –from y, –see an ‘a’, –move to ? sw b x a > y y’ a a a... but machine must have a finite number of states!

Regular Expressions: Example step-by-step regexp –baaa*! –baa* –baa –baaa –baaaa... –from state y, –see an ‘a’, –“loop”, i.e. move back to, state y sw b x a a > y a

Regular Expressions: Example step-by-step regexp –baaa*! –from state y, –see an ‘!’, –move to final state z (indicated in red) Note: machine cannot finish (i.e. reach the end of the input string) in states s, w, x or y sw b x a a > y a z !

Finite State Automata (FSA) construction –the step-by-step FSA construction method we just used –works for any regular expression conclusion –anything we can encode with a regular expression, we can build a FSA for it –an important step in showing that FSA and regexps are formally equivalent

regexp operators and FSA basic wildcards –. and *. any single character e.g. p.t put, pit, pat, pet * zero or more characters xy d a b c e z etc.... y a etc. one loop for each character over the whole alphabet

regexp operators and FSA basic wildcards –+ one or more of the preceding character e.g. a+ –[ ] range of characters e.g. [aeiou] xy a a xy o a e i u

regexp operators and FSA basic wildcards –?–? zero or one of the preceding character e.g. a? Non-determinism –any FSA with an empty transition is non-deterministic –see example: could be in state x or y simultaneous –any FSA with an empty transition can be rewritten without the empty transition xy a λ λ denotes the empty transition xy a >

regexp operators and FSA backreferences: –by state splitting –e.g. ([aeiou])\1 xy o a e i u yz o a e i u x yuyu o a e i u z yaya yeye yiyi yoyo o a e u

Finite State Automata (FSA) more formally –(Q,s,f,Σ,  ) 1.set of states (Q): {s,w,x,y,z} 4 statesmust be a finite set 2.start state (s): s 3.end state(s) (f): z 4.alphabet ( Σ ): {a, b, !} 5.transition function  : signature: character × state → state  (b,s)=w  (a,w)=x  (a,x)=y  (a,y)=y  (!,y)=z sw b x a a > y a z !

FSA Finite State Automata (FSA) have a limited amount of expressive power Let’s look at a modification to FSA and its effect on its power

String Transitions –so far... all machines have had just a single character label on the arc so if we allow strings to label arcs –do they endow the FSA with any more power? b Answer: No –because we can always convert a machine with string-transitions into one without abb abb

Finite State Automata (FSA) equivalent s z baa ! y a > 5 state machine sw b x a a > y a z ! 3 state machine using string transitions