LING 438/538 Computational Linguistics Sandiway Fong Lecture 8: 9/29.


1 LING 438/538 Computational Linguistics Sandiway Fong Lecture 8: 9/29

2 Administrivia
reminder
– homework 2 due tonight

3 Last Time
regular grammars
– aka Chomsky hierarchy type-3 grammars
– are formal grammars with severe restrictions on what can appear on the RHS
– are limited in generative capacity or power
– in Prolog DCG notation:
    x --> y, [t].    x --> [t].    (left recursive variant)
  or
    x --> [t], y.    x --> [t].    (right recursive variant)
– can’t have both left and right recursive rules in the same grammar

4 Last Time
regular grammars
examples: regular languages
– “one or more a’s followed by one or more b’s”
– sheeptalk {ba!, baa!, baaa!, ...}, i.e. ba+!
– can be encoded by a regular grammar
beyond regular grammars
examples
– a^n b^n = {ab, aabb, aaabbb, ...}
– ww^R, where w ∈ {a,b}+, i.e. w is any non-empty sequence of a’s and b’s
informal idea about the crucial difference
– “needing to keep track of history”
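One way to picture "needing to keep track of history" is the rough Python sketch below. It is an illustration, not part of the lecture: the function names are hypothetical. The first recognizer only remembers which of a fixed number of phases it is in, while the second needs a counter that can grow without bound.

```python
def in_a_plus_b_plus(s):
    """Accept a+b+ while remembering only a phase label (finite memory)."""
    phase = "start"
    for ch in s:
        if ch == "a" and phase in ("start", "as"):
            phase = "as"
        elif ch == "b" and phase in ("as", "bs"):
            phase = "bs"
        else:
            return False
    return phase == "bs"


def in_an_bn(s):
    """Accept a^n b^n (n > 0); the counter is the 'history' being tracked."""
    count = 0
    i = 0
    # Count the leading a's.
    while i < len(s) and s[i] == "a":
        count += 1
        i += 1
    if count == 0:
        return False
    # Cancel one a for every following b.
    while i < len(s) and s[i] == "b":
        count -= 1
        i += 1
    # Accept only if the whole string was consumed and the counts match.
    return i == len(s) and count == 0
```

For instance, `in_a_plus_b_plus("aab")` is true but `in_an_bn("aab")` is false, since the numbers of a's and b's differ.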

5 Today’s Topic
Finite State Automata
– plus more on what it means to be a regular language
Merge Point
– Textbook, Chapter 2: Regular Expressions and Automata

6 Today’s Topic
Finite State Automata
– plus more on what it means to be a regular language
[Diagram: Regular Grammars, FSA and Regular Expressions all describe the Regular Languages; they are formally equivalent in terms of generative capacity or power. Annotation on the diagram: “+ left & right recursive rules”]

7 Some Regular Expression Notation
... some notation first (more on regexps next time)
Regular Expressions (regexp)
– shorthand for describing sets of strings
Operators:
– string+: set of one or more occurrences of string
    a+ = {a, aa, aaa, aaaa, aaaaa, …}
    (abc)+ = {abc, abcabc, abcabcabc, …}
– Note: parentheses used to delimit the scope of the operator
– string*: set of zero or more occurrences of string
    a* = {ε, a, aa, aaa, aaaa, …}
    (abc)* = {ε, abc, abcabc, …}
– Note: ε is the zero-length string
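The + and * operators can be tried out directly. The sketch below uses Python's standard `re` module rather than anything from the lecture; `matches` is a hypothetical helper name, and `re.fullmatch` is used so that the whole string must belong to the set.

```python
import re


def matches(pattern, s):
    """True if the entire string s is in the set described by pattern."""
    return re.fullmatch(pattern, s) is not None


# One or more occurrences: the empty string is excluded.
a_plus_has_aaa = matches(r"a+", "aaa")
a_plus_has_empty = matches(r"a+", "")

# Zero or more occurrences: the empty string is included.
abc_star_has_empty = matches(r"(abc)*", "")
abc_star_has_two = matches(r"(abc)*", "abcabc")

# Parentheses delimit the scope: abc+ repeats only the c.
scoped = matches(r"(abc)+", "abcabc")
unscoped = matches(r"abc+", "abcabc")
```

Here `scoped` is true and `unscoped` is false, which is the point of the note about parentheses.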

8 Some Regular Expression Notation
... some notation first
Relation between * and +
– a a* = a+
– “a concatenated with a*”
– a {ε, a, aa, aaa, aaaa, …} = {a, aa, aaa, aaaa, aaaaa, …}
Operators:
– string^n: exactly n occurrences of string
    a^4 b^3 = {aaaabbb}
Language = a set of strings
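Treating a language as a set of strings, the identity a a* = a+ can be spot-checked by brute force. This is only a sketch: `language` is a hypothetical helper, it enumerates strings over {a, b} up to a small length bound, and Python's `{n}` repetition syntax stands in for the superscript notation.

```python
import re
from itertools import product


def language(pattern, alphabet="ab", max_len=7):
    """All strings over alphabet, up to max_len long, that fully match pattern."""
    found = set()
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            if re.fullmatch(pattern, s):
                found.add(s)
    return found


# a a* and a+ describe the same strings (checked only up to the length bound).
same = language(r"aa*") == language(r"a+")

# a^4 b^3 written with Python's counted repetition.
exact = language(r"a{4}b{3}")
```

A bounded enumeration like this is evidence, not a proof; the equality on the slide holds for strings of every length.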

9 Regular Expressions
regular expressions
– formally equivalent to regular grammars and finite state automata
– How to show this? Proof by construction…
beyond regular expressions
– examples
    {a^n b^n | n > 0} is not regular
    {ww^R | w ∈ {a,b}+} is not regular, where w^R is w reversed, e.g. (abc)^R = cba
– How to show this? Proof by Pumping Lemma
[Diagram: Regular Grammars, FSA, Regular Expressions]

10 Regular Expressions
Example:
– Language: L = {a+b+}, “one or more a’s followed by one or more b’s”
– L is a regular language: it is described by a regular expression
Note:
– infinite set of strings belonging to language L
    e.g. abbb, aaaab, aabb, *abab, *ε
Notation:
– ε is the empty string (or string with zero length)
– * means string is not in the language
regular grammar
    s --> [a],b.
    b --> [a],b.
    b --> [b],c.
    b --> [b].
    c --> [b],c.
    c --> [b].
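The regular grammar above can be exercised without Prolog. The Python sketch below is an assumption-laden stand-in for the DCG, not how Prolog runs it: `RULES` is a hypothetical encoding in which each rule becomes a (terminal, next nonterminal) pair, with `None` marking a rule that ends after its terminal.

```python
# Right-recursive regular grammar for a+b+, one pair per rule.
RULES = {
    "s": [("a", "b")],
    "b": [("a", "b"), ("b", "c"), ("b", None)],
    "c": [("b", "c"), ("b", None)],
}


def derives(symbol, string):
    """True if nonterminal `symbol` can derive exactly `string`."""
    if not string:
        return False  # every rule consumes at least one terminal
    head, rest = string[0], string[1:]
    for terminal, nxt in RULES[symbol]:
        if terminal != head:
            continue
        if nxt is None:
            # Rule of the form x --> [t]: must use up the whole string.
            if rest == "":
                return True
        elif derives(nxt, rest):
            # Rule of the form x --> [t], y: continue with y on the rest.
            return True
    return False
```

Calling `derives("s", "abbb")` succeeds, while `derives("s", "abab")` and `derives("s", "")` fail, matching the starred strings in the note.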

11 Finite State Automata (FSA)
L = {a+b+}, equivalently L = {aa*bb*}
[Diagram: FSA with states s, x, y and arcs labeled a, a, b, b]
deterministic FSA (DFSA)
– no ambiguity about where to go at any given state
non-deterministic FSA (NDFSA)
– no restriction on ambiguity (surprisingly, no increase in power)
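A hint at why non-determinism adds no power: a non-deterministic machine can be run by tracking the set of states it might be in, and there are only finitely many such sets. The sketch below assumes a hypothetical NDFSA for a+b+ (not the machine drawn on the slide) in which reading a from s may either stay in s or move to x.

```python
# Hypothetical non-deterministic FSA for a+b+: each entry maps
# (state, character) to the SET of possible next states.
NFA = {
    ("s", "a"): {"s", "x"},
    ("x", "b"): {"x", "y"},
}
START = "s"
FINAL = {"y"}


def nfa_accepts(string):
    """Simulate the NDFSA by following every possible choice at once."""
    current = {START}
    for ch in string:
        nxt = set()
        for state in current:
            nxt |= NFA.get((state, ch), set())
        current = nxt
    # Accept if at least one possible path ended in a final state.
    return bool(current & FINAL)
```

Each distinct value of `current` could be treated as a single state of a deterministic machine, which is the idea behind the standard subset construction.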

12 Finite State Automata (FSA)
more formally
– (Q, s, f, Σ, δ)
1. set of states (Q): {s, x, y} (must be a finite set)
2. start state (s): s
3. end state(s) (f): y
4. alphabet (Σ): {a, b}
5. transition function δ, with signature character × state → state:
    δ(a,s) = x
    δ(a,x) = x
    δ(b,x) = y
    δ(b,y) = y
[Diagram: the same FSA, with arcs s -a-> x, x -a-> x, x -b-> y, y -b-> y]
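The five-part definition translates almost directly into a table-driven program. The following is a minimal sketch, assuming a Python dictionary for δ keyed in the slide's (character, state) order; `DELTA`, `START`, `FINAL` and `accepts` are names chosen here for illustration.

```python
# Transition function from the slide, keyed as (character, state).
DELTA = {
    ("a", "s"): "x",
    ("a", "x"): "x",
    ("b", "x"): "y",
    ("b", "y"): "y",
}
START = "s"
FINAL = {"y"}


def accepts(string):
    """Run the deterministic FSA over string; accept if it ends in a final state."""
    state = START
    for ch in string:
        if (ch, state) not in DELTA:
            return False  # no transition defined for this character: reject
        state = DELTA[(ch, state)]
    return state in FINAL
```

Running `accepts("aabb")` walks s, x, x, y, y and accepts; `accepts("abab")` gets stuck in y on the second a and rejects.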

13 Finite State Automata (FSA)
practical applications
– can be encoded and run efficiently on a computer
– widely used
    encode regular expressions
    compress large dictionaries
    morphological analyzers: different word forms, e.g. want, wanted, unwanted (suffixation/prefixation); see chapter 3 of textbook
    speech recognizers: Markov models = FSA + probabilities
    and many more …

14 Finite State Automata (FSA)
– T9 text entry (tegic.com)
    built in to your cellphone
    predictive text entry for mobile messaging/data entry
    reduces the number of keystrokes for inputting words on a telephone keypad (8 keys)
    e.g. how: 3 vs. 6 keystrokes; michael: 7 vs. 15 keystrokes
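The keystroke counts can be reproduced with a small sketch, assuming the comparison is against multi-tap entry (pressing a key repeatedly to cycle through its letters) and the standard letter layout on keys 2 to 9. The function names are hypothetical, and the T9 count ignores any extra presses needed to choose between words that share a key sequence.

```python
# Standard letter layout on the 8 letter keys of a telephone keypad.
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
# Invert the layout: letter -> (key, number of presses under multi-tap).
LETTER = {
    ch: (key, pos + 1)
    for key, letters in KEYPAD.items()
    for pos, ch in enumerate(letters)
}


def t9_digits(word):
    """Key sequence for word with one press per letter, as in predictive entry."""
    return "".join(LETTER[ch][0] for ch in word)


def multitap_keystrokes(word):
    """Total presses when each letter is reached by repeated taps on its key."""
    return sum(LETTER[ch][1] for ch in word)
```

With these definitions, "how" is the 3-key sequence 469 versus 6 multi-tap presses, and "michael" is 7 keys versus 15.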

15 RegExp → FSA
From Regular Expression to FSA
Operators
– a: single symbol a
– a^n: n occurrences of a
[Diagrams: FSAs for a and for a^n, illustrated with a^3]

16 RegExp → FSA
Operators
– a*: zero or more occurrences of a
– a+: one or more occurrences of a
– note: a+ = aa*
[Diagrams: FSAs for a* and a+, with arcs labeled a]
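The operator-to-machine constructions can be sketched as small builder functions. This is one possible encoding, not the lecture's: states are integers, each machine is a (transitions, start, finals) triple, and all function names are hypothetical.

```python
def fsa_for_power(symbol, n):
    """a^n: a chain of states 0..n joined by n arcs labeled symbol."""
    delta = {(i, symbol): i + 1 for i in range(n)}
    return delta, 0, {n}


def fsa_for_star(symbol):
    """a*: one state that is both start and final, with a loop on symbol."""
    return {(0, symbol): 0}, 0, {0}


def fsa_for_plus(symbol):
    """a+ = aa*: one arc to a final state, which then loops on symbol."""
    return {(0, symbol): 1, (1, symbol): 1}, 0, {1}


def accepts(fsa, string):
    """Run a deterministic (transitions, start, finals) machine over string."""
    delta, state, finals = fsa
    for ch in string:
        if (state, ch) not in delta:
            return False
        state = delta[(state, ch)]
    return state in finals
```

For example, `accepts(fsa_for_power("a", 3), "aaa")` holds, and the machine for a* accepts the empty string while the one for a+ does not.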

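17 Regular Grammar → FSA
examples
– s --> [a], t.
– x --> [a], x.
– x --> [a].
[Diagrams: an arc labeled a from state s to state t; an arc labeled a from state x back to x; an arc labeled a from state x to a final state y]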
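The rule-to-arc correspondence can be sketched as a tiny converter. The encoding is an assumption made for illustration: each rule is a (left-hand side, terminal, right-hand nonterminal or `None`) triple, rules that end after their terminal point to one added final state named "y", and the resulting machine is run non-deterministically because x has two rules on a.

```python
FINAL_STATE = "y"  # added final state, named as on the slide


def grammar_to_fsa(rules):
    """Turn right-recursive rules into arcs: (state, terminal) -> set of states."""
    arcs = {}
    for lhs, terminal, rhs in rules:
        target = FINAL_STATE if rhs is None else rhs
        arcs.setdefault((lhs, terminal), set()).add(target)
    return arcs


def accepts(arcs, start, string):
    """Follow every possible arc at once; accept if the final state is reachable."""
    current = {start}
    for ch in string:
        nxt = set()
        for state in current:
            nxt |= arcs.get((state, ch), set())
        current = nxt
    return FINAL_STATE in current


# x --> [a], x.   and   x --> [a].   together generate a+.
ARCS = grammar_to_fsa([("x", "a", "x"), ("x", "a", None)])
```

Here `ARCS` contains a loop on x and an arc from x to y, so `accepts(ARCS, "x", "aaa")` succeeds and `accepts(ARCS, "x", "")` fails.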

18 Next Time Prolog and FSA

