Finite State Languages, CSE 140 - Intro to Cognitive Science. The Computational Modeling of Language: Finite State Languages. Lecture I: Slides 1-21.

1 The Computational Modeling of Language: Finite State Languages
CSE 140 - Intro to Cognitive Science
Lecture I: Slides 1-21
Lecture 2: Slides 22-…

2 Language Exists at Many Levels
Sounds
Words
Sentences (utterances)
Discourse (text)
Dialog
Combined with other modalities
Etc.
We will focus on a formal account of the sentence level:
  Provides a formal account of grammaticality judgments
  Simple yet powerful models

3 A Formal Theory
We will present a mathematical theory of language.
Because of time constraints we will be somewhat informal in introducing concepts, but EVERYTHING we present can be made completely rigorous, starting from definitions and proceeding through proofs.
Strategy:
  First, examples from English
  Then, more abstract examples

4 The Set of Strings over an Alphabet
Given a finite alphabet Σ, the set of all strings over Σ, including the null string ε, is denoted by Σ*.
Let Σ = {all words of English}. Then Σ* denotes all strings of words of English, including the empty (null) string ε.
Only some of these strings are grammatical sentences of English.
Let Σ = {a, b}. Then Σ* denotes all strings of a's and b's, including the empty (null) string ε.
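As a rough illustration of what Σ* contains, the sketch below lists its first few members for Σ = {a, b}, shortest strings first. The function name strings_over is made up for this example and is not from the lecture.

```python
from itertools import islice, product

def strings_over(alphabet):
    """Yield every string over the alphabet, shortest first (never terminates)."""
    length = 0
    while True:
        # product(..., repeat=0) yields one empty tuple, i.e. the null string
        for symbols in product(alphabet, repeat=length):
            yield "".join(symbols)
        length += 1

# Take only a finite prefix, since the full set is infinite
first_seven = list(islice(strings_over(["a", "b"]), 7))
print(first_seven)
```

The generator is infinite by design, which mirrors the fact that Σ* is infinite even though Σ is finite.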

5 A Language over Σ
A language L over Σ is a subset of Σ*.
Let L_E be the set of all grammatical sentences of English.
L_E ⊆ Σ* is a language over Σ = {all words of English}.
Sentences in L_E:
  John likes apples
  Apples like John
  Two is greater than four
  The black cat is on the mat
  …
(Notation: L ⊆ Σ* means L is a subset of Σ*.)

6 Sentences not in L_E
  The the the
  John like peanuts
  Every student hates any course
  The rat the cat the dog chased bit ate the cheese (?)
  etc.

7 Another Language Over Another Σ
L = { a^m b^n | m ≥ 1, n ≥ 2 } = the set of all strings of a's and b's such that
  all a's precede all b's, and
  there is at least one a, and
  there are at least two b's.
L is a language over {a, b}, i.e. L ⊆ {a, b}*.
(Notation: a^2 means aa; a^2 b^3 means aabbb.)
(Notation: {x | y} means the set of all x such that condition y is true of x.)
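A membership test for this language can be sketched with a regular expression. The helper name in_L is an assumption made for illustration; the pattern says "one or more a's, then two or more b's".

```python
import re

def in_L(s):
    """Return True if s has the form a^m b^n with m >= 1 and n >= 2."""
    return re.fullmatch(r"a+b{2,}", s) is not None

for s in ["abb", "aabbb", "ab", "bb", "abba", ""]:
    print(repr(s), in_L(s))
```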

8 Infinite Language from Finite Models
A language over Σ can be finite or infinite.
L_E, the set of all grammatical sentences of English, is potentially infinite.
A potentially infinite set can often be given a finite characterization in one of several alternative ways:
  grammar characterization
  machine characterization
  behavioral characterization

9 Road Map to the Reading!
We will begin with Chapters 17 and 18, returning to the general characterization of grammars (Sections 16.4 and 16.5) later. (Skip Section 16.3.)
Chapter 17:
  machine characterization of languages with finite state machines
  equivalent grammar characterization
  equivalent 'behavioral' characterization in terms of terminal symbols only (regular expressions)

10 Introduction to Finite State Automata
Finite State Automata (FSAs) are characterized by:
  States (circles), including initial and final states
  A vocabulary Σ (here {the, a, big, very, book, poor})
  Transitions between states (arrows)
An FSA accepts a language L.
[Figure: state diagram; states q0 to q4; arc labels the, a, big, very, book, poor]

11 Another Finite State Automaton
States: K = {q0, q1}
Initial state: q0
Final states: F = {q1}
A vocabulary Σ = {a, b}
(Back to English on Slide 27…)
[Figure: state diagram; states q0 and q1; arc labels a, b, b]

12 Another Finite State Automaton II
The arrows are a graphical representation of δ, the transition function:
  q0, a → q0
  q0, b → q1
  q1, b → q1
[Figure: state diagram; states q0 and q1; arc labels a, b, b]

13 A Formal Definition of an FSA M
We can characterize a language L ⊆ Σ* by an FSA M = (K, Σ, δ, q0, F), where
  K is the finite set of states
  Σ is the finite input alphabet
  q0 ∈ K is the initial state
  F ⊆ K is the set of final states
  δ is the transition function: for each state and each input symbol, δ specifies the next state of the machine
(Notation: a ∈ A means a is an element of the set A.)

14 The Language Accepted by an FSA
Given an FSA M, the language of M (i.e., the set of strings accepted by M) is defined as follows:
L(M) = { w | w ∈ Σ* and, starting in q0 and following the transitions specified by δ, M reaches one of the final states after reading all of w }
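The definition of L(M) can be read as a simple procedure. The sketch below runs the two-state machine from slide 11, with δ stored as a Python dictionary. The function name accepts and the choice to reject as soon as no transition is defined are assumptions of this example, not part of the slides.

```python
def accepts(delta, start, finals, word):
    """Return True if the FSA ends in a final state after reading all of word."""
    state = start
    for symbol in word:
        if (state, symbol) not in delta:
            return False  # assumption: a missing transition means the string is rejected
        state = delta[(state, symbol)]
    return state in finals

# The automaton from slide 11: K = {q0, q1}, initial state q0, F = {q1}
DELTA = {("q0", "a"): "q0", ("q0", "b"): "q1", ("q1", "b"): "q1"}
START = "q0"
FINALS = {"q1"}

for word in ["b", "aab", "abb", "", "a", "ba"]:
    print(repr(word), accepts(DELTA, START, FINALS, word))
```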

15 Definition of Finite State Language
A language L is a finite state language (fsl), or a regular language, if there is an FSA M such that the language of M is L, i.e. L(M) = L.

16 The Language Accepted by our FSA?
L = any number of a's (including none) followed by at least one b
  = { a^n b^m | n ≥ 0, m ≥ 1 }
  = a* b b*
(* here means any number of repetitions, including none)
[Figure: state diagram; states q0 and q1; arc labels a, b, b]

17 Let's Do an Example!
Let Σ = {0, 1}. Let L = the set of all strings of 0's and 1's that contain exactly two 1's.
Show that L is a finite state language!
First step: L is a finite state language if….
Second step: Define such an M
(For other examples, see Exercise 3, Ch. 17)
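One possible M for the second step is sketched below. It is only one of many machines that would work. The state names ("none", "one", "two", "dead") are invented for this example and record how many 1's have been read so far.

```python
# Hypothetical state names: each state records how many 1's have been seen.
DELTA = {
    ("none", "0"): "none", ("none", "1"): "one",
    ("one", "0"): "one",   ("one", "1"): "two",
    ("two", "0"): "two",   ("two", "1"): "dead",   # a third 1 can never be undone
    ("dead", "0"): "dead", ("dead", "1"): "dead",
}
START = "none"
FINALS = {"two"}

def accepts(word):
    """Run the machine on word and report whether it ends in a final state."""
    state = START
    for symbol in word:
        state = DELTA[(state, symbol)]
    return state in FINALS

for word in ["11", "0101", "1", "111", ""]:
    print(repr(word), accepts(word))
```

Since such an M exists, L is a finite state language by the definition on slide 15.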

18 But δ Isn't a Function in Our Example!
So far: Σ = {a, b}
δ:
  q0, a → q0
  q0, b → q1
  q1, b → q1
  q1, a → ?
[Figure: state diagram; states q0 and q1; arc labels a, b, b]

19 But δ Isn't a Function in Our Example!
So far: Σ = {a, b}
δ:
  q0, a → q0
  q0, b → q1
  q1, b → q1
  q1, a → ?
Needed: a dead state
[Figure: state diagram; states q0 and q1; arc labels a, b, b]

20 A Fully Specified FSA with Dead States
Σ = {a, b}
δ:
  q0, a → q0
  q0, b → q1
  q1, b → q1
  q1, a → q2
  q2, a → q2
  q2, b → q2
[Figure: state diagram; states q0, q1 and dead state q2; arc labels a, b, b, a, a, b]

21 A Taste of FSA Algebra: Complements
Definition: The complement of L = Σ* - L, i.e. the set of strings in Σ* not contained in L.
To find the complement of an FSL L:
1. Find a fully specified FSA M such that L = L(M)
2. Switch the final and non-final states!
So the complements of FSLs are FSLs!

22 Complements: An Example
Let Σ = {a, b}. Let L = { a^n b^m | n ≥ 0, m ≥ 1 } (our old friend)
What's the complement of L??

23 1. Find … FSA M such that L = L(M)
Σ = {a, b}
δ:
  q0, a → q0
  q0, b → q1
  q1, b → q1
  q1, a → q2
  q2, a → q2
  q2, b → q2
[Figure: state diagram; states q0, q1, q2; arc labels a, b, b, a, a, b]

24 2. Switch Final and Non-final States
Σ = {a, b}
δ:
  q0, a → q0
  q0, b → q1
  q1, b → q1
  q1, a → q2
  q2, a → q2
  q2, b → q2
[Figure: the same state diagram with final and non-final states switched]
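The two-step recipe can be checked mechanically. The sketch below encodes the fully specified machine from the slides and builds the complement machine by taking every state that was not final. The names accepts, STATES and COMPLEMENT_FINALS are chosen for this example only.

```python
def accepts(delta, start, finals, word):
    """Run a fully specified (total) FSA and report whether it ends in a final state."""
    state = start
    for symbol in word:
        state = delta[(state, symbol)]  # total function: every lookup succeeds
    return state in finals

STATES = {"q0", "q1", "q2"}
DELTA = {
    ("q0", "a"): "q0", ("q0", "b"): "q1",
    ("q1", "b"): "q1", ("q1", "a"): "q2",
    ("q2", "a"): "q2", ("q2", "b"): "q2",   # q2 is the dead state
}
START = "q0"
FINALS = {"q1"}

# Step 2: switch final and non-final states
COMPLEMENT_FINALS = STATES - FINALS

for word in ["", "a", "ab", "aba", "abb"]:
    print(repr(word),
          accepts(DELTA, START, FINALS, word),
          accepts(DELTA, START, COMPLEMENT_FINALS, word))
```

Every string is accepted by exactly one of the two machines. The switch only works this way because δ is total. With the partial δ of slide 18, a string such as "aba" would get stuck and be rejected by both machines.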

25 Definition: A Deterministic FSA
Σ = {a, b}
δ:
  q0, a → q0
  q0, b → q1
  q1, b → q1
  q1, a → q2
  q2, a → q2
  q2, b → q2
[Figure: state diagram; states q0, q1, q2; arc labels a, b, b, a, a, b]
An FSA M is deterministic if δ is a function, i.e. for each state and each input there is exactly one new state.
The FSAs we have considered since slide 11 are all deterministic FSAs (DFAs).

26 Non-deterministic Finite Automata
In a non-deterministic FSA (NFA), the transition relation Δ allows any number of new states for each state and each input.
We will also allow transitions on no input (i.e., on the null string).
A string w is accepted by a non-deterministic FSA if there is at least one state sequence (starting with the initial state) that reads all of w and reaches one of the final states.
(Notation: Δ is the upper case version of δ.)
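Acceptance by an NFA can be sketched by tracking the set of all states the machine could be in. The small NFA below is hypothetical, written for this example, and is not the machine drawn on the slides: it accepts an optional DET, any number of ADJ, then an N. The names EPS, eps_closure and nfa_accepts are likewise assumptions.

```python
EPS = ""  # label used here for transitions on no input (the null string)

def eps_closure(states, delta):
    """Return all states reachable from `states` using only null-string transitions."""
    closure = set(states)
    stack = list(states)
    while stack:
        state = stack.pop()
        for target in delta.get((state, EPS), ()):
            if target not in closure:
                closure.add(target)
                stack.append(target)
    return closure

def nfa_accepts(delta, start, finals, word):
    """True if at least one state sequence reads all of word and ends in a final state."""
    current = eps_closure({start}, delta)
    for symbol in word:
        moved = set()
        for state in current:
            moved |= delta.get((state, symbol), set())
        current = eps_closure(moved, delta)
    return bool(current & finals)

# Hypothetical NFA: the transition relation maps (state, input) to a SET of states
DELTA = {
    ("q0", "DET"): {"q1"},
    ("q0", EPS): {"q1"},     # the determiner is optional
    ("q1", "ADJ"): {"q1"},
    ("q1", "N"): {"q2"},
}
START = "q0"
FINALS = {"q2"}

for word in [["DET", "N"], ["N"], ["DET", "ADJ", "ADJ", "N"], ["DET"], ["N", "N"]]:
    print(word, nfa_accepts(DELTA, START, FINALS, word))
```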

27 A Non-deterministic FSA for English…
Simple noun phrases of English containing
  a determiner (DET) followed by a noun (N): the cat
  DET followed by an adjective (ADJ): the poor
  N only: peanuts
[Figure: NFA state diagram; states q0 to q4; arc labels DET, N, DET, ADJ, ε]

28 A Surprise!
While NFAs are often convenient to use, it turns out that:
For every non-deterministic FSA M there is a simple algorithm which will construct a deterministic FSA M' such that L(M') = L(M).
(If M' accepts exactly the strings accepted by M, we say M' is equivalent to M.)
So NFAs are no more powerful than DFAs!!
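The slide does not name the algorithm. A standard choice is the subset construction, in which each state of M' is a set of states of M. The sketch below applies it to a small hypothetical NFA (optional DET, any number of ADJ, then N), which is not the machine drawn on the slides. All function and variable names are assumptions of this example.

```python
EPS = ""  # label used here for transitions on the null string

def eps_closure(states, delta):
    """Return all NFA states reachable from `states` by null-string transitions alone."""
    closure = set(states)
    stack = list(states)
    while stack:
        state = stack.pop()
        for target in delta.get((state, EPS), ()):
            if target not in closure:
                closure.add(target)
                stack.append(target)
    return closure

def determinize(delta, start, finals, alphabet):
    """Subset construction: build a DFA whose states are frozensets of NFA states."""
    dfa_start = frozenset(eps_closure({start}, delta))
    dfa_delta = {}
    seen = {dfa_start}
    todo = [dfa_start]
    while todo:
        current = todo.pop()
        for symbol in alphabet:
            moved = set()
            for state in current:
                moved |= delta.get((state, symbol), set())
            target = frozenset(eps_closure(moved, delta))  # empty set acts as the dead state
            dfa_delta[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    dfa_finals = {s for s in seen if s & finals}
    return dfa_delta, dfa_start, dfa_finals

def dfa_accepts(dfa_delta, dfa_start, dfa_finals, word):
    """Run the constructed DFA; its transition function is total by construction."""
    state = dfa_start
    for symbol in word:
        state = dfa_delta[(state, symbol)]
    return state in dfa_finals

# Hypothetical NFA used only for illustration
NFA_DELTA = {
    ("q0", "DET"): {"q1"},
    ("q0", EPS): {"q1"},
    ("q1", "ADJ"): {"q1"},
    ("q1", "N"): {"q2"},
}
ALPHABET = ["DET", "ADJ", "N"]
DFA_DELTA, DFA_START, DFA_FINALS = determinize(NFA_DELTA, "q0", {"q2"}, ALPHABET)

print(dfa_accepts(DFA_DELTA, DFA_START, DFA_FINALS, ["DET", "ADJ", "N"]))
print(dfa_accepts(DFA_DELTA, DFA_START, DFA_FINALS, ["DET", "DET", "N"]))
```

The resulting machine is fully specified, so it could also be fed directly into the complement recipe from slide 21.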

29 An Equivalent NFA and DFA
An NFA:
[Figure: NFA state diagram; states q0 to q4; arc labels DET, N, DET, ADJ, ε]
An equivalent DFA:
[Figure: DFA state diagram; states q'0, q'1, q'2, q'4, q'5; arc labels DET, N, N, ADJ]

30 Back to Noun Phrases: ADJs
How can we add optional adjectives before the noun?
  the black cat, the beautiful black cat
[Figure: NFA state diagram; states q0 to q4; arc labels DET, N, DET, ADJ, ε]

31 More About Noun Phrases: ADJs
To add optional adjectives before the noun, add to Δ:
  q1, ADJ → q1
  the black cat, the beautiful black cat
[Figure: NFA state diagram; states q0 to q4; arc labels DET, N, DET, ADJ, ε, ADJ]

32 More About Noun Phrases: ADVs
How can we add optional adverbs (ADV) on adjectives?
  the very old, the very very old
[Figure: NFA state diagram; states q0 to q4; arc labels DET, N, DET, ADJ, ε, ADJ]

33 More About Noun Phrases: ADVs
To add optional adverbs (ADV) on adjectives, add to Δ:
  q3, ADV → q3
  the very old, the very very old
[Figure: NFA state diagram; states q0 to q4; arc labels DET, N, DET, ADJ, ε, ADJ, ADV]

34 Bug!
What about "the very old cat"??
We need to allow optional ADVs before the ADJ from q1 as well…
[Figure: NFA state diagram; states q0 to q4; arc labels DET, N, DET, ADJ, ε, ADJ, ADV]

35 Consistently adding ADVs before ADJs
Why did we need to add an extra state, q5??
[Figure: NFA state diagram; states q0 to q5; arc labels DET, N, DET, ADJ, ε, ADV, ε, ADV, ADJ]

36 Adding Prepositional Phrase Modifiers
A prepositional phrase (PP) consists of a preposition (P) like in, on, above, for, near followed by a noun phrase:
  on the dirty old mat
  in the very old box
  on the mantle
  for the very poor
PPs can also modify NPs:
  the black cat on the dirty old mat
  the very old box on the mantle

37 Extending our NFA for PP modifiers
  The cat
  The cat on the mat
  The cat on the mat by the door in the back
[Figure: NFA state diagram; states q0 to q5; arc labels DET, N, DET, ADJ, ε, ADV, P, P, ε, ADV, ADJ]

38 An NFA for Simple Sentences
  The dog chased the cat
  The young admire the old
  Foxes eat chickens
Looks promising…..
[Figure: NFA state diagram; states q0 to q9; a subject NP network (DET, N, DET, ADJ, ε) and an object NP network (DET, N, DET, ADJ, ε) joined by arcs labeled V, V]

39 An NFA for Less Simple Sentences
  The very old man watched young brown puppies
  The very very poor want a good education
  The young puppies in the old brown box watched the cat in the corner
Looks even better….. BUT ….
[Figure: NFA state diagram; states q0 to q11; a subject NP network and an object NP network (each with arcs DET, N, DET, ADJ, ε, ADV, ε, ADV, ADJ) joined by arcs labeled V, V]

40 Bug: The NFA "loses generalizations"
For these sentences, the Subject NP states and Object NP states are duplicates…
What would an FSA for "NP gave NP to NP" look like?
The FSA model loses the generalization that NPs are NPs are NPs..
[Figure: NFA state diagram; states q0 to q11; a subject NP network and an object NP network (each with arcs DET, N, DET, ADJ, ε, ADV, ε, ADV, ADJ) joined by arcs labeled V, V]

41 Another FSA Bug (17.3.2)
More serious trouble:
  The cat died. (NP V)
  The cat the dog chased died. (NP NP V V)
  The cat the dog the rat bit chased died. (NP^3 V^3)
  The cat the dog the rat the elephant admired bit chased died. (NP^4 V^4)
These are all of the form NP^n V^n.
FSAs can't generate these, as we'll see next…

42 A Language That Is Not an FSL
Consider L = { a^n b^n | n ≥ 1 }, i.e. L consists of all strings where
  there are an equal number of a's and b's, and
  all a's precede all b's.
L is not an fsl. FSAs cannot count up to an arbitrary number!
So English isn't an fsl!!

43 The Pumping Lemma for fsl's (17.2.1)
If L is an fsl (regular), then every sufficiently long string w ∈ L has the following property:
  w = x u y, i.e. w can be segmented into three parts, which we'll call x, u, and y, with u non-empty, such that
  all strings of the form x u^i y ∈ L (where u^i means i copies of u, i ≥ 0).
[Figure: state diagram; path labeled x from q0 to q1, a loop labeled u at q1, path labeled y from q1 to q3]
The loop involving u may include several states.

44 Showing L = { a^n b^n | n ≥ 1 } is not an fsl
The Pumping Lemma: If L is an fsl then for all sufficiently long strings w ∈ L: w = x u y, and all strings of the form x u^i y ∈ L.
1. Try locating the u segment in various places in the string a a ... b b ...
2. In each case the string obtained by iterating u is not in L.
3. Hence, L is not an fsl.
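Steps 1 and 2 can be tried exhaustively on short strings. The sketch below places u at every possible position in a^n b^n and checks whether both dropping u (i = 0) and doubling it (i = 2) keep the string in L. The helper names are made up for this example. Checking a few values of n illustrates the argument but is not a proof.

```python
def in_L(s):
    """Return True if s has the form a^n b^n with n >= 1."""
    n = len(s) // 2
    return n >= 1 and s == "a" * n + "b" * n

def pumpable_splits(w):
    """Return every split (x, u, y) with u non-empty whose pumped strings stay in L."""
    good = []
    for i in range(len(w)):
        for j in range(i + 1, len(w) + 1):
            x, u, y = w[:i], w[i:j], w[j:]
            # test i = 0 (u deleted) and i = 2 (u repeated twice)
            if in_L(x + y) and in_L(x + u + u + y):
                good.append((x, u, y))
    return good

for n in range(1, 6):
    w = "a" * n + "b" * n
    print(w, pumpable_splits(w))
```

No split survives: if u is made only of a's or only of b's, pumping makes the counts unequal, and if u spans the boundary, repeating it puts an a after a b.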

45 Characterizing fsl's Using Grammars
Finite State Grammars
  aka Type 3 Grammars
  aka Right Linear Grammars
The languages generated by right linear grammars are exactly the languages accepted by FSAs.

46 An Informal Intro to FSGs
S → John A
A → likes B
B → roasted C
C → peanuts
Derivation: S ⇒ John A ⇒ John likes B ⇒ John likes roasted C ⇒ John likes roasted peanuts
(where A = VP, B = NP, C = N)

47 Finite State Grammars
A finite state grammar G = (V_T, V_N, S, R) consists of
  V_T, the terminal vocabulary
  V_N, the non-terminal vocabulary
  S, the start symbol, S ∈ V_N
  R, a finite set of rewrite rules (productions)
The rewrite rules are of the following forms:
  A → a B
  A → a
where A, B ∈ V_N and a ∈ V_T.

48 An Example FSG
G = (V_T, V_N, S, R) where
V_T = {John, roasted, peanuts, likes}
V_N = {S, A, B, C}
R = { S → John A,
      A → likes B,
      B → roasted C,
      C → peanuts }

49 Derivations
Derivation starts with S.
Since the right hand side of a rule has at most one non-terminal, there is only one non-terminal (if any) that can be rewritten at each step.
Derivation stops when there are no more non-terminals to be rewritten.
L(G) = the language derived by G = the set of all strings of terminals derived in G starting from S.

50 A Derivation in our Example FSG
G = (V_T, V_N, S, R) where
V_T = {John, roasted, peanuts, likes}
V_N = {S, A, B, C}
R = { S → John A,
      A → likes B,
      B → roasted C,
      C → peanuts }
Derivation: S ⇒ John A ⇒ John likes B ⇒ John likes roasted C ⇒ John likes roasted peanuts
(where A = VP, B = NP, C = N)
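The derivation above can be replayed step by step. In this sketch a rule A → a B is stored as the pair (a, B) and a rule A → a as (a, None). That encoding and the function name derivation are assumptions made for illustration. The code relies on the example grammar having exactly one rule per non-terminal.

```python
# Each non-terminal maps to its single rule: (terminal, next non-terminal or None)
RULES = {
    "S": ("John", "A"),
    "A": ("likes", "B"),
    "B": ("roasted", "C"),
    "C": ("peanuts", None),
}

def derivation(rules, start):
    """Return the successive strings of a derivation, beginning with the start symbol."""
    steps = [start]
    terminals = []
    nonterminal = start
    while nonterminal is not None:  # stop when no non-terminal is left to rewrite
        terminal, nonterminal = rules[nonterminal]
        terminals.append(terminal)
        tail = [nonterminal] if nonterminal is not None else []
        steps.append(" ".join(terminals + tail))
    return steps

for step in derivation(RULES, "S"):
    print(step)
```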

51 The Equivalence of FSGs and FSAs
We can construct an FSG G given an FSA M:
1. Treat the states of M as the non-terminals (treat K as V_N).
2. Treat the vocabulary of M as the terminals (treat Σ as V_T).
3. For each transition from state A to state B on input symbol a, create a rule A → a B.
4. For each transition from state A to a final state of M on input symbol a, also create the rule A → a.
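The four steps can be sketched in code using the two-state machine from slide 11. Rules are stored as triples (A, a, B), with B set to None for a rule A → a. This encoding, the function names, and the use of the initial state q0 as the start symbol are assumptions of the example, since the slide does not state the start symbol explicitly.

```python
def fsa_to_fsg(delta, finals):
    """Build right-linear rules (A, a, B) from an FSA; B is None for rules A -> a."""
    rules = []
    for (state, symbol), target in sorted(delta.items()):
        rules.append((state, symbol, target))    # step 3: A -> a B
        if target in finals:
            rules.append((state, symbol, None))  # step 4: A -> a
    return rules

def generate(rules, start, max_len):
    """Return every terminal string of length <= max_len derivable from start."""
    results = set()
    frontier = [("", start)]
    while frontier:
        prefix, nonterminal = frontier.pop()
        for lhs, symbol, rhs in rules:
            if lhs != nonterminal:
                continue
            string = prefix + symbol
            if len(string) > max_len:
                continue
            if rhs is None:
                results.add(string)       # derivation finished
            else:
                frontier.append((string, rhs))
    return results

# The automaton from slide 11; its language is a* b b*
DELTA = {("q0", "a"): "q0", ("q0", "b"): "q1", ("q1", "b"): "q1"}
FINALS = {"q1"}

RULES = fsa_to_fsg(DELTA, FINALS)
for lhs, symbol, rhs in RULES:
    print(lhs, "->", symbol, rhs if rhs is not None else "")
print(sorted(generate(RULES, "q0", 3)))
```

Up to length 3, the grammar derives exactly the strings the automaton accepts, which is what the equivalence predicts.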

52 More to come….

