CPSC 503 Computational Linguistics, Lecture 2, Giuseppe Carenini (CPSC503 Winter 2008; 10/8/2015)

2 Today (Sep 10)
– Subscribe to the mailing list cpsc503 (majordomo)
– Introductions
– Brief check of some background knowledge
– English morphology
– FSAs and morphology
– Start: finite-state transducers (FSTs) and morphological parsing/generation

3 Introductions
– Your name
– Previous experience in NLP?
– Why are you interested in NLP?
– Are you thinking of NLP as your main research area? If not, what else do you want to specialize in?
– Anything else?

5 Knowledge-Formalisms Map (including some probabilistic formalisms)
Formalisms:
– Logical formalisms (first-order logics)
– Rule systems (and probabilistic versions), e.g., (probabilistic) context-free grammars
– State machines (and probabilistic versions): finite-state automata, finite-state transducers, Markov models
– AI planners
Linguistic levels: morphology, syntax, semantics, pragmatics, discourse and dialogue

6 Next Two Lectures
– State machines (no probabilities): finite-state automata (and regular expressions), finite-state transducers
– (English) morphology
Rest of the knowledge-formalisms map: logical formalisms (first-order logics); rule systems (and probabilistic versions), e.g., (probabilistic) context-free grammars; AI planners; syntax, semantics, pragmatics, discourse and dialogue

7 ?? [Figure: a finite-state automaton with states numbered 0 to 6, shown with the input strings baaa! and baba!]

8 ?? What do these regular expressions match?
– /CPSC50[34]/
– /^([Ff]rom\b|[Ss]ubject\b|[Dd]ate\b)/
– /[0-9]+(\.[0-9]+){3}/
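The three patterns can be tried out directly. The sketch below uses Python's re module; the sample lines are made up for illustration and are not from the lecture. One reading of the patterns: the first matches the course codes CPSC503 and CPSC504, the second matches a line that begins with the whole word From, Subject or Date (as in e-mail headers), and the third matches four dot-separated groups of digits, such as an IPv4 address.

```python
import re

# The three patterns from the slide, as Python raw strings so that \b means
# "word boundary" rather than a backspace character.
PATTERNS = {
    "course": re.compile(r"CPSC50[34]"),
    "header": re.compile(r"^([Ff]rom\b|[Ss]ubject\b|[Dd]ate\b)"),
    "dotted": re.compile(r"[0-9]+(\.[0-9]+){3}"),
}

# Made-up sample lines, chosen to show hits and near misses.
SAMPLES = [
    "Welcome to CPSC503 and CPSC504",
    "Subject: lecture 2",
    "Fromage is cheese",       # "From" is not followed by a word boundary
    "Re: Subject",             # ^ anchors to the start of the string
    "host 142.103.6.5 is up",
    "version 1.2.3",           # only three dot-separated groups
]

for line in SAMPLES:
    hits = {}
    for name, pattern in PATTERNS.items():
        found = [m.group(0) for m in pattern.finditer(line)]
        if found:
            hits[name] = found
    print(repr(line), "->", hits)
```

Note that finditer returns match objects, so m.group(0) is used to get the whole match rather than the parenthesized group.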

9 Example of Usage: Text Searching/Editing
Find all instances of the determiner “the” in an English text:
– to count them
– to substitute them with something else
You try: /the/ /[tT]he/ /\bthe\b/ /\b[tT]he\b/
Test sentence: The other cop went to the bank but there were no people there.
Substitution: s/\b([tT]he|[Aa]n?)\b/DET/
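A quick way to compare the four attempts is to count their matches in the test sentence. This Python sketch does that and then applies the slide's substitution. Note that re.sub replaces every match, so it behaves like s/.../DET/g; the slide's s/// has no g flag and would replace only the first match.

```python
import re

SENTENCE = "The other cop went to the bank but there were no people there."

# The four attempts from the slide, from most permissive to most precise.
ATTEMPTS = [r"the", r"[tT]he", r"\bthe\b", r"\b[tT]he\b"]

for pattern in ATTEMPTS:
    print(pattern, len(re.findall(pattern, SENTENCE)))
# counts: 4, 5, 1, 2

# Whole-word determiners (the/The, a/A, an/An) become the tag DET.
print(re.sub(r"\b([tT]he|[Aa]n?)\b", "DET", SENTENCE))
# DET other cop went to DET bank but there were no people there.
```

Only /\b[tT]he\b/ finds exactly the two determiners: /the/ and /[tT]he/ also hit "other" and both occurrences of "there", while /\bthe\b/ misses the capitalized "The".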

10 Fundamental Relations [Diagram relating finite-state automata (FSA), regular expressions, and many linguistic phenomena; edge labels: “model”, “implement (generate and recognize)”, “describe”]
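To make "implement (generate and recognize)" concrete, here is a minimal Python sketch of one automaton used both ways. It assumes the classic "sheep talk" language /baa+!/ as the example (an assumption; the automaton drawn on the slides is not recoverable from this transcript) and reuses the inputs baaa! and baba! from slide 7. The final check confirms that the regular expression describes the same strings the DFA generates.

```python
import re
from collections import deque

# Hand-built DFA for the "sheep talk" language baa+! (an illustrative
# assumption, not necessarily the automaton drawn on the slides).
DELTA = {
    (0, "b"): 1,
    (1, "a"): 2,
    (2, "a"): 3,
    (3, "a"): 3,  # self-loop: any number of further a's
    (3, "!"): 4,
}
START, FINAL = 0, {4}

def recognize(s):
    """Recognition: follow one transition per character; reject on a missing arc."""
    state = START
    for ch in s:
        if (state, ch) not in DELTA:
            return False
        state = DELTA[(state, ch)]
    return state in FINAL

def generate(max_len):
    """Generation: walk the DFA breadth-first, yielding accepted strings up to max_len."""
    queue = deque([(START, "")])
    while queue:
        state, prefix = queue.popleft()
        if state in FINAL:
            yield prefix
        if len(prefix) == max_len:
            continue
        for (src, ch), dst in DELTA.items():
            if src == state:
                queue.append((dst, prefix + ch))

print(recognize("baaa!"), recognize("baba!"))  # True False
words = list(generate(6))
print(words)  # ['baa!', 'baaa!', 'baaaa!']
# The regular expression describes exactly the strings the DFA produced.
assert all(re.fullmatch(r"baa+!", w) for w in words)
```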

12 English Morphology
Def.: the study of how words are formed from minimal meaning-bearing units (morphemes). Examples: unhappily, …
We can usefully divide morphemes into two classes:
– Stems: the core meaning-bearing units
– Affixes: bits and pieces that adhere to stems to change their meanings and grammatical functions

13 Word Classes
For now, the word classes are nouns, verbs, adjectives and adverbs; we’ll go into the gory details in Ch. 5.
Word class determines to a large degree the way that stems and affixes combine.

14 English Morphology
We can also divide morphology into two broad classes:
– Inflectional
– Derivational

15 Inflectional Morphology
The resulting word:
– has the same word class as the original;
– serves a grammatical/semantic purpose different from the original.

16 Nouns, Verbs and Adjectives (English)
– Nouns are simple (not really): markers for plural and possessive
– Verbs are only slightly more complex: markers appropriate to the tense of the verb and to the person
– Adjectives: markers for comparative and superlative

17 Regulars and Irregulars
Some words misbehave (refuse to follow the rules):
– mouse/mice, goose/geese, ox/oxen
– go/went, fly/flew
Regulars:
– walk, walks, walking, walked, walked
Irregulars:
– eat, eats, eating, ate, eaten
– catch, catches, catching, caught, caught
– cut, cuts, cutting, cut, cut
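One common way to handle the misbehaving words is to list irregular forms in the lexicon and consult that list before applying the regular rule. The Python sketch below assumes a tiny hypothetical lexicon (the slide's forms plus "gone" and "flown" to complete the go/fly rows) and ignores spelling changes entirely, so it would produce "boxs" rather than "boxes".

```python
# Hypothetical mini-lexicon. Irregular entries are looked up first; everything
# else gets the regular suffix with no spelling rules (so "box" -> "boxs").
IRREGULAR_PLURAL = {"mouse": "mice", "goose": "geese", "ox": "oxen"}
IRREGULAR_PAST = {  # verb -> (past, past participle)
    "go": ("went", "gone"),
    "fly": ("flew", "flown"),
    "eat": ("ate", "eaten"),
    "catch": ("caught", "caught"),
    "cut": ("cut", "cut"),
}

def plural(noun):
    """Listed irregular plural, else regular noun + "s"."""
    return IRREGULAR_PLURAL.get(noun, noun + "s")

def past_forms(verb):
    """(past, past participle): listed irregular forms, else verb + "ed" for both."""
    return IRREGULAR_PAST.get(verb, (verb + "ed", verb + "ed"))

print(plural("mouse"), plural("cat"))          # mice cats
print(past_forms("walk"), past_forms("eat"))   # ('walked', 'walked') ('ate', 'eaten')
```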

18 Derivational Morphology
Derivational morphology is the messy stuff that no one ever taught you:
– it changes word class;
– it is less productive (e.g., -ant turns a verb into a noun only for verbs of Latin origin!).

19 Derivational Examples: Verb/Adj to Noun
Suffix   Base          Derived noun
-ation   computerize   computerization
-ee      appoint       appointee
-er      kill          killer
-ness    fuzzy         fuzziness

20 Derivational Examples: Noun/Verb to Adj
Suffix   Base          Derived adjective
-al      computation   computational
-able    embrace       embraceable
-less    clue          clueless

21 Compute
Many paths are possible. Start with compute:
– computer -> computerize -> computerization
– computation -> computational
– computer -> computerize -> computerizable
– compute -> computee
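The paths above chain suffixes, and each attachment can change the spelling of the base (compute + er gives computer, fuzzy + ness gives fuzziness). The following Python sketch, built around a hypothetical attach helper, covers only two spelling rules: final-e deletion before a vowel-initial suffix (keeping the e after c or g before a, o, u, as in embraceable) and y-to-i after a consonant. That happens to be enough for the examples on this slide and the two derivational-example slides, but it does not check word classes, so it would also build words that derivational morphology does not license.

```python
VOWELS = "aeiou"

def attach(stem, suffix):
    """Attach a derivational suffix with two simplified English spelling rules."""
    # Final -e drops before a vowel-initial suffix (compute + er -> computer),
    # except -ce/-ge keep it before a/o/u so the consonant stays soft
    # (embrace + able -> embraceable).
    if stem.endswith("e") and suffix[0] in VOWELS:
        if not (stem[-2:] in ("ce", "ge") and suffix[0] in "aou"):
            stem = stem[:-1]
    # Consonant + y becomes i unless the suffix starts with i (fuzzy + ness -> fuzziness).
    elif stem.endswith("y") and len(stem) > 1 and stem[-2] not in VOWELS and suffix[0] != "i":
        stem = stem[:-1] + "i"
    return stem + suffix

def derive(stem, *suffixes):
    """Apply suffixes left to right, e.g. derive("compute", "er", "ize") -> "computerize"."""
    for suffix in suffixes:
        stem = attach(stem, suffix)
    return stem

PATHS = [
    ("compute", "er", "ize", "ation"),
    ("compute", "ation", "al"),
    ("compute", "er", "ize", "able"),
    ("compute", "ee"),
]
for stem, *suffixes in PATHS:
    print(derive(stem, *suffixes))
# computerization, computational, computerizable, computee (one per line)
```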

22 Summary
– State machines (no probabilities): finite-state automata (and regular expressions), finite-state transducers
– (English) morphology
Rest of the knowledge-formalisms map: logical formalisms (first-order logics); rule systems (and probabilistic versions), e.g., (probabilistic) context-free grammars; AI planners; syntax, semantics, pragmatics, discourse and dialogue

23 FSAs and Morphology
GOAL 1: recognize whether a string is an English word.
PLAN:
1. First we’ll capture the morphotactics (the rules governing the ordering of affixes in a language).
2. Then we’ll add in the actual stems.

24 FSA for a Portion of Noun Inflectional Morphology [Figure: FSA diagram]

25 Adding the Stems
But it does not express that:
– regular nouns ending in -s, -z, -sh, -ch, -x take -es (kiss, waltz, bush, rich, box);
– regular nouns ending in -y preceded by a consonant change the -y to -i before adding -es.
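The noun automaton with stems can be sketched in Python as arcs labelled with morpheme classes, each class expanding to the strings in a toy lexicon. The lexicon (cat, dog, fox, goose/geese, mouse/mice) and the state names are assumptions for illustration, not the slide's figure. Running it shows exactly the gap described above: it accepts foxs and rejects foxes.

```python
# Morphotactic FSA for (part of) English noun inflection: arcs are labelled
# with morpheme classes rather than letters. State names are illustrative.
ARCS = {
    "q0": [("reg-noun", "q1"), ("irreg-sg-noun", "q2"), ("irreg-pl-noun", "q2")],
    "q1": [("plural-s", "q2")],
    "q2": [],
}
FINAL = {"q1", "q2"}

# Hypothetical toy lexicon: the strings each morpheme class can spell out.
LEXICON = {
    "reg-noun": ["cat", "dog", "fox"],
    "irreg-sg-noun": ["goose", "mouse"],
    "irreg-pl-noun": ["geese", "mice"],
    "plural-s": ["s"],
}

def accepts(word, state="q0"):
    """Nondeterministic recognition: try every morpheme that can start `word`."""
    if word == "" and state in FINAL:
        return True
    for morph_class, nxt in ARCS[state]:
        for morph in LEXICON[morph_class]:
            if word.startswith(morph) and accepts(word[len(morph):], nxt):
                return True
    return False

for w in ["cats", "geese", "mouses", "foxes", "foxs"]:
    print(w, accepts(w))
# cats True, geese True, mouses False, foxes False, foxs True (one per line)
```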

26 Small Fragment of V and N Derivational Morphology
[Figure: FSA fragment whose arcs are labelled with stem classes]
– noun_i, e.g., hospital
– adj_al, e.g., formal
– adj_ous, e.g., arduous
– verb_j, e.g., speculate
– verb_k, e.g., conserve

27 GOAL 2: Morphological Parsing/Generation (vs. Recognition)
Recognition is usually not quite what we need. Usually:
– given a word, we need to find the stem, its class and its morphological features (parsing);
– or we have a stem, its class and its morphological features, and we want to produce the word (production/generation).
Examples (parsing):
– from “cats” to “cat +N +PL”
– from “lies” to ……

28 Computational Problems in Morphology
– Recognition: recognize whether a string is an English word (FSA)
– Parsing/generation: word <-> stem, class, lexical features; e.g., lies <-> lie +N +PL or lie +V +3SG
– Stemming: word -> stem, e.g., …

29 Finite State Transducers
FSAs cannot help: they can only accept or reject a string. The simple story:
– add another tape;
– add extra symbols to the transitions;
– on one tape we read “cats”, on the other we write “cat +N +PL”.

30 FSTs [Figure: a transducer between two tapes; one direction labelled “generation”, the other “parsing”]

31 (Simplified) FST Formal Definition (you can skip 3.4.1)
– $Q$: a finite set of states
– $I, O$: input and output alphabets (which may include $\varepsilon$)
– $\Sigma$: a finite alphabet of complex symbols $i{:}o$, with $i \in I$ and $o \in O$
– $q_0$: the start state
– $F$: a set of accept/final states ($F \subseteq Q$)
– a transition relation $\delta$ that maps $Q \times \Sigma$ to $2^Q$
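The definition can be transcribed almost literally into data. This Python sketch instantiates the tuple with the small cat transducer from the following slides (the state numbers are hypothetical) and runs it in recognizer mode, where the input is a string of i:o pairs. Note that δ returns a set of states, matching its codomain, the power set of Q.

```python
EPS = "ε"  # written out so the pairs read like the slides

# The tuple from the slide, instantiated with the "cat" transducer.
# State numbers are hypothetical; Σ holds complex symbols i:o as (i, o) pairs.
Q = {0, 1, 2, 3, 4, 5}
SIGMA = {("c", "c"), ("a", "a"), ("t", "t"), ("+N", EPS), ("+PL", "s"), ("+SG", EPS)}
Q0 = 0
F = {5}
# δ maps Q x Σ to sets of states; missing keys mean "no transition".
DELTA = {
    (0, ("c", "c")): {1},
    (1, ("a", "a")): {2},
    (2, ("t", "t")): {3},
    (3, ("+N", EPS)): {4},
    (4, ("+PL", "s")): {5},
    (4, ("+SG", EPS)): {5},
}

def accepts(pairs):
    """Recognizer mode: does the FST accept this string over Σ?"""
    current = {Q0}
    for sym in pairs:
        if sym not in SIGMA:
            return False
        # Union of the successor sets of every state we might be in.
        current = set().union(*(DELTA.get((q, sym), set()) for q in current))
    return bool(current & F)

cats = [("c", "c"), ("a", "a"), ("t", "t"), ("+N", EPS), ("+PL", "s")]
print(accepts(cats))  # True
```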

32 An FST Can Be Used as…
– Translator: input one string from I, output another from O (or vice versa)
– Recognizer: input a string from I × O and accept or reject it
– Generator: output a string from I × O
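In generator mode the FST produces pair strings rather than reading them. A Python sketch, assuming the same small cat transducer with hypothetical state numbers: a depth-first walk yields every accepted path, and projecting each path onto its two sides gives the lexical and surface strings.

```python
EPS = ""  # ε: writing nothing

# Arcs of the small "cat" FST as (state, lexical, surface, next_state);
# the state numbers are hypothetical.
ARCS = [
    (0, "c", "c", 1), (1, "a", "a", 2), (2, "t", "t", 3),
    (3, "+N", EPS, 4), (4, "+PL", "s", 5), (4, "+SG", EPS, 5),
]
START, FINAL = 0, {5}

def generate(state=START, path=()):
    """Generator mode: yield every accepted string over I x O as a tuple of pairs.
    Terminates only because this FST has no cycles."""
    if state in FINAL:
        yield path
    for src, lex, surf, dst in ARCS:
        if src == state:
            yield from generate(dst, path + ((lex, surf),))

for path in generate():
    lexical = "".join(lex for lex, _ in path)
    surface = "".join(surf for _, surf in path)
    print(lexical, "<->", surface)
# cat+N+PL <-> cats
# cat+N+SG <-> cat
```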

33 Simple Example
Transitions (as a translator):
– c:c means read a c on one tape and write a c on the other (or vice versa)
– +N:ε means read a +N symbol on one tape and write nothing on the other (or vice versa)
– +PL:s means read +PL and write an s (or vice versa)
[Figure: FST with arcs c:c, a:a, t:t, +N:ε, then either +PL:s or +SG:ε]

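34 Examples (as a translator) [Figure: the same FST between a lexical tape and a surface tape, with example strings “cats” and “cat +N +SG” / “cat”; generation goes from lexical to surface, parsing from surface to lexical; arcs c:c, a:a, t:t, +N:ε, +PL:s, +SG:ε]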
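A translator runs the same arcs in either direction: parsing reads the surface side and writes the lexical side, and generation does the reverse. The sketch below is one possible Python rendering (a nondeterministic search with an agenda; the state numbers are hypothetical). An ε on the input side consumes nothing, so the search assumes there is no cycle of such arcs, which holds for this FST.

```python
EPS = ""  # ε: the empty symbol

# Arcs of the slide's "cat" FST as (state, lexical, surface, next_state).
# State numbers are hypothetical; the figure does not survive in the transcript.
ARCS = [
    (0, "c", "c", 1), (1, "a", "a", 2), (2, "t", "t", 3),
    (3, "+N", EPS, 4), (4, "+PL", "s", 5), (4, "+SG", EPS, 5),
]
START, FINAL = 0, {5}

def translate(tape, direction):
    """Translator mode. direction="parse" reads the surface tape and writes the
    lexical one; direction="generate" does the reverse. Returns every output."""
    results = set()
    agenda = [(START, 0, "")]  # (state, position on input tape, output so far)
    while agenda:
        state, pos, out = agenda.pop()
        if pos == len(tape) and state in FINAL:
            results.add(out)
        for src, lex, surf, dst in ARCS:
            if src != state:
                continue
            read, write = (surf, lex) if direction == "parse" else (lex, surf)
            if read == EPS:
                agenda.append((dst, pos, out + write))  # ε input: consume nothing
            elif pos < len(tape) and tape[pos] == read:
                agenda.append((dst, pos + 1, out + write))
    return results

print(translate(list("cats"), "parse"))                     # {'cat+N+PL'}
print(translate(["c", "a", "t", "+N", "+SG"], "generate"))  # {'cat'}
```

The lexical tape is passed as a list of symbols because +N, +PL and +SG are single symbols of the lexical alphabet, not sequences of characters.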

35 More Complex Example
Transitions (as a translator):
– l:l means read an l on one tape and write an l on the other (or vice versa)
– +N:ε means read a +N symbol on one tape and write nothing on the other (or vice versa)
– +PL:s means read +PL and write an s (or vice versa)
[Figure: FST with states q0 to q7 and arcs l:l, i:i, e:e, +N:ε, +PL:s, +V:ε, +3SG:s]

36 Examples (as a translator) [Figure: the FST from the previous slide mapping the lexical string “lie +V +3SG” to the surface string “lies” (generation) and back (parsing)]
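Parsing “lies” with this transducer is ambiguous, since both the noun branch and the verb branch read the final s. In the Python sketch below, the wiring of q0 to q7 is a guess at the lost figure, and like the slide it has no +SG arcs, so a bare “lie” gets no analysis. A depth-first search returns both parses.

```python
EPS = ""  # ε: the empty symbol

# Arcs as state -> list of (lexical, surface, next_state). The wiring of
# q0..q7 is an assumption; the slide's figure did not survive extraction.
ARCS = {
    "q0": [("l", "l", "q1")],
    "q1": [("i", "i", "q2")],
    "q2": [("e", "e", "q3")],
    "q3": [("+N", EPS, "q4"), ("+V", EPS, "q6")],  # the branch point
    "q4": [("+PL", "s", "q5")],
    "q6": [("+3SG", "s", "q7")],
}
FINAL = {"q5", "q7"}

def parse(surface, state="q0", lexical=""):
    """Yield every lexical string the FST pairs with `surface` (depth-first)."""
    if not surface and state in FINAL:
        yield lexical
    for lex, surf, nxt in ARCS.get(state, []):
        if surf == EPS:
            yield from parse(surface, nxt, lexical + lex)
        elif surface.startswith(surf):
            yield from parse(surface[len(surf):], nxt, lexical + lex)

print(sorted(parse("lies")))  # ['lie+N+PL', 'lie+V+3SG']
```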

37 Examples (as a recognizer and a generator) [Figure: the same FST reading (recognizer) or writing (generator) the lexical/surface pair “lie +V +3SG” : “lies”]

38 Next Time
– Finish FSTs and morphological analysis
– Porter stemmer
Read Ch. 3 up to 3.10 (excluded). For the definition of FST, understand the one on the slides (3.4.1 is optional).

