
Two Level Morphology
Alexander Fraser (fraser@cis.uni-muenchen.de)
CIS, Ludwig-Maximilians-Universität München
Computational Morphology, SoSe 2017, 2017-06-21

Outline: Today we will briefly discuss two-level morphology. Then Luisa will present an exercise showing how to use these concepts.

Credits: Adapted from a lecture by Ching-Long Yeh, Tatung University, which was in turn adapted from Chapter 3, "Morphology and Finite-State Transducers", of Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition by Daniel Jurafsky and James H. Martin.

Two-Level Morphology: Two-level morphology is a key idea for handling morphology in a finite-state framework. The critical generalization is that it is difficult to deal with things like English orthographic rules using a single transducer. The key to making this work is to use two transducers. Recall that we can compose transducers: composing intuitively means we feed the output of the first transducer as the input to the second transducer.
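
A minimal Python sketch of what composition means, assuming each transducer is simplified to a plain string-to-string function (the helper names and toy rewrites below are illustrative, not the lecture's actual FSTs):

from typing import Callable

# Model a (deterministic) transducer as a function from strings to strings.
Transducer = Callable[[str], str]

def compose(t1: Transducer, t2: Transducer) -> Transducer:
    """Feed the output of the first transducer as the input to the second."""
    return lambda s: t2(t1(s))

# Two toy rewriters standing in for real transducers:
def add_plural_morpheme(lexical: str) -> str:
    return lexical.replace(" +N +PL", "^s#")        # fox +N +PL -> fox^s#

def realize_surface(intermediate: str) -> str:
    s = intermediate.replace("x^s", "x^es")         # crude stand-in for e-insertion
    return s.replace("^", "").replace("#", "")      # drop the boundary symbols

pipeline = compose(add_plural_morpheme, realize_surface)
print(pipeline("fox +N +PL"))   # foxes
print(pipeline("dog +N +PL"))   # dogs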

Why two levels? Let's talk about generation: going from the lexical level to the surface level. The intermediate level captures morpheme segmentation: ^ marks a morpheme boundary and # marks the end of a word. Working with an intermediate representation is powerful; it lets us deal with variation in a compact way. We will handle the orthographic generation of the English plural, for instance dog -> dogs, but fox -> foxes.
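
The three levels for these two examples, laid out as a small worked listing (a sketch; the lexical tags follow the Jurafsky and Martin convention used in the rest of this lecture):

# lexical level        intermediate level    surface level
levels = [
    ("dog +N +PL",      "dog^s#",            "dogs"),
    ("fox +N +PL",      "fox^s#",            "foxes"),   # e-insertion applies at the ^ boundary
]
for lexical, intermediate, surface in levels:
    print(f"{lexical:<12} -> {intermediate:<8} -> {surface}")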

3.2 Finite-State Morphological Parsing: Morphological Parsing with FSTs. Inversion is useful because it makes it easy to convert an FST-as-parser into an FST-as-generator. Composition is useful because it allows us to take two transducers that run in series and replace them with one more complex transducer: T1 ∘ T2(S) = T2(T1(S)). The sub-lexicons for nominal number inflection (pairs like o:e pair a lexical symbol with a surface symbol):

  Reg-noun    Irreg-pl-noun      Irreg-sg-noun
  fox         g o:e o:e s e      goose
  cat         sheep              sheep
  fog         m o:i u:ε s:c e    mouse
  aardvark

[Figure: A transducer for English nominal number inflection, Tnum]
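
A minimal sketch of the same sub-lexicons as plain Python data, with a hypothetical helper that produces the intermediate-level plural (a lookup table, not the actual Tnum transducer):

reg_nouns = {"fox", "cat", "fog", "aardvark"}
irreg_pl  = {"goose": "geese", "sheep": "sheep", "mouse": "mice"}

def plural_intermediate(stem: str) -> str:
    """Map 'stem +N +PL' to the intermediate level."""
    if stem in irreg_pl:
        return irreg_pl[stem] + "#"    # irregular plural: no '-s' morpheme is attached
    if stem in reg_nouns:
        return stem + "^s#"            # regular plural: '-s' after a morpheme boundary
    raise ValueError(f"unknown noun: {stem}")

print(plural_intermediate("fox"))     # fox^s#
print(plural_intermediate("goose"))   # geese#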

3.2 Finite-State Morphological Parsing: Morphological Parsing with FSTs. [Figure: The transducer Tstems, which maps roots to their root classes.]

3.2 Finite-State Morphological Parsing: Morphological Parsing with FSTs. ^ marks a morpheme boundary; # marks a word boundary. [Figure: A fleshed-out English nominal inflection FST, Tlex = Tnum ∘ Tstems]

3.2 Finite-State Morphological Parsing: Orthographic Rules and FSTs. Spelling rules (or orthographic rules):

  Name                 Description of Rule                             Example
  Consonant doubling   1-letter consonant doubled before -ing/-ed      beg/begging
  E deletion           Silent e dropped before -ing and -ed            make/making
  E insertion          e added after -s, -z, -x, -ch, -sh before -s    watch/watches
  Y replacement        -y changes to -ie before -s, -i before -ed      try/tries
  K insertion          verbs ending with vowel + -c add -k             panic/panicked

These spelling changes can be thought of as taking as input a simple concatenation of morphemes and producing as output a slightly modified concatenation of morphemes.
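
Two of these rules, approximated as regular-expression rewrites over intermediate-level strings of the form stem^suffix# (a sketch only; in the lecture these rules are implemented as transducers):

import re

def e_deletion(s: str) -> str:
    # Silent e dropped before -ing and -ed: make^ing# -> mak^ing#
    return re.sub(r"e\^(ing|ed)#", r"^\1#", s)

def y_replacement(s: str) -> str:
    # -y changes to -ie before -s: try^s# -> trie^s#
    return re.sub(r"y\^s#", "ie^s#", s)

def to_surface(s: str) -> str:
    return s.replace("^", "").replace("#", "")

print(to_surface(e_deletion("make^ing#")))    # making
print(to_surface(y_replacement("try^s#")))    # tries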

3.2 Finite-State Morphological Parsing: Orthographic Rules and FSTs. The E-insertion rule: "insert an e on the surface tape just when the lexical tape has a morpheme ending in x (or z, etc.) and the next morpheme is -s":

  ε → e / {x, s, z} ^ __ s #

The general rule schema "rewrite a as b when it occurs between c and d" is written:

  a → b / c __ d

This notation comes from the seminal work of Chomsky and Halle (1968). Note that ^ marks a morpheme boundary, and # means that we are talking about a word-final "-s".
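
The E-insertion rule itself can be approximated by a single substitution over intermediate forms (a rough sketch of the rule, not of the transducer shown on the next slide):

import re

def e_insertion(intermediate: str) -> str:
    # Insert e when a morpheme ends in x, s, or z and the next morpheme is -s:
    # fox^s# -> fox^es#
    return re.sub(r"([xsz])\^s#", r"\1^es#", intermediate)

def to_surface(intermediate: str) -> str:
    return e_insertion(intermediate).replace("^", "").replace("#", "")

print(to_surface("fox^s#"))   # foxes
print(to_surface("dog^s#"))   # dogs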

3.2 Finite-State Morphological Parsing: Orthographic Rules and FSTs. [Figure: The transducer for the E-insertion rule.]

3.3 Combining FST Lexicon and Rules

3.3 Combining FST Lexicon and Rules. The power of FSTs is that the exact same cascade, with the same state sequences, is used whether the machine is generating the surface form from the lexical tape or parsing the lexical tape from the surface tape. Parsing can be slightly more complicated than generation because of the problem of ambiguity: for example, foxes could be fox +V +3SG as well as fox +N +PL. This suggests that thinking about implementing generation may be easier than thinking about implementing parsing. We can use our composed transducers in both directions.
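
A tiny illustration of the ambiguity point: parsing is one-to-many, while generation from one lexical analysis is one-to-one (the analyses here are hard-coded just for this example):

# Surface form -> possible lexical analyses (hard-coded for illustration).
analyses = {
    "foxes": ["fox +N +PL", "fox +V +3SG"],
}

def parse(surface_form: str) -> list:
    return analyses.get(surface_form, [])

print(parse("foxes"))   # ['fox +N +PL', 'fox +V +3SG']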

3.4 Lexicon-Free FSTs: the Porter Stemmer. In information retrieval, one of the most widely used stemming algorithms is the simple and efficient Porter (1980) algorithm, which is based on a series of simple cascaded rewrite rules, e.g.:

  ATIONAL → ATE (e.g., relational → relate)
  ING → ε if the stem contains a vowel (e.g., motoring → motor)

Problem: the stemmer is not perfect; it makes errors of commission and omission. Experiments have been done: there is some improvement with smaller documents, but any improvement is quite small.
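
A sketch of just the two rewrite rules quoted above (this is a fragment for illustration, not the full cascaded Porter algorithm):

import re

def porter_fragment(word: str) -> str:
    if word.endswith("ational"):
        return word[:-len("ational")] + "ate"              # relational -> relate
    if word.endswith("ing") and re.search(r"[aeiou]", word[:-3]):
        return word[:-3]                                   # motoring -> motor (stem has a vowel)
    return word

print(porter_fragment("relational"))   # relate
print(porter_fragment("motoring"))     # motor
print(porter_fragment("sing"))         # sing ('s' has no vowel, so -ing is kept)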

Summary: Two-level morphology depends on using two composed transducers to capture complex morphological phenomena. The example we looked at involved the orthography of realizing the plural morpheme "-s" in English. Two-level morphology is the technology behind most morphological analysis systems.

Thank you for your attention