Two Level Morphology Alexander Fraser & Liane Guillou CIS, Ludwig-Maximilians-Universität München Computational Morphology.

Slides:



Advertisements
Similar presentations
Finite-state automata and Morphology
Advertisements

Jing-Shin Chang1 Morphology & Finite-State Transducers Morphology: the study of constituents of words Word = {a set of morphemes, combined in language-dependent.
CS Morphological Parsing CS Parsing Taking a surface input and analyzing its components and underlying structure Morphological parsing:
Natural Language Processing Lecture 3—9/3/2013 Jim Martin.
Computational Morphology. Morphology S.Ananiadou2 Outline What is morphology? –Word structure –Types of morphological operation – Levels of affixation.
Morphological Analysis Chapter 3. Morphology Morpheme = "minimal meaning-bearing unit in a language" Morphology handles the formation of words by using.
Finite-State Transducers Shallow Processing Techniques for NLP Ling570 October 10, 2011.
Morphology, Phonology & FSTs Shallow Processing Techniques for NLP Ling570 October 12, 2011.
Spring 2002NLE1 CC 384: Natural Language Engineering Week 2, Lecture 2 - Lemmatization and Stemming; the Porter Stemmer.
BİL711 Natural Language Processing1 Morphology Morphology is the study of the way words are built from smaller meaningful units called morphemes. We can.
6/2/2015CPSC503 Winter CPSC 503 Computational Linguistics Lecture 2 Giuseppe Carenini.
Morphology & FSTs Shallow Processing Techniques for NLP Ling570 October 17, 2011.
6/10/2015CPSC503 Winter CPSC 503 Computational Linguistics Lecture 3 Giuseppe Carenini.
LIN3022 Natural Language Processing Lecture 3 Albert Gatt 1LIN3022 Natural Language Processing.
CSCI 5832 Natural Language Processing Lecture 5 Jim Martin.
1 Morphological analysis LING 570 Fei Xia Week 4: 10/15/07 TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: A A A.
Learning Bit by Bit Class 3 – Stemming and Tokenization.
Morphology See Harald Trost “Morphology”. Chapter 2 of R Mitkov (ed.) The Oxford Handbook of Computational Linguistics, Oxford (2004): OUP D Jurafsky &
Morphological analysis
CS 4705 Morphology: Words and their Parts CS 4705 Julia Hirschberg.
CS 4705 Lecture 3 Morphology: Parsing Words. What is morphology? The study of how words are composed from smaller, meaning-bearing units (morphemes) –Stems:
LING 438/538 Computational Linguistics Sandiway Fong Lecture 14: 10/12.
Finite State Transducers The machine model we will study for morphological parsing is called the finite state transducer (FST) An FST has two tapes –input.
CS 4705 Morphology: Words and their Parts CS 4705 Julia Hirschberg.
Introduction to English Morphology Finite State Transducers
Morphology 1 Introduction Morphology Morphological Analysis (MA)
Chapter 3. Morphology and Finite-State Transducers From: Chapter 3 of An Introduction to Natural Language Processing, Computational Linguistics, and Speech.
Morphology and Finite-State Transducers. Why this chapter? Hunting for singular or plural of the word ‘woodchunks’ was easy, isn’t it? Lets consider words.
Morphology (CS ) By Mugdha Bapat Under the guidance of Prof. Pushpak Bhattacharyya.
Finite-state automata 3 Morphology Day 14 LING Computational Linguistics Harry Howard Tulane University.
Lecture 3, 7/27/2005Natural Language Processing1 CS60057 Speech &Natural Language Processing Autumn 2005 Lecture 4 28 July 2005.
October 2006Advanced Topics in NLP1 CSA3050: NLP Algorithms Finite State Transducers for Morphological Parsing.
Lecture 1, 7/21/2005Natural Language Processing1 CS60057 Speech &Natural Language Processing Autumn 2007 Lecture4 1 August 2007.
Morphological Recognition We take each sub-lexicon of each stem class and we expand each arc (e.g. the reg-noun arc) with all the morphemes that make up.
Introduction Morphology is the study of the way words are built from smaller units: morphemes un-believe-able-ly Two broad classes of morphemes: stems.
LING 388: Language and Computers Sandiway Fong Lecture 22: 11/10.
10/8/2015CPSC503 Winter CPSC 503 Computational Linguistics Lecture 2 Giuseppe Carenini.
Session 11 Morphology and Finite State Transducers Introduction to Speech Natural and Language Processing (KOM422 ) Credits: 3(3-0)
Ling 570 Day #3 Stemming, Probabilistic Automata, Markov Chains/Model.
Lecture 3, 7/27/2005Natural Language Processing1 CS60057 Speech &Natural Language Processing Autumn 2005 Lecture 3 27 July 2005.
Chapter 3: Morphology and Finite State Transducer
Finite State Transducers
Chapter 3: Morphology and Finite State Transducer Heshaam Faili University of Tehran.
Finite State Transducers for Morphological Parsing
Words: Surface Variation and Automata CMSC Natural Language Processing April 3, 2003.
Lecture 1, 7/21/2005Natural Language Processing1 CS60057 Speech &Natural Language Processing Autumn 2007 Lecture 3 27 July 2007.
Morphological Analysis Chapter 3. Morphology Morpheme = "minimal meaning-bearing unit in a language" Morphology handles the formation of words by using.
CS 4705 Lecture 3 Morphology. What is morphology? The study of how words are composed of morphemes (the smallest meaning-bearing units of a language)
CSA3050: Natural Language Algorithms Finite State Devices.
The Simplest NL Applications: Text Searching and Pattern Matching Read J & M Chapter 2.
Natural Language Processing Chapter 2 : Morphology.
1/11/2016CPSC503 Winter CPSC 503 Computational Linguistics Lecture 2 Giuseppe Carenini.
CSA4050: Advanced Topics in NLP Computational Morphology II Introduction 2 Level Morphology.
October 2004CSA3050 NLP Algorithms1 CSA3050: Natural Language Algorithms Morphological Parsing.
Finite State Morphology Alexander Fraser & Luisa Berlanda CIS, Ludwig-Maximilians-Universität München Computational Morphology.
Finite State Morphology Alexander Fraser & Luisa Berlanda CIS, Ludwig-Maximilians-Universität München Computational Morphology.
CIS, Ludwig-Maximilians-Universität München Computational Morphology
Lecture 7 Summary Survey of English morphology
Speech and Language Processing
Morphology: Parsing Words
CSCI 5832 Natural Language Processing
Speech and Language Processing
CSCI 5832 Natural Language Processing
INFLECTED ENDINGS.
By Mugdha Bapat Under the guidance of Prof. Pushpak Bhattacharyya
CPSC 503 Computational Linguistics
CPSC 503 Computational Linguistics
Morphological Parsing
CSCI 5832 Natural Language Processing
Presentation transcript:

Two Level Morphology Alexander Fraser & Liane Guillou CIS, Ludwig-Maximilians-Universität München Computational Morphology and Electronic Dictionaries SoSe

Outline Today we will briefly discuss two-level morphology Then Luisa will present an exercise showing how to use these concepts

Credits Adapted from a lecture by Ching-Long Yeh, Tatung University Which was adapted from: Chapter 3 Morphology and Finite-State Transducers Speech and Language Processing An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition Daniel Jurafsky and James H. Martin

Two-Level Morphology Two-level morphology is a key idea for dealing with morphology in a finite state framework The critical generalization is that it is difficult to deal with things like orthographic rules in English with a single transducer The key to making this work will be to use two transducers Recall that we can compose transducers – Composing intuitively means we feed the output of the first transducer as the input to the second transducer

Morphology and FSTs5 3.2 Finite-State Morphological Parsing Morphological Parsing with FST Inversion is useful because it makes it easy to convert a FST-as-parser into an FST- as-generator. Composition is useful because it allows us to take two transducers than run in series and replace them with one complex transducer. –T 1 。 T 2 (S) = T 2 (T 1 (S) ) Reg-nounIrreg-pl-nounIrreg-sg-noun fox fat fog aardvark g o:e o:e s e sheep m o:i u:εs:c e goose sheep mouse A transducer for English nominal number inflection T num

3.2 Finite-State Morphological Parsing Morphological Parsing with FST Composition is useful because it allows us to take two transducers than run in series and replace them with one complex transducer. –T 1 。 T 2 (S) = T 2 (T 1 (S) ) Reg-nounIrreg-pl-nounIrreg-sg-noun fox cat fog aardvark g o:e o:e s e sheep m o:i u:εs:c e goose sheep mouse A transducer for English nominal number inflection T num

Morphology and FSTs7 3.2 Finite-State Morphological Parsing Morphological Parsing with FST The transducer T stems, which maps roots to their root-class

Morphology and FSTs8 3.2 Finite-State Morphological Parsing Morphological Parsing with FST The transducer T stems, which maps roots to their root-class

Morphology and FSTs9 3.2 Finite-State Morphological Parsing Morphological Parsing with FST A fleshed-out English nominal inflection FST T lex = T num 。 T stems ^: morpheme boundary #: word boundary

Morphology and FSTs Finite-State Morphological Parsing Orthographic Rules and FSTs Spelling rules (or orthographic rules) NameDescription of RuleExample Consonant doubling E deletion E insertion Y replacement K insertion 1-letter consonant doubled before -ing/-ed Silent e dropped before -ing and -ed e added after -s, -z, -x, -ch, -sh, before -s -y changes to -ie before -s, -i before -ed Verb ending with vowel + -c add -k beg/begging make/making watch/watches try/tries panic/panicked –These spelling changes can be thought as taking as input a simple concatenation of morphemes and producing as output a slightly-modified concatenation of morphemes.

Morphology and FSTs Finite-State Morphological Parsing Orthographic Rules and FSTs “ insert an e on the surface tape just when the lexical tape has a morpheme ending in x (or z, etc) and the next morphemes is –s ” x ε  e/ s ^ s# z a  b / c d “ rewrite a as b when it occurs between c and d ” This syntax is from the seminar paper of Chomsky and Halle (1968) Note that ^ is used as a morpheme boundary, and # means that we talking about a word-final "-s"

Morphology and FSTs Finite-State Morphological Parsing Orthographic Rules and FSTs The transducer for the E-insertion rule

Morphology and FSTs Combining FST Lexicon and Rules

Morphology and FSTs Combining FST Lexicon and Rules

Morphology and FSTs Combining FST Lexicon and Rules The power of FSTs is that the exact same cascade with the same state sequences is used –when machine is generating the surface form from the lexical tape, or –When it is parsing the lexical tape from the surface tape. Parsing can be slightly more complicated than generation, because of the problem of ambiguity. –For example, foxes could be fox +V +3SG as well as fox +N +PL

Morphology and FSTs Lexicon-Free FSTs: the Porter Stemmer Information retrieval One of the mostly widely used stemming algorithms is the simple and efficient Porter (1980) algorithm, which is based on a series of simple cascaded rewrite rules. –ATIONAL  ATE (e.g., relational  relate) –ING  εif stem contains vowel (e.g., motoring  motor) Problem: –Not perfect: error of commision, omission Experiments have been made –Some improvement with smaller documents –Any improvement is quite small

Summary Two-level morphology depends on using two composed transducers to capture complex morphological phenomena The example we looked at involved the orthography of realizing the plural morpheme "-s" in English Two-level morphology is the technology behind most morphological analysis systems

Thank you for your attention