Introduction to English Morphology Finite State Transducers

1 Introduction to English Morphology Finite State Transducers
CSA3050: NLP Algorithms

2 Acknowledgement
For further details see Jurafsky & Martin, Ch. 3

3 Morphology
Morphology is the study of how word parts combine to form whole words.
Several different dimensions:
  Orthographic: rules for combining strings of characters together.
  Syntactic: effect on syntactic category.
  Semantic: effect on meaning.

4 Examples of Morphological Processes
Affixation
  prefix
  suffix
  circumfix: German ge + stem + t, e.g. sagen, gesagt
  infix: un-bloody-likely
Vowel change: swim/swam
Consonant change: send/sent

5 Inflectional/Derivational Morphology
Inflectional
  +s plural, +ed past
  category preserving
  productive: always applies (esp. new words, e.g. fax)
  systematic: same semantic effect
Derivational
  +ment
  category changing: escape + ment
  not completely productive: *detractment
  not completely systematic: catchment

6 English Inflectional Morphology
Applies to nouns, verbs and adjectives only
Number of inflections relatively small
  Nouns: plural, possessive
  Verbs: verb forms
  Adjectives: comparison

7 Noun Inflections
            Regular            Irregular
Singular    cat     church     mouse   ox
Plural      cats    churches   mice    oxen

8 Regular Verb Inflections
stem                      walk      merge     try      map
-s form                   walks     merges    tries    maps
-ing participle           walking   merging   trying   mapping
-ed participle or past    walked    merged    tried    mapped
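
The regular forms in this table can be produced from the stem by a few spelling rules. The sketch below is illustrative only: the function names and the consonant-doubling heuristic are assumptions made for this example, and the heuristic is only reliable for short stems such as the four in the table.

```python
VOWELS = set("aeiou")

def _consonant_y(stem):
    # True for stems like "try": final y preceded by a consonant.
    return len(stem) > 1 and stem.endswith("y") and stem[-2] not in VOWELS

def _doubles(stem):
    # Assumed heuristic: consonant-vowel-consonant ending, e.g. "map".
    return (len(stem) >= 3
            and stem[-1] not in VOWELS and stem[-1] not in "wxy"
            and stem[-2] in VOWELS
            and stem[-3] not in VOWELS)

def s_form(stem):
    if stem.endswith(("s", "x", "z", "ch", "sh")):
        return stem + "es"
    if _consonant_y(stem):
        return stem[:-1] + "ies"      # try -> tries
    return stem + "s"

def ing_form(stem):
    if stem.endswith("e") and not stem.endswith("ee"):
        return stem[:-1] + "ing"      # merge -> merging
    if _doubles(stem):
        return stem + stem[-1] + "ing"  # map -> mapping
    return stem + "ing"

def ed_form(stem):
    if stem.endswith("e"):
        return stem + "d"             # merge -> merged
    if _consonant_y(stem):
        return stem[:-1] + "ied"      # try -> tried
    if _doubles(stem):
        return stem + stem[-1] + "ed"   # map -> mapped
    return stem + "ed"

for stem in ["walk", "merge", "try", "map"]:
    print(stem, s_form(stem), ing_form(stem), ed_form(stem))
```

Irregular verbs such as eat or go cannot be handled this way and need explicit lexicon entries.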

9 Irregular Verb Inflections
stem               eat      catch      cut       go
-s form            eats     catches    cuts      goes
-ing participle    eating   catching   cutting   going
Past               ate      caught     cut       went
-ed participle     eaten    caught     cut       gone

10 Morphological Parsing
Input word: cats  ->  Morphological Parser  ->  Output analysis: cat + PL
Output is a string of morphemes
Reversibility?

11 Morphological Parsing: Examples
Input word    Output morphemes
cats          cat +N +PL
cat           cat +N +SG
cities        city +N +PL
walks         walk +V +3SG
cook          cook +N +SG or cook +V

12 Morphemes
Morpheme is a theoretical construct ... but has a practical use
Choice of morpheme vocabulary: theoretical and practical motivation
Distinction between an underlying morpheme and its realisation
String of morphemes could be turned into another representation later

13 Morphological Parsing Requires
Lexicon: list of stems and affixes + related information (e.g. syntactic category)
Morphotactics: a model of ordering constraints over morphemes (e.g. the fact that +s comes after the stem, not before)
Correspondences between input and output strings
Spelling rules: city + s → cities
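
To see how these ingredients fit together, here is a deliberately naive sketch of a parser for a handful of words. The lexicon contents, the function names and the three suffix-stripping rules are assumptions made for illustration; a real system would encode all of this in finite-state machinery rather than in ad hoc string tests.

```python
# Lexicon (hypothetical contents): stems with their syntactic category.
NOUN_STEMS = {"cat", "city", "church", "cook", "mouse", "ox"}
VERB_STEMS = {"walk", "cook"}
IRREGULAR_PLURALS = {"mice": "mouse", "oxen": "ox"}

def undo_spelling(surface):
    """Yield candidate (stem, suffix) splits, undoing spelling rules for +s."""
    yield surface, ""                      # bare stem, no suffix
    if surface.endswith("ies"):
        yield surface[:-3] + "y", "s"      # cities -> city + s
    if surface.endswith("es"):
        yield surface[:-2], "s"            # churches -> church + s
    if surface.endswith("s"):
        yield surface[:-1], "s"            # cats -> cat + s

def parse(word):
    """Return every analysis licensed by the lexicon; the suffix always
    follows the stem, which is the only morphotactic constraint here."""
    analyses = []
    if word in IRREGULAR_PLURALS:
        analyses.append(f"{IRREGULAR_PLURALS[word]} +N +PL")
    for stem, suffix in undo_spelling(word):
        if stem in NOUN_STEMS:
            analyses.append(f"{stem} +N " + ("+PL" if suffix else "+SG"))
        if stem in VERB_STEMS:
            analyses.append(f"{stem} +V" + (" +3SG" if suffix else ""))
    return analyses

for w in ["cats", "cat", "cities", "walks", "cook", "mice"]:
    print(w, "->", " or ".join(parse(w)))
```

An ambiguous word such as cook yields two analyses, so the function returns a list rather than a single string.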

14 Lexicon
Lexicon is generally divided into sublexicons
  Stem lexicon: noun stems, verb stems, etc.
  Suffix lexicon
  Prefix lexicon
Can all be represented as FSAs

15 FSA for Sublexicon Fragment
[Figure: letter-by-letter FSA for a sublexicon fragment; arc labels h, e, s, a, e, i, t, s]
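
A sublexicon can be compiled into a letter-by-letter FSA by sharing common prefixes. The sketch below is one minimal way to do that. The word list is hypothetical and is not claimed to match the figure on this slide, and the function names are chosen for illustration.

```python
def build_fsa(words):
    """Build a prefix-sharing FSA: transitions[state][char] -> next state."""
    transitions = {0: {}}          # state 0 is the initial state
    finals = set()
    for word in words:
        state = 0
        for ch in word:
            if ch not in transitions[state]:
                new_state = len(transitions)   # next unused state number
                transitions[new_state] = {}
                transitions[state][ch] = new_state
            state = transitions[state][ch]
        finals.add(state)          # the state reached at the end of a word is final
    return transitions, finals

def accepts(fsa, word):
    transitions, finals = fsa
    state = 0
    for ch in word:
        state = transitions[state].get(ch)
        if state is None:          # no arc for this character
            return False
    return state in finals

# Hypothetical sublexicon fragment.
fsa = build_fsa(["he", "hat", "hit", "hits"])
print(accepts(fsa, "hits"), accepts(fsa, "ha"))
```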

16 FSA for Morphotactics for Noun Inflection
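
The figure for this slide is not reproduced in the transcript. As a stand-in, the sketch below shows one plausible morphotactic FSA for noun inflection, operating over morpheme classes rather than letters. The class names (reg-noun, irreg-sg-noun, irreg-pl-noun, plural), the state numbering and the tiny lexicon are all assumptions for illustration.

```python
# Hypothetical mapping from morphemes to morpheme classes.
MORPHEME_CLASS = {
    "cat": "reg-noun", "church": "reg-noun",
    "mouse": "irreg-sg-noun", "ox": "irreg-sg-noun",
    "mice": "irreg-pl-noun", "oxen": "irreg-pl-noun",
    "+s": "plural",
}

# Assumed FSA: a regular noun may be followed by the plural morpheme;
# irregular singular and plural nouns take no further suffix.
TRANSITIONS = {
    (0, "reg-noun"): 1,
    (0, "irreg-sg-noun"): 2,
    (0, "irreg-pl-noun"): 2,
    (1, "plural"): 2,
}
FINALS = {1, 2}

def well_formed(morphemes):
    """Check that a morpheme sequence obeys the ordering constraints."""
    state = 0
    for m in morphemes:
        state = TRANSITIONS.get((state, MORPHEME_CLASS.get(m)))
        if state is None:
            return False
    return state in FINALS

print(well_formed(["cat", "+s"]), well_formed(["+s", "cat"]))
```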

17 Morphotactics for Verb Inflection

18 Input/Output Correspondences
Problem: how to specify the correspondence between input word and output analysis.
Given: both input and output are strings.
Two-level morphology (Koskenniemi 1983) proposes
  Surface tape (words)
  Lexical tape (concatenation of morphemes)

19 2 Level Model
The automaton used to perform the mapping between these levels is the finite state transducer (FST).

20 Basic FS Transducer
Each transition of a transducer is labelled with a pair of symbols.
Input symbols are matched against the lower-side symbols on transitions.
If analysis succeeds, return the string of upper-side symbols.
[Figure: an arc labelled with the output symbol (upper side) over the input symbol (lower side)]

21 Morphological Analysis
[Figure: transducer with arc labels C, A, T, S, ε, +N, +PL]
Relation: { ("CATS", "CAT+N+PL"), ("CAT", "CAT+N+SG") }
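
The following sketch shows how a transducer of this kind could be run in both directions. The arc list is an assumed reconstruction that realises exactly the two pairs in the relation (in lower case); the state numbers, the use of the empty string for ε, and the function name are illustrative choices. The search assumes the machine has no ε-cycles.

```python
EPS = ""  # the empty string stands in for ε

# (source state, lower/surface symbol, upper/lexical symbol, target state)
ARCS = [
    (0, "c", "c", 1),
    (1, "a", "a", 2),
    (2, "t", "t", 3),
    (3, EPS, "+N", 4),
    (4, "s", "+PL", 5),
    (4, EPS, "+SG", 5),
]
FINALS = {5}

def transduce(string, match=1, emit=2, start=0):
    """Match `string` against one side of the arcs and collect the other side.

    match=1, emit=2 analyses a surface word; match=2, emit=1 generates one.
    """
    results = []
    stack = [(start, 0, "")]               # (state, position in string, output so far)
    while stack:
        state, pos, out = stack.pop()
        if state in FINALS and pos == len(string):
            results.append(out)
        for arc in ARCS:
            if arc[0] != state:
                continue
            sym = arc[match]
            if sym == EPS:                 # ε on the matched side consumes nothing
                stack.append((arc[3], pos, out + arc[emit]))
            elif string.startswith(sym, pos):
                stack.append((arc[3], pos + len(sym), out + arc[emit]))
    return results

print(transduce("cats"))                       # analysis: surface -> lexical
print(transduce("cat+N+PL", match=2, emit=1))  # generation: lexical -> surface
```

Because the same arcs serve both directions, this also illustrates the reversibility raised earlier: analysis and generation differ only in which side of each pair is matched.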

22 FST Formal Definition
States, initial state, final states: same as FSA
Alphabets: I and O are the input and output alphabets, not necessarily disjoint.
FST alphabet: Σ ⊆ I × O
Transition function: δ(q, i:o) defines the state q' that ensues when the machine is in state q and encounters the complex symbol i:o.
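
Read this way, an FST is simply an FSA whose alphabet consists of pairs. A minimal sketch of that view follows, with δ stored as a dictionary keyed by (state, (i, o)). The particular machine, the state numbers and the empty string standing in for ε are assumptions for illustration.

```python
# delta[(q, (i, o))] = q'   with i from the input alphabet, o from the output alphabet
DELTA = {
    (0, ("c", "c")): 1,
    (1, ("a", "a")): 2,
    (2, ("t", "t")): 3,
    (3, ("", "+N")): 4,     # "" stands in for ε on the input side
    (4, ("s", "+PL")): 5,
    (4, ("", "+SG")): 5,
}
INITIAL = 0
FINALS = {5}

def accepts_pairs(pairs):
    """Accept a sequence of complex symbols i:o exactly as an FSA accepts a string."""
    state = INITIAL
    for pair in pairs:
        state = DELTA.get((state, pair))
        if state is None:       # δ is undefined for this state and symbol
            return False
    return state in FINALS

cats = [("c", "c"), ("a", "a"), ("t", "t"), ("", "+N"), ("s", "+PL")]
print(accepts_pairs(cats))
```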

23 FST Alphabet Example
I × O, with I = {c, a, t} and O = {c, a, t, ε}; Σ is a subset of these pairs.
I \ O    c      a      t      ε
c        c:c    c:a    c:t    c:ε
a        a:c    a:a    a:t    a:ε
t        t:c    t:a    t:t    t:ε
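
The full set of candidate pairs I × O is a Cartesian product, which the short sketch below enumerates. The choice of which pairs belong to Σ is hypothetical, since the transcript does not preserve which cells the slide highlighted.

```python
from itertools import product

I = ["c", "a", "t"]            # input alphabet
O = ["c", "a", "t", "ε"]       # output alphabet, including ε

# Every complex symbol i:o that could appear on an arc.
pairs = [f"{i}:{o}" for i, o in product(I, O)]
print(len(pairs), pairs)

# Hypothetical FST alphabet: only the pairs this particular machine uses.
SIGMA = {"c:c", "a:a", "t:t"}
print(SIGMA <= set(pairs))     # Σ must be a subset of I × O
```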

24 Summary
Morphological processing can be handled by finite state machinery.
Finite State Transducers are formally very similar to Finite State Automata.
They are formally equivalent to regular relations, i.e. sets of pairings of sentences of regular languages.
