Finite State Transducers for Morphological Parsing


1 Finite State Transducers for Morphological Parsing
CSA3050: NLP Algorithms

2 Résumé
FSAs recognise exactly the regular languages.
FSTs are equivalent to regular relations (over pairs of regular languages).
FSTs are like FSAs but with complex labels.
We can use FSTs to transduce between surface and lexical levels.

3 Dotted Pair Notation
1) FSA recogniser for "fox": f o x
2) FST transducers for fox/fox and goose/geese: f:f o:o x:x; g:g o:e s:s e:e

4 Dotted Pair Notation (2)
By convention, x:y pairs lexical symbol x with surface symbol y.
Within the context of FSTs, we often encounter "default pairs" of the form x:x. These are often written simply as "x".
Example: g o:e s e
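The default-pair convention can be made concrete with a small sketch. The helper name `expand_pairs` is hypothetical (it does not come from the slides); it expands a space-separated string of arc labels into explicit (lexical, surface) pairs.

```python
def expand_pairs(labels):
    """Expand arc labels so that a bare symbol "x" becomes the default pair x:x."""
    pairs = []
    for label in labels.split():
        if ":" in label:
            lexical, surface = label.split(":", 1)
        else:
            lexical = surface = label  # default pair x:x written as "x"
        pairs.append((lexical, surface))
    return pairs


print(expand_pairs("g o:e s e"))
# [('g', 'g'), ('o', 'e'), ('s', 's'), ('e', 'e')]
```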

5 FSA for Number Inflection
How can we augment this to produce an analysis?

6 3 Steps
1. Create a transducer Tnum for noun number inflection. This will add number and category information given word classes as input.
2. Create a transducer Tstems mapping words to word classes.
3. Hook the two together.

7 Tnum example
“lexical”: reg-noun-stem +N +PL
“intermediate”: reg-noun-stem ^ s #

8 1. Tnum: Noun Number Inflection
Notation: multi-character symbols; morpheme boundary ^; word boundary #

9 Tstems example
Levels: “intermediate” (reg-noun-stem, ending in #) to “surface”
Tstems paths for reg-noun-stem: d:d o:o g:g; f:f o:o x:x

10 Tstems example
Levels: “intermediate” (irreg-pl-noun-form, ending in #) to “surface”
Tstems paths for irreg-pl-noun-form: m o:i u:ε s e; s h e e p

11 2. Tstems Lexicon

12 Hooking Together
There are two ways to hook the two transducers together:
Cascading: feeding the output of one transducer into the input of the other and running them in series.
Composition: composing the two transducers mathematically to create a third, equivalent transducer.

13 Hooking Together: cascading
lexical: reg-noun-stem +N +PL
Tnum
intermediate: reg-noun-stem ^ s #
Tstems
surface: dog | fox, s #
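As a rough illustration of cascading, the sketch below runs two toy stand-ins for Tnum and Tstems in series, in the generation direction. The function names, the list-of-symbols encoding, and the choice to keep the ^ and # boundary symbols in the output are all assumptions made for the example; only the plural case from the slides is covered.

```python
# Illustrative lexicon fragment: word class -> stems (dog and fox are from the slides).
STEMS = {"reg-noun-stem": ["dog", "fox"]}


def t_num(lexical):
    """Toy Tnum: lexical level -> intermediate level (plural case only)."""
    stem_class, *features = lexical
    if features == ["+N", "+PL"]:
        return [stem_class, "^", "s", "#"]
    raise ValueError(f"unsupported feature sequence: {features}")


def t_stems(intermediate):
    """Toy Tstems: replace the word-class symbol with each stem in that class."""
    stem_class, *rest = intermediate
    return [stem + "".join(rest) for stem in STEMS[stem_class]]


def cascade(lexical):
    """Run the two toy transducers in series: the output of Tnum feeds Tstems."""
    return t_stems(t_num(lexical))


print(cascade(["reg-noun-stem", "+N", "+PL"]))  # ['dog^s#', 'fox^s#']
```

A real implementation would represent both machines as state graphs rather than Python functions; the point here is only the data flow between the levels.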

14 Composition of Relations
Let R and S be binary relations. The composition of R and S, written R ∘ S, is defined as: (a,c) ∈ R ∘ S if and only if there exists some b such that (a,b) ∈ R and (b,c) ∈ S.
Transducers can also be composed.
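For finite relations, the definition of composition can be checked directly. The sketch below represents a relation as a Python set of pairs; the example relations are small illustrative fragments based on the Tnum and Tstems examples, not complete transducers.

```python
def compose(r, s):
    """Return R ∘ S: every (a, c) with (a, b) in R and (b, c) in S for some b."""
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}


# Illustrative fragments of the relations computed by Tnum and Tstems.
R = {("reg-noun-stem +N +PL", "reg-noun-stem ^ s #")}
S = {
    ("reg-noun-stem ^ s #", "dog ^ s #"),
    ("reg-noun-stem ^ s #", "fox ^ s #"),
}

for pair in sorted(compose(R, S)):
    print(pair)
```

Composing transducers achieves the same effect at the level of the machines themselves, so the intermediate strings never need to be produced.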

15 Tnum ∘ Tstems

16 English Spelling Rules
consonant doubling: beg/begging
y replacement: try/tries
k insertion: panic/panicked
e deletion: make/making
e insertion: watch/watches
Each rule can be stated in more detail ...

17 e Insertion Rule
Insert an e on the surface tape just when the lexical tape has a morpheme ending in x, s, z, or ch and the next and final morpheme is -s.
Stated formally: ε → e / [x|s|z|ch] ^ __ s #
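The effect of the e insertion rule on an intermediate-level string can be approximated with a regular expression. This is a minimal sketch, not an FST: the function name `apply_e_insertion` is hypothetical, and stripping the ^ and # boundary symbols in the same step is an assumption made to keep the example short.

```python
import re

# Left context x, s, z or ch, then the morpheme boundary ^,
# with the right context "s#" checked by a lookahead.
E_INSERTION = re.compile(r"(x|s|z|ch)\^(?=s#)")


def apply_e_insertion(intermediate):
    """Map an intermediate form such as 'fox^s#' to a surface form."""
    with_e = E_INSERTION.sub(r"\1e", intermediate)  # insert e at the boundary
    return with_e.replace("^", "").replace("#", "")  # drop boundary symbols


print(apply_e_insertion("fox^s#"))    # foxes
print(apply_e_insertion("watch^s#"))  # watches
print(apply_e_insertion("dog^s#"))    # dogs
```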

18 e insertion over 3 levels
The rule corresponds to the mapping between surface and intermediate levels.

19 e insertion as an FST

20 Incorporating Spelling Rules
Spelling rules, each corresponding to an FST, can be run in parallel provided that they are "aligned".
The set of spelling rules is positioned between the surface level and the intermediate level.
Parallel execution of FSTs can be carried out:
by simulation: in this case the FSTs must first be aligned;
by first constructing a single FST corresponding to their intersection.

21 Putting it all together
execution of FSTi takes place in parallel

22 Kaplan and Kay The Xerox View
FSTi are aligned but separate
FSTi intersected together

23 Operations over FSTs
We can perform operations over FSTs which yield other FSTs:
Inversion
Union
Composition
The inversion of T, or T⁻¹, simply computes the inverse mapping to T.

24 Inversion
[Figure: T and T⁻¹ side by side, each relating the lexical and surface tapes for c a t ^ PL]

25 Inversion
To invert a transducer:
we switch the order of the complex symbols, i.e. every i:o becomes o:i;
or we leave the transducer alone, and slightly change the parsing algorithm.
Practical consequences:
The transducer is reversible.
We can use exactly the same transducer to perform either analysis or generation.
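The first way of inverting, swapping every i:o label to o:i, can be sketched on a transducer stored as a list of arcs. The arc encoding (source state, input symbol, output symbol, target state), the function names, and the five-arc spelling of the goose/geese transducer (one arc per letter) are assumptions for illustration; epsilon arcs are not handled.

```python
def invert(arcs):
    """Swap input and output on every arc; states are left untouched."""
    return [(src, out, inp, dst) for (src, inp, out, dst) in arcs]


def transduce(arcs, start, finals, symbols):
    """Return every output string the transducer produces for the input."""
    configs = {(start, "")}  # pairs of (current state, output so far)
    for sym in symbols:
        configs = {
            (dst, so_far + out)
            for (state, so_far) in configs
            for (src, inp, out, dst) in arcs
            if src == state and inp == sym
        }
    return {so_far for (state, so_far) in configs if state in finals}


# goose -> geese, written as lexical:surface pairs, one arc per letter.
GOOSE = [(0, "g", "g", 1), (1, "o", "e", 2), (2, "o", "e", 3),
         (3, "s", "s", 4), (4, "e", "e", 5)]

print(transduce(GOOSE, 0, {5}, "goose"))          # {'geese'}
print(transduce(invert(GOOSE), 0, {5}, "geese"))  # {'goose'}
```

The second option on the slide corresponds to keeping `GOOSE` as it is and writing a variant of `transduce` that matches on the output symbol and emits the input symbol.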

26 Closure Properties of FSTs
Relations computed by FSTs are closed under:
inversion
union
composition
They are not closed (in general) under:
intersection (however, intersection is possible provided that we restrict the class of transducers)
complementation
subtraction

