Human Language Technology Finite State Transducers.

1 Human Language Technology Finite State Transducers

2 October 2009 HLT: finite state transducers Acknowledgement: material in this lecture is derived/copied in part from – Richard Sproat, CL46 Lectures – Lauri Karttunen, LSA lectures 2005 – Shuly Wintner, 2008, Malta

3 Three Key Concepts: Finite State Transducers, Regular Relations, Computational Morphology

4 Three Key Concepts: Finite State Transducers, Regular Relations, Computational Morphology

5 A Regular Set: L1 = {ab, abab, ababab, abababab, ababababab, ...}

6 Two Regular Sets: L1 = {ab, abab, ababab, abababab, ababababab, ...} and L2 = {ba, baba, bababa, babababa, bababababa, ...}

7 A Regular Relation: a subset of L1 × L2 that pairs ab with ba, abab with baba, ababab with bababa, and so on, or {("ab","ba"), ("abab","baba"), ...}

8 Some closure properties for regular relations: Concatenation [R1 R2]; Power (R^n); Reversal; Inversion (R^-1); Composition R1 ○ R2

9 Concatenation and Power. Concatenation: R1 = {("a","b")}, R2 = {("c","d")}, [R1 R2] = {("ac","bd")}. Power: R1+ = {("a","b"), ("aa","bb"), ("aaa","bbb"), ...}
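The two operations on this slide can be tried out on small finite relations. The following is a minimal sketch, assuming relations are represented as Python sets of (upper, lower) string pairs; the helper names `concat` and `power` are illustrative and not part of xfst.

```python
from itertools import product

def concat(r1, r2):
    """Concatenate two relations pairwise: (u1 + u2, l1 + l2)."""
    return {(u1 + u2, l1 + l2) for (u1, l1), (u2, l2) in product(r1, r2)}

def power(r, n):
    """R^n for n >= 1: concatenate r with itself n times."""
    result = r
    for _ in range(n - 1):
        result = concat(result, r)
    return result

R1 = {("a", "b")}
R2 = {("c", "d")}

print(concat(R1, R2))  # the single pair ("ac", "bd")
print(power(R1, 3))    # the single pair ("aaa", "bbb")
```

R1+ is then the union of `power(R1, n)` over all n ≥ 1, which is infinite and therefore cannot be listed as a set; a transducer represents it finitely.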

10 Composition: R1 ○ R2 denotes the composition of relations R1 and R2. Definition: if R1 contains <x,y> and R2 contains <y,z>, then R1 ○ R2 contains <x,z>. R1 and R2 must be relations. If either is just a language, it is assumed to abbreviate the identity relation. R1 ○ R2 is written [R1 .o. R2] in xfst.
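As a rough illustration of the definition, composition can be computed directly on finite relations. This is a sketch under the assumption that relations are Python sets of string pairs; `compose` and `identity` are hypothetical helper names, and the sample relations are made up for the example.

```python
def identity(language):
    """Treat a language (set of strings) as the identity relation."""
    return {(w, w) for w in language}

def compose(r1, r2):
    """Pairs (x, z) such that r1 contains (x, y) and r2 contains (y, z)."""
    return {(x, z) for (x, y1) in r1 for (y2, z) in r2 if y1 == y2}

R1 = {("ab", "ba"), ("abab", "baba")}
R2 = {("ba", "BA"), ("baba", "BABA")}

print(compose(R1, R2))
# Composing with a language restricts the relation to that language.
print(compose(identity({"ab"}), R1))
```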

11 Closure Properties of Regular Languages and Relations
Operation         Regular Languages   Regular Relations
Union             yes                 yes
Concatenation     yes                 yes
Iteration         yes                 yes
Intersection      yes                 no
Subtraction       yes                 no
Complementation   yes                 no
Composition       n/a                 yes

12 Morphology as a Regular Relation. Surface language: cat, cats, mice, lives, ... Lexical language: cat, cat+N+PL, mouse+N+PL, life+N+PL, live+V+3SING, ... As a set of pairs: {("cat","cat"), ("cats","cat+N+PL"), ...}

13 Part-of-Speech Tagging
I/PRON know/V some/DET new/ADJ tricks/N
said/V the/DET Cat/N in/P the/DET Hat/N

14 Singular-to-plural mapping: cat → cats, hat → hats, ox → oxen, child → children, mouse → mice, sheep → sheep

15 Three Key Concepts: Finite State Transducers, Regular Relations, Computational Morphology

16 FSA: one tape (here containing a). Used for recognition and generation.

17 Finite State Transducers: a finite state transducer (FST) is essentially a finite state automaton (FSA) that works on two (or more) tapes. The most common way to think about transducers is as a kind of translating machine which works by reading from one tape and writing onto the other.

18 FST Definition: a 2-way FST is a quintuple (K, s, F, Σi × Σo, δ) where Σi, Σo are input and output alphabets; K is a finite set of states; s ∈ K is an initial state; F ⊆ K are final states; δ is a transition relation of type K × Σi × Σo × K
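The quintuple maps naturally onto a small data structure. The sketch below is one possible encoding, not a standard API: the class name `FST`, its field names, and the `accepts` method are assumptions made for illustration, and recognition is restricted to equal-length string pairs because this encoding has no ε-transitions. The alphabets are left implicit in the transition set.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FST:
    states: frozenset   # K
    start: str          # s, a member of K
    finals: frozenset   # F, a subset of K
    delta: frozenset    # set of (state, in_sym, out_sym, next_state)

    def accepts(self, upper, lower):
        """Recognition mode for equal-length strings (no epsilon arcs)."""
        if len(upper) != len(lower):
            return False
        current = {self.start}
        for i, o in zip(upper, lower):
            # Follow every arc whose symbol pair matches the next two symbols.
            current = {q2 for (q1, a, b, q2) in self.delta
                       if q1 in current and a == i and b == o}
        return bool(current & self.finals)

# The one-arc transducer for the relation {("a", "b")}.
simple = FST(
    states=frozenset({"q0", "q1"}),
    start="q0",
    finals=frozenset({"q1"}),
    delta=frozenset({("q0", "a", "b", "q1")}),
)

print(simple.accepts("a", "b"))
print(simple.accepts("a", "a"))
```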

19 FST: two tapes (upper tape a, lower tape b). Used for recognition, generation, and translation.

20 A Very Simple Transducer. Relation: { ("a","b") }. Notation: a:b encodes the transition.

21 A Very Simple Transducer: the arc labeled a over b is also written as a:b

22 A Very Simple Transducer: in a:b, a is the upper side and b is the lower side

23 Symbol Pairs. Symbols vs. symbol pairs: in general, no distinction is made in xfst between a, the language {"a"}, and a:a, the identity relation {("a","a")}.

24 A (more interesting) Transducer. Relation: { ("a","b"), ("aa","bb"), ... }. Notation: a:b*. N.B. with this notation a and b must be single symbols. [Figure: transducer with an a:b arc]

25 Transducers have several modes of operation. Generation mode: it writes on both tapes, a string of a's on one tape and a string of b's on the other; both strings have the same length. Recognition mode: it accepts when the word on the first tape consists of exactly as many a's as the word on the second tape consists of b's. Translation mode (left to right): it reads a's from the first tape and writes a b onto the second tape for every a that it reads. Translation mode (right to left): it reads b's from the second tape and writes an a onto the first tape for every b that it reads.
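The four modes can be sketched for the a:b* transducer. This is a minimal illustration that assumes a one-state machine with a single looping a:b arc; the names `ARCS`, `translate`, `recognize`, and `generate` are hypothetical, and `recognize` relies on this particular machine being deterministic in both directions.

```python
# One state (0) that is both initial and final, with a looping a:b arc.
ARCS = {(0, "a", "b", 0)}
START, FINALS = 0, {0}

def translate(word, down=True):
    """Read word from one tape and write the other; None if rejected."""
    state, out = START, []
    for sym in word:
        for (q1, up, lo, q2) in ARCS:
            read, write = (up, lo) if down else (lo, up)
            if q1 == state and read == sym:
                state = q2
                out.append(write)
                break
        else:
            return None  # no arc matched this symbol
    return "".join(out) if state in FINALS else None

def recognize(upper, lower):
    """Accept the pair if translating upper yields exactly lower."""
    return translate(upper) == lower

def generate(max_len):
    """Enumerate the pairs of the relation up to a length bound."""
    pairs = []
    for n in range(max_len + 1):
        upper = "a" * n
        pairs.append((upper, translate(upper)))
    return pairs

print(translate("aaa"))             # left to right
print(translate("bb", down=False))  # right to left
print(recognize("aa", "bb"))
print(generate(2))
```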

26 The Basic Idea: Morphology is regular. Morphology is finite state.

27 Morphology is Regular: the relation between the surface forms of a language and the corresponding lexical forms can be described as a regular relation, e.g. { ("leaf+N+Pl","leaves"), ("hang+V+Past","hung"), ... }. Regular relations are closed under operations such as concatenation, iteration, union, and composition. Complex regular relations can be derived from simpler relations.

28 Morphology is finite-state: a regular relation can be defined using the metalanguage of regular expressions.
[{talk} | {walk} | {work}] [%+Base:0 | %+SgGen3:s | %+Progr:{ing} | %+Past:{ed}];
A regular expression can be compiled into a finite-state transducer that implements the relation computationally.

29 Compilation: the regular expression
[{talk} | {walk} | {work}] [%+Base:0 | %+SgGen3:s | %+Progr:{ing} | %+Past:{ed}];
is compiled into a finite-state transducer. [Figure: from the initial state, paths spell t a l k, w a l k, and w o r k, followed by one of the suffix paths +Base:ε, +3rdSg:s, +Progr:i ε:n ε:g, or +Past:e ε:d, leading to the final state]

30 Generation: work+3rdSg --> works [Figure: the path w:w o:o r:r k:k +3rdSg:s through the transducer]

31 Analysis: talked --> talk+Past [Figure: the path t:t a:a l:l k:k +Past:e ε:d through the transducer]

32 XFST Demo 2
Start xfst:
% xfst
Compile a regular expression:
xfst[0]: regex [{talk} | {walk} | {work}] [%+Base:0 | %+SgGen3:s | %+Progr:{ing} | %+Past:{ed}];
Apply the result:
xfst[1]: apply up walked
walk+Past
xfst[1]: apply down talk+SgGen3
talks
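For readers without xfst at hand, the same relation can be approximated in plain Python. The sketch below lists the twelve pairs explicitly instead of compiling a transducer, so it only mirrors the behaviour of `apply up` and `apply down` for this tiny lexicon; `STEMS`, `SUFFIXES`, `apply_up`, and `apply_down` are illustrative names, not xfst functions.

```python
STEMS = ["talk", "walk", "work"]
# Lexical tag on the upper side, surface ending on the lower side.
SUFFIXES = {"+Base": "", "+SgGen3": "s", "+Progr": "ing", "+Past": "ed"}

# The relation as an explicit set of (lexical, surface) pairs.
RELATION = {
    (stem + tag, stem + ending)
    for stem in STEMS
    for tag, ending in SUFFIXES.items()
}

def apply_down(lexical):
    """Generation: lexical form to surface form(s)."""
    return sorted(s for (l, s) in RELATION if l == lexical)

def apply_up(surface):
    """Analysis: surface form to lexical form(s)."""
    return sorted(l for (l, s) in RELATION if s == surface)

print(apply_up("walked"))
print(apply_down("talk+SgGen3"))
```

An explicit listing like this does not scale to a real lexicon, which is the practical motivation for compiling the regular expression into a transducer.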

33 Lexical transducer: a finite-state transducer that relates the inflected form veut to the citation form plus inflection codes, vouloir+IndP+SG+P3. Bidirectional: generation or analysis. Compact and fast. Comprehensive systems have been built for over 40 languages: English, German, Dutch, French, Italian, Spanish, Portuguese, Finnish, Russian, Turkish, Japanese, Korean, Basque, Greek, Arabic, Hebrew, Bulgarian, …

34 How lexical transducers are made: a lexicon regular expression (morphotactics) and rule regular expressions (alternations) are compiled into a lexicon FST and rule FSTs, which are combined by composition into the lexical transducer (a single FST). [Figure example: fat+Adj+Comp on the lexical side, fatter on the surface side]

35 Sequential Model: lexical form, fst 1, intermediate form, fst 2, ..., fst n, surface form. An ordered sequence of rewrite rules (Chomsky & Halle '68) can be modeled by a cascade of finite-state transducers (Johnson '72, Kaplan & Kay '81).

36 Parallel Model: a set of parallel two-level rules (constraints) compiled into finite-state automata interpreted as transducers (Koskenniemi '83). fst 1, fst 2, ..., fst n each relate the lexical form directly to the surface form.

37 Sequential vs. Parallel. Sequential (Chomsky & Halle 1968): rule 1, rule 2, ..., rule n lead from the lexical form through intermediate forms to the surface form; the rules are composed into a single FST. Parallel (Koskenniemi 1983): rule 1, rule 2, ..., rule n each relate the lexical form directly to the surface form; the rules are intersected into a single FST.

38 Sequential vs. Parallel Rules. Sequential rules are combined by means of composition. Advantage: FSTs are closed under composition. Disadvantage: the result is sensitive to the order of the rules. Parallel rules are combined by means of intersection. In general, FSTs are not closed under intersection, but FSTs without ε-transitions are closed under intersection.

39 Crossproduct
A .x. B   The relation that maps every string in A to every string in B, and vice versa.
A:B       Same as [A .x. B].
Example: a b c .x. x y, [a b c] : [x y], and {abc}:{xy} all denote the same relation, shown as the path a:x b:y c:0.
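A small sketch may help connect the definition to the a:x b:y c:0 path. It assumes languages are Python sets of strings; `crossproduct` and `align` are hypothetical helpers, and the left-aligned pairing with "0" as padding is just one possible alignment, chosen here because it matches the arcs shown on the slide.

```python
from itertools import product, zip_longest

def crossproduct(a, b):
    """Map every string in language a to every string in language b."""
    return set(product(a, b))

def align(upper, lower, eps="0"):
    """Pair symbols left to right, padding the shorter side with eps."""
    return [f"{u}:{l}" for u, l in zip_longest(upper, lower, fillvalue=eps)]

print(crossproduct({"abc"}, {"xy"}))
print(align("abc", "xy"))
```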

40 Composition
A .o. B   The relation C such that if A maps x to y and B maps y to z, C maps x to z.
Example: {abc} .o. [a:A | b:B | c:C | d:D]* yields the path a:A b:B c:C.

41 Transducers are not closed under intersection. T1 has arcs c:a and ε:b; T2 has arcs ε:a and c:b. T1(c^n) = { a^n b^m | m ≥ 0 }, T2(c^n) = { a^m b^n | m ≥ 0 }, (T1 ∩ T2)(c^n) = { a^n b^n }. The intersection is not a regular relation, because { a^n b^n } is not a regular language.
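The counterexample can be checked on bounded samples. The sketch below does not build the transducers; it simply enumerates the outputs of T1 and T2 for an input c^n up to an assumed bound on m, and confirms that the only common output is a^n b^n. The function names `t1_outputs` and `t2_outputs` and the bound are illustrative.

```python
def t1_outputs(n, bound):
    """Outputs of T1 on c^n: a^n b^m for 0 <= m <= bound."""
    return {"a" * n + "b" * m for m in range(bound + 1)}

def t2_outputs(n, bound):
    """Outputs of T2 on c^n: a^m b^n for 0 <= m <= bound."""
    return {"a" * m + "b" * n for m in range(bound + 1)}

BOUND = 6  # must be at least the largest n tested
for n in range(4):
    common = t1_outputs(n, BOUND) & t2_outputs(n, BOUND)
    print(n, common)
```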

42 Xerox RE Operators: $ containment, => restriction, -> replacement. They make it easier to describe complex languages and relations without extending the formal power of finite-state systems.

43 Containment: $a is equivalent to [?* a ?*]

44 Restriction: a => b _ c. “Any a must be preceded by b and followed by c.” Equivalent expression: ~[~[?* b] a ?*] & ~[?* a ~[c ?*]]
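The restriction can also be read as a simple membership test. The sketch below checks the stated condition directly on a string; it is a hand-written check under the assumption that each character is one symbol, not a translation of how xfst compiles the operator, and `satisfies_restriction` is a hypothetical name.

```python
def satisfies_restriction(word):
    """True if every 'a' is immediately preceded by 'b' and followed by 'c'."""
    for i, ch in enumerate(word):
        if ch != "a":
            continue
        if i == 0 or word[i - 1] != "b":
            return False  # corresponds to the first conjunct failing
        if i == len(word) - 1 or word[i + 1] != "c":
            return False  # corresponds to the second conjunct failing
    return True

for w in ["bac", "xbacx", "ac", "ba", ""]:
    print(repr(w), satisfies_restriction(w))
```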

45 Replacement: a b -> b a. “Replace ‘ab’ by ‘ba’.” Equivalent expression: [[~$[a b] [[a b] .x. [b a]]]* ~$[a b]]

46 Replacement + Marking: a|e|i|o|u -> %[ ... %]. Applied to p o t a t o, this yields p[o]t[a]t[o].
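The marking rule has a close analogue in ordinary regular-expression substitution. The following sketch uses Python's `re.sub` to reproduce the potato example; it is an analogy for the surface behaviour only, and `mark_vowels` is a hypothetical helper name.

```python
import re

def mark_vowels(word):
    """Wrap every vowel in square brackets, leaving other symbols unchanged."""
    return re.sub(r"[aeiou]", lambda m: "[" + m.group(0) + "]", word)

print(mark_vowels("potato"))  # p[o]t[a]t[o]
```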

47 Conditional Replacement: A -> B (replacement) in the context L _ R. The relation that replaces A by B between L and R, leaving everything else unchanged.

48 Sequential application. Rule 1: N -> m / _ p. Rule 2: p -> m / m _. k a N p a n → k a m p a n → k a m m a n
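The two rewrite rules can be imitated with regular-expression substitutions using lookahead and lookbehind for the contexts. This is a rough sketch of the cascade, assuming one character per symbol; `rule1`, `rule2`, and `cascade` are illustrative names, and `cascade` plays the role that a single composed transducer plays in the finite-state setting.

```python
import re

def rule1(s):
    """N -> m / _ p : rewrite N as m when followed by p."""
    return re.sub(r"N(?=p)", "m", s)

def rule2(s):
    """p -> m / m _ : rewrite p as m when preceded by m."""
    return re.sub(r"(?<=m)p", "m", s)

def cascade(s):
    """Apply rule 1, then rule 2, as one mapping."""
    return rule2(rule1(s))

print(rule1("kaNpan"))         # kampan
print(cascade("kaNpan"))       # kamman
print(rule1(rule2("kaNpan")))  # kampan: the reverse order gives a different result
```

The last line shows why rule order matters in the sequential model: rule 2 has nothing to apply to until rule 1 has created the m.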

49 Sequential application in detail: the first transducer (N -> m / _ p) maps k a N p a n to k a m p a n along the state path 0 0 0 2 0 0 0; the second transducer (p -> m / m _) maps k a m p a n to k a m m a n along the state path 0 0 0 1 0 0 0.

50 Composition: the single composed transducer maps k a N p a n directly to k a m m a n along the state path 0 0 0 3 0 0 0.

