Fi(n)nish OT Prosody Lauri Karttunen FSMNLP-2005 September 1, 2005.


1 Fi(n)nish OT Prosody Lauri Karttunen FSMNLP-2005 September 1, 2005

2 Overview
Success of Finite-State Morphology
  Lexical transducers
  Two ways of describing morphological alternations: sequential (Chomsky & Halle 1968) and parallel (Koskenniemi 1983)
Optimality Theory (Prince & Smolensky 1993)
  Yet another two-level model
  Finite-state? Lenient composition
Finnish OT Prosody
  Basic facts
  Finite-state implementation of Kiparsky’s 2003 analysis with the FST tool (Beesley & Karttunen 2003)
Conclusion
  Final thoughts

3 Computational morphology
Analysis: leaves --> leaf N Pl, leave N Pl, leave V Sg3
Generation: hang V Past --> hanged, hung

4 Two challenges
Morphotactics: words are composed of smaller elements that must be combined in a certain order.
  piti-less-ness is English
  piti-ness-less is not English
Phonological alternations: the shape of an element may vary depending on the context.
  pity is realized as piti in pitilessness
  die becomes dy in dying

5 Morphology is regular (=rational)
The relation between the surface forms of a language and the corresponding lexical forms can be described as a regular relation.
A regular relation consists of ordered pairs of strings.
  leaf+N+Pl : leaves
  hang+V+Past : hung
Any finite collection of such pairs is a regular relation.
Regular relations are closed under operations such as concatenation, iteration, union, and composition.
Complex regular relations can be derived from simple relations.

6 Morphology is finite-state
A regular relation can be defined using the metalanguage of regular expressions.
[{talk} | {walk} | {work}] [%+Base:0 | %+SgGen3:s | %+Progr:{ing} | %+Past:{ed}];
A regular expression can be compiled into a finite-state transducer that implements the relation computationally.
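The relation defined by the regular expression above is finite, so it can be spelled out as a plain set of string pairs. The following Python sketch is only an illustration of that idea, not of how a transducer is compiled; the helper names build_relation, generate and analyze are hypothetical.

```python
from itertools import product

# Stems and (tag, ending) pairs copied from the regular expression on the slide.
STEMS = ["talk", "walk", "work"]
SUFFIXES = [("+Base", ""), ("+SgGen3", "s"), ("+Progr", "ing"), ("+Past", "ed")]


def build_relation():
    """Concatenate every stem with every suffix pair: (lexical form, surface form)."""
    return {(stem + tag, stem + ending)
            for stem, (tag, ending) in product(STEMS, SUFFIXES)}


def generate(relation, lexical):
    """Lexical form to surface forms (the 'down' direction)."""
    return sorted(surface for lex, surface in relation if lex == lexical)


def analyze(relation, surface):
    """Surface form to lexical forms (the 'up' direction)."""
    return sorted(lex for lex, surf in relation if surf == surface)


RELATION = build_relation()
print(generate(RELATION, "work+Past"))  # ['worked']
print(analyze(RELATION, "talked"))      # ['talk+Past']
```

The same set of pairs serves both directions, which is the sense in which a lexical transducer is bidirectional.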

7 Generation: work+3rdSg --> works
[Figure: transducer network for talk, walk, work with identity arcs for the stem letters and suffix arcs +Base:0, +3rdSg:s, +Progr:i 0:n 0:g, +Past:e 0:d]

8 Analysis: talked --> talk+Past
[Figure: the same transducer network, with identity arcs for the stem letters and suffix arcs +Base:0, +3rdSg:s, +Progr:i 0:n 0:g, +Past:e 0:d]

9 Lexical transducer
Inflected form: veut
Citation form and inflection codes: vouloir+IndP+SG+P3
A finite-state transducer maps between the two.
Bidirectional: generation or analysis
Compact and fast
Comprehensive systems have been built for over 40 languages: English, German, Dutch, French, Italian, Spanish, Portuguese, Finnish, Russian, Turkish, Japanese, Korean, Basque, Greek, Arabic, Hebrew, Bulgarian, …

10 How lexical transducers are made
[Figure: a lexicon regular expression (morphotactics) and rule regular expressions (alternations) are compiled into a lexicon FST and rule FSTs, which are combined by composition into a lexical transducer (a single FST). Example: fat+Adj+Comp : fatter]

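11 Two-level rules vs. rewrite rules
[Figure: Two-level rules (Koskenniemi 1983): rule 1 ... rule n apply in parallel between the lexical form and the surface form, and are intersected into a single FST. Rewrite rules (Chomsky & Halle 1968): rule 1 ... rule n apply in sequence, from the lexical form through intermediate forms to the surface form, and are composed into a single FST.]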

12 Rewrite rules: Yawelmani Vowel Harmony (Kisseberth 1969)
Lexical form:      ? u: t y ? A s
After Epenthesis:  ? u: t I y ? A s
After Harmony:     ? u: t u y ? a s
After Lowering:    ? o: t u y ? a s

13 Two-level constraints
Lexical form:  ? u: t y ? A s
Surface form:  ? o: t u y ? a s
The underlying representation controls all three alternations.
Epenthesis: Insert u or i (underspecification).
Harmony: Rounding next to a round V of the same height.
Lowering: Long u is always realized as long o.

14 Rewrite Rules vs. Constraints Two different ways of decomposing the complex relation between lexical and surface forms into a set of simpler relations that can be more easily understood and manipulated. One approach may be more convenient than the other for particular applications.

15 Overview
Success of Finite-State Morphology
  Lexical transducers
  Two ways of describing morphological alternations: sequential (Chomsky & Halle 1968) and parallel (Koskenniemi 1983)
Optimality Theory (Prince & Smolensky 1993)
  Yet another two-level model
  Finite-state? Lenient composition
Finnish OT Prosody
  Basic facts
  Finite-state implementation of Kiparsky’s 2003 analysis with the FST tool (Beesley & Karttunen 2003)
Conclusion
  Final thoughts

16 Optimality theory
Prince & Smolensky 1993
Eliminate: rules, derivations
Introduce: violable ranked constraints
Instant success!

17 Brief Introduction to OT
Input: A language of underlying lexical forms.
GEN: A function that generates alternate surface realizations for each input form, possibly an infinite set.
Constraints: A finite set of principles, preferably universal, that filter out unwanted realizations.
Ranking: A language-specific ordering of the constraints.

18 Computational perspective
Ellison 1994: OT deals with regular sets and relations, so it is a finite-state system. Constraint transducers mark violations; the marks are sorted and counted.
Tesar 1995: Dynamic algorithm for optimal path computations.
Eisner 1996: Two-level typology of optimality constraints: restrict, prohibit. “FootForm Decomposed”, MIT Working Papers in Linguistics 31: 115-143, proposes Primitive Optimality Theory (no generalized alignment).
Karttunen 1998: Introduces lenient composition.
Frank & Satta 1998: Prove that OT is regular if the number of violations is bounded.

19 Lenient Composition: .O.
Let R be a relation that maps each input string to one or more outputs. Let C be a constraint that eliminates some outputs.
R .O. C is the relation that maps each input string that can meet the constraint C to the outputs that meet C and leaves the rest of the relation R unchanged. (Karttunen 1998)
Is constraint ranking rule ordering in disguise?
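The definition of lenient composition can be tried out on a small finite relation. The Python sketch below is an illustration under simplifying assumptions: the relation is a dictionary from inputs to sets of outputs, the constraint is a predicate on output strings, and the function names and sample data are hypothetical.

```python
def compose(relation, constraint):
    """Ordinary composition with a filter: inputs with no good output are lost."""
    result = {}
    for inp, outputs in relation.items():
        good = {out for out in outputs if constraint(out)}
        if good:
            result[inp] = good
    return result


def lenient_compose(relation, constraint):
    """Keep the outputs that meet the constraint; if none do, keep them all."""
    result = {}
    for inp, outputs in relation.items():
        good = {out for out in outputs if constraint(out)}
        result[inp] = good if good else set(outputs)
    return result


def has_foot(candidate):
    """Toy constraint: the candidate contains at least one foot."""
    return "(" in candidate


R = {"kala": {"(ká.la)", "ka.la"}, "mäki": {"mä.ki"}}
print(compose(R, has_foot))          # {'kala': {'(ká.la)'}}
print(lenient_compose(R, has_foot))  # {'kala': {'(ká.la)'}, 'mäki': {'mä.ki'}}
```

Under ordinary composition the input mäki disappears because none of its outputs meets the constraint; under lenient composition it keeps its original outputs, so every input always retains at least one candidate.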

20 Two-level model vs. OT
In some respects, the two-level model of Koskenniemi (1983) was ten years ahead of its time:
Symbol-to-symbol constraints, not string relations like rewrite rules.
Rules can refer to both input and output contexts.
Constraints on the output can be expressed directly.
Concepts such as FAITHFULNESS can be expressed straightforwardly.
But two-level constraints were not violable and not ranked. All the constraints have to be satisfied to get any output.

21 Comparisons (Karttunen 1998)
                          Application                 Merging
rewrite rules             composition                 composition
two-level constraints     intersecting composition    intersection
optimality constraints    lenient composition         lenient composition

22 Overview
Success of Finite-State Morphology
  Lexical transducers
  Two ways of describing morphological alternations: sequential (Chomsky & Halle 1968) and parallel (Koskenniemi 1983)
Optimality Theory (Prince & Smolensky 1993)
  Yet another two-level model
  Finite-state? Lenient composition
Finnish OT Prosody
  Basic facts
  Finite-state implementation of Kiparsky’s 2003 analysis with the FST tool (Beesley & Karttunen 2003)
Conclusion
  Final thoughts

23 Finnish Prosody: basic facts
The nucleus of a Finnish syllable must consist of a short vowel, a long vowel, or a diphthong.
Main stress is always on the first syllable; secondary stress occurs on non-initial syllables.
Adjacent syllables are never stressed.
The stressed syllable is initial in the foot.
ilmoittautuminen ‘registering’ (Nom Sg): (íl.moit).(tàu.tu).(mì.nen)

24 Ternary feet in Finnish
Stress that would fall on a light syllable shifts onto the following heavy syllable, creating a ternary foot.
(ká.las).te.(lèm.me) ‘we are fishing’
(íl.moit).(tàu.tu).mi.(sès.ta) ‘registering’ (Ela Sg)
(rá.kas).ta.(jàt.ta).ri.(àn.sa) ‘his mistresses’ (Par Pl)
Can we get these facts to come out “for free”, from the interaction of independently motivated principles? Yes!
Paul Kiparsky. “Finnish Noun Inflection”. Generative Approaches to Finnic and Saami Linguistics, Diane Nelson and Satu Manninen (eds.), pp. 109-161, CSLI Publications, 2003.
Nine Elenbaas and René Kager. “Ternary rhythm and the lapse constraint”. Phonology 16. 273-329.

25 Non-OT and OT solutions
It is possible to define a cascade of replace rules that produce the desired result.
http://www.stanford.edu/~laurik/fsmbook/examples/FinnishProsody.html
But, following Kiparsky, we are going to do OT today, and in a more elegant way than is shown at
http://www.stanford.edu/~laurik/fsmbook/examples/FinnishOTProsody.html

26 General Strategy
Compose the input language with GEN to produce a mapping from each input form to all of its output candidates: Input language .o. GEN
Eliminate suboptimal candidates by applying constraints in the ranked order: Constraint 1, Constraint 2, ... At least one output candidate always survives.
By what finite-state operation?

27 Need a prolific GEN
kala ‘fish’ (Nom Sg): 33 candidates (☜ marks the winner)
ka.la ka.lá ka.là ka.(là) ka.(lá)
ká.la ká.lá ká.là ká.(là) ká.(lá)
kà.la kà.lá kà.là kà.(là) kà.(lá)
(ká).la (ká).lá (ká).là (ká).(là) (ká).(lá)
(kà).la (kà).lá (kà).là (kà).(là) (kà).(lá)
(ká.la) ☜ (ká.lá) (ká.là) (kà.la) (kà.lá) (kà.là) (ka.lá) (ka.là)

28 Basic definitions 1
Using Parc/XRCE regular expression syntax:
define C [b | c | d | f | g | h | j | k | l | m | n | p | q | r | s | t | v | w | x | z]; # Consonant
define HighV [u | y | i]; # High vowel
define MidV [e | o | ö]; # Mid vowel
define LowV [a | ä]; # Low vowel
define USV [HighV | MidV | LowV]; # Unstressed vowel
define MSV [á | é | í | ó | ú | "y´" | "ä´" | "ö´"]; # Vowel with main stress
define SSV [à | è | ì | ò | ù | "y`" | "ä`" | "ö`"]; # Vowel with secondary stress
define SV [MSV | SSV]; # Stressed vowel
define V [USV | SV]; # Vowel

29 Basic definitions 2
define P [V | C]; # Phone
define B [[\P+] | .#.]; # Boundary
define E .#. | "."; # Edge
define Light [C* V]; # Light syllable
define Heavy [Light [[V* C*] & $?]]; # Heavy syllable
define S [Heavy | Light]; # Syllable
define SS [S & $SV]; # Stressed syllable
define US [S & ~$SV]; # Unstressed syllable
define MSS [S & $MSV]; # Syllable with main stress

30 GEN 1
define MarkNonDiphthong [ [..] -> "." || [HighV | MidV] _ LowV, LowV _ MidV, i _ [MidV - e], u _ [MidV - o], y _ [MidV - ö] ];
Insert a syllable boundary between vowels that cannot form a diphthong: i.a, e.a, a.e, i.o, u.e, y.e, etc.
define Syllabify C* V+ C* @-> ... "." || _ C V ;
Insert a syllable boundary after a maximal C* V+ C* pattern that is followed by C V. For example, strukturalismi -> struk.tu.ra.lis.mi.
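The two syllabification steps can be imitated with Python regular expressions. This is a rough sketch that assumes unstressed lowercase input and relies on Python's greedy, left-to-right matching to stand in for the longest-match @-> operator; it is not the xfst implementation, and the names NON_DIPHTHONG, SYLLABLE and syllabify are hypothetical.

```python
import re

CONS = "bcdfghjklmnpqrstvwxz"
VOWELS = "aeiouyäö"

# Zero-width positions between two vowels that cannot form a diphthong.
NON_DIPHTHONG = re.compile(
    r"(?<=[uyieoö])(?=[aä])"   # [HighV | MidV] _ LowV
    r"|(?<=[aä])(?=[eoö])"     # LowV _ MidV
    r"|(?<=i)(?=[oö])"         # i _ [MidV - e]
    r"|(?<=u)(?=[eö])"         # u _ [MidV - o]
    r"|(?<=y)(?=[eo])"         # y _ [MidV - ö]
)

# A maximal C* V+ C* stretch that is followed by C V.
SYLLABLE = re.compile(rf"[{CONS}]*[{VOWELS}]+[{CONS}]*(?=[{CONS}][{VOWELS}])")


def syllabify(word):
    """Insert '.' at non-diphthong vowel junctures, then after each maximal syllable."""
    marked = NON_DIPHTHONG.sub(".", word)
    return SYLLABLE.sub(lambda m: m.group(0) + ".", marked)


print(syllabify("strukturalismi"))    # struk.tu.ra.lis.mi
print(syllabify("ilmoittautuminen"))  # il.moit.tau.tu.mi.nen
print(syllabify("ergonomia"))         # er.go.no.mi.a
```

The lookahead keeps the final consonant of a cluster as the onset of the next syllable, which is what the context _ C V achieves in the rule above.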

31 GEN 2
define Stress a (->) á|à, e (->) é|è, i (->) í|ì, o (->) ó|ò, u (->) ú|ù, y (->) "y´"|"y`", ä (->) "ä´"|"ä`", ö (->) "ö´"|"ö`";
Optionally stress any vowel with a primary or secondary stress.
define Scan [[S ("." S ("." S)) & $SS] (->) "(" ... ")" || E _ E] ;
Optionally group syllables into unary, binary, or ternary feet when there is at least one stressed syllable.
define Gen [MarkNonDiphthong .o. Syllabify .o. Stress .o. Scan];

32 Demo!
regex {kala} .o. Gen ;   (compose)
print lower-words        (show output candidates)
print size               (count them)
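The size of the candidate set can be checked without xfst by enumerating the same choices in Python: every vowel is optionally given primary or secondary stress, and syllables are optionally grouped into feet of one to three syllables that contain at least one stressed vowel. The sketch below is an approximation that takes an already syllabified word as input; the function names are hypothetical.

```python
from itertools import product

PRIMARY = {"a": "á", "e": "é", "i": "í", "o": "ó", "u": "ú",
           "y": "y´", "ä": "ä´", "ö": "ö´"}
SECONDARY = {"a": "à", "e": "è", "i": "ì", "o": "ò", "u": "ù",
             "y": "y`", "ä": "ä`", "ö": "ö`"}


def stress_variants(syllable):
    """All ways of optionally stressing each vowel: list of (text, is_stressed)."""
    options = []
    for ch in syllable:
        if ch in PRIMARY:
            options.append([(ch, False), (PRIMARY[ch], True), (SECONDARY[ch], True)])
        else:
            options.append([(ch, False)])
    return [("".join(text for text, _ in combo), any(flag for _, flag in combo))
            for combo in product(*options)]


def footings(syllables):
    """All ways of leaving syllables unfooted or grouping 1-3 of them into a foot."""
    if not syllables:
        return [[]]
    results = [[syllables[0][0]] + rest for rest in footings(syllables[1:])]
    for size in (1, 2, 3):
        group = syllables[:size]
        if len(group) == size and any(stressed for _, stressed in group):
            foot = "(" + ".".join(text for text, _ in group) + ")"
            results += [[foot] + rest for rest in footings(syllables[size:])]
    return results


def gen(syllabified):
    """Enumerate stress and footing candidates for a word such as 'ka.la'."""
    variants = [stress_variants(s) for s in syllabified.split(".")]
    return [".".join(parse) for combo in product(*variants)
            for parse in footings(list(combo))]


print(len(gen("ka.la")))                # 33
print(len(gen("ka.las.te.le.mi.nen")))  # 70653
```

For these two words the enumeration reproduces the counts reported in the slides, 33 candidates for kala and 70,653 for kalasteleminen.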

33 Kiparsky's nine constraints Clash AlignLeft MainStress FootBin Lapse NonFinal StressToWeight Parse AllFeetFirst

34 Counting constraint violations
We use asterisks to mark constraint violations. We need a way to prefer candidates with the fewest violation marks.
define Viol ${*};
define Viol0 ~Viol; # No violations
define Viol1 ~[Viol^2]; # At most one violation
define Viol2 ~[Viol^3]; # At most two violations
define Viol3 ~[Viol^4]; # At most three violations
The following definition eliminates the violation marks after the candidate set has been pruned by a constraint.
define Pardon {*} -> 0;

35 Defining OT Constraints
Three types:
  Unviolable constraints (example: primary stress in Finnish)
  Ordinary violable constraints (example: Lapse)
  Gradient alignment constraints (example: All-Feet-First)
Strategy: We define an evaluation template for each of the three types and then define the individual constraints with the help of the templates.

36 Evaluation Template for Unviolable Constraints
define Unviolable(Candidates, Constraint) [ Candidates .o. Constraint ];
Example:
define MainStress(X) Unviolable(X, B MSS ~$MSS);
# B is the left edge of the word or "(".
# MSS is a syllable with a primary stress.

37 Evaluation Template for Ordinary Constraints
define Eval(Candidates, Violation, Left, Right) [
  Candidates
  .o. Violation -> ... {*} || Left _ Right
  .O. Viol3 .O. Viol2 .O. Viol1 .O. Viol0
  .o. Pardon ];
where Viol0 is ~${*}, Viol1 is ~[[${*}]^2], etc., and Pardon is {*} -> 0, which deletes all violation marks.
.o. is ordinary composition; .O. is lenient composition.
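The three stages of the Eval template (mark violations, prune with a cascade of lenient filters, pardon) can be mimicked on a list of candidate strings. The Python sketch below is an approximation under stated assumptions: the marking function imitates only the Parse constraint, Eval(X, S, E, E), by starring syllables that stand between word edges or "." on both sides, and all names are hypothetical.

```python
import re

# A syllable between two edges (word boundary or "."), i.e. outside any foot.
UNFOOTED = re.compile(r"(?:^|(?<=\.))([^.()*]+)(?=\.|$)")


def mark_parse(candidate):
    """Insert '*' after every unfooted syllable."""
    return UNFOOTED.sub(r"\1*", candidate)


def lenient_filter(candidates, ok):
    """Keep the candidates that pass; if none pass, keep them all."""
    kept = [c for c in candidates if ok(c)]
    return kept if kept else list(candidates)


def evaluate(candidates, mark, bound=3):
    """Mark, then apply 'at most n marks' leniently for n = bound ... 0, then pardon."""
    marked = [mark(c) for c in candidates]
    for n in range(bound, -1, -1):
        marked = lenient_filter(marked, lambda c, n=n: c.count("*") <= n)
    return [c.replace("*", "") for c in marked]


print(mark_parse("ká.la"))                                # ká*.la*
print(evaluate(["(ká.la)", "(ká).la", "ká.la"], mark_parse))  # ['(ká.la)']
print(evaluate(["(ká).la", "ká.la"], mark_parse))             # ['(ká).la']
```

Because the cascade stops at a fixed bound, candidates that all carry more marks than the bound are not distinguished from one another; this mirrors the bounded counting of the finite-state template.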

38 Evaluation Template for Left-Oriented Gradient Alignment
define EvalGradientLeft(Candidates, Violation, Left, Right) [
  Candidates
  .o. Violation -> {*} ... || .#. Left _ Right
  .o. Violation -> {*}^2 ... || .#. Left^2 _ Right
  .o. Violation -> {*}^3 ... || .#. Left^3 _ Right
  .o. Violation -> {*}^4 ... || .#. Left^4 _ Right
  .o. Violation -> {*}^5 ... || .#. Left^5 _ Right
  .o. Violation -> {*}^6 ... || .#. Left^6 _ Right
  .o. Violation -> {*}^7 ... || .#. Left^7 _ Right
  .o. Violation -> {*}^8 ... || .#. Left^8 _ Right
  .O. Viol12 .O. Viol11 .O. Viol10 .O. Viol9 .O. Viol8 .O. Viol7 .O. Viol6 .O. Viol5 .O. Viol4 .O. Viol3 .O. Viol2 .O. Viol1 .O. Viol0
  .o. Pardon ];
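When this template is instantiated with "(" as the violation and a Left pattern that matches exactly one syllable boundary (as in the AllFeetFirst definition), each foot receives one mark for every syllable boundary between it and the beginning of the word. A Python sketch of that count, assuming candidates written with "." and parentheses; the function name is hypothetical and, unlike the template, the count is not capped at eight.

```python
def all_feet_first_marks(candidate):
    """Sum, over all feet, of the number of '.' boundaries to the left of the foot."""
    return sum(candidate[:i].count(".")
               for i, ch in enumerate(candidate) if ch == "(")


print(all_feet_first_marks("(ká.las).te.(lèm.me)"))        # 3
print(all_feet_first_marks("(ká.las).(tè.lem).me"))        # 2
print(all_feet_first_marks("(íl.moit).(tàu.tu).(mì.nen)"))  # 6
```

A word-initial foot contributes nothing, so the constraint is gradient: the further to the right a foot starts, the more marks it collects.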

39 Clash, AlignLeft, MainStress
Clash: No stress on adjacent syllables.
define Clash(X) Eval(X, SS, SS B, ?*);
Align-Left: The stressed syllable is initial in the foot.
define AlignLeft(X) Eval(X, SV, .#. ~[?* "(" C*], ?*);
Main Stress: The primary stress in Finnish is on the first syllable.
define MainStress(X) Unviolable(X, B MSS ~$MSS);

40 FootBin, Lapse, NonFinal
Foot-Bin: Feet are minimally bimoraic and maximally bisyllabic.
define FootBin(X) Eval(X, "(" Light ")" | "(" S ["." S]^>1, ?*, ?*);
Lapse: Every unstressed syllable must be adjacent to a stressed syllable or to the word edge.
define Lapse(X) Eval(X, US, [B US B], [B US B]);
Non-Final: The final syllable is not stressed.
define NonFinal(X) Eval(X, SS, ?*, ~$"." .#.);

41 StressToWeight, Parse, AllFeetFirst
Stress-To-Weight: Stressed syllables are heavy.
define StressToWeight(X) Eval(X, SS & Light, ?*, ")" | E);
License-σ: Syllables are parsed into feet.
define Parse(X) Eval(X, S, E, E);
All-Ft-Left: The left edge of every foot coincides with the left edge of some prosodic word.
define AllFeetFirst(X) [ EvalGradientLeft(X, "(", ~$"." "." ~$".", ?*) ];

42 Finnish Prosody
Kiparsky 2003:
define FinnishProsody(Input) [
  AllFeetFirst( Parse( StressToWeight( NonFinal( Lapse( FootBin( MainStress( AlignLeft( Clash( Input .o. Gen))))))))) ];
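In the nested definition the innermost constraint applies first, so it is the highest ranked. The effect of ranking can be sketched in Python with constraints that return violation counts. The two constraint functions below are simplified stand-ins, not the definitions from the slides, the helper names are hypothetical, and min() replaces the bounded cascade of lenient compositions.

```python
import re

STRESS_MARKS = set("áéíóúàèìòù´`")


def non_final(candidate):
    """One violation if the final syllable carries a stress mark."""
    last = candidate.split(".")[-1]
    return 1 if any(ch in STRESS_MARKS for ch in last) else 0


def parse(candidate):
    """One violation per syllable that is not inside a foot."""
    outside = re.sub(r"\([^)]*\)", "", candidate)
    return sum(1 for piece in outside.split(".") if piece)


def best(candidates, constraint):
    """Keep the candidates with the fewest violations; at least one survives."""
    fewest = min(constraint(c) for c in candidates)
    return [c for c in candidates if constraint(c) == fewest]


def rank(candidates, constraints):
    """Apply the constraints in order, highest ranked first."""
    survivors = list(candidates)
    for constraint in constraints:
        survivors = best(survivors, constraint)
    return survivors


pair = ["(ká).la", "(ká).(là)"]
print(rank(pair, [non_final, parse]))  # ['(ká).la']
print(rank(pair, [parse, non_final]))  # ['(ká).(là)']
```

Swapping the order of the two constraints changes the winner, which is why the ranking is language-specific.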

43 FinnWords regex FinnishProsody( {kalastelet} | {kalasteleminen} | {ilmoittautuminen} | {järjestelmättömyydestänsä} | {kalastelemme} | {ilmoittautumisesta} | {järjestelmällisyydelläni} | {järjestelmällistämätöntä} | {voimisteluttelemasta} | {opiskelija} | {opettamassa} | {kalastelet} | {strukturalismi} | {onnittelemanikin} | {mäki} | {perijä} | {repeämä} | {ergonomia} | {puhelimellani} | {matematiikka} | {puhelimistani} | {rakastajattariansa} | {kuningas} | {kainostelijat} | {ravintolat} | {merkonomin} ) ; Demo!

44 Result
(ér.go).(nò.mi).a
(íl.moit).(tàu.tu).mi.(sès.ta)
(íl.moit).(tàu.tu).(mì.nen)
(ón.nit).(tè.le).(mà.ni).kin
(ó.pis).(kè.li).ja
(ó.pet).ta.(màs.sa)
(vói.mis).te.(lùt.te).le.(màs.ta)
(strúk.tu).ra.(lìs.mi)
(rá.vin).(tò.lat)
(rá.kas).ta.(jàt.ta).ri.(àn.sa)
(ré.pe).(ä`.mä)
(pé.ri).jä
(pú.he).li.(mèl.la).ni
(pú.he).li.(mìs.ta).ni
(mä´.ki)
(má.te).ma.(tìik.ka)
(mér.ko).(nò.min)
(kái.nos).(tè.li).jat
(ká.las).te.(lèm.me)
(ká.las).te.(lè.mi).nen
(ká.las).(tè.let)
(kú.nin).gas
(jä´r.jes).tel.(mä`l.li).syy.(dèl.lä).ni
(jä´r.jes).(tèl.mät).tö.(my`y.des).(tä`n.sä)
(jä´r.jes).(tèl.mäl).(lìs.tä).mä.(tö`n.tä)

45 Two Errors (ká.las).te.(lè.mi).nen (jä´r.jes).tel.(mä`l.li).syy.(dèl.lä).ni The interaction of Lapse and StressToWeight does not produce the desired result in these cases.

46 What is wrong?
define Debug(Input) [ DebugStressToWeight( NonFinal( Lapse( FootBin( MainStress( AlignLeft( Clash( Input .o. Gen))))))) ];
regex Debug({kalasteleminen});
(ká*.las).te.(lè*.mi).nen <-- actual winner
(ká*.las).(tè*.le).(mì*.nen) <-- desired output
(jä´r.jes).tel.(mä`l.li).syy.(dèl.lä).ni <-- actual winner
(jä´r.jes).(tèl.mäl).li.(sy`y.del).(lä`*.ni) <-- desired output
The StressToWeight constraint eliminates some of the desired winning candidates.

47 Nine Elenbaas
A unified account of binary and ternary stress. Ph.D. dissertation. University of Utrecht. 1999.
Elenbaas has a special constraint, *(L´ H) or AntiLStressH, in place of Kiparsky’s more general StressToWeight constraint.
define FinnishProsody(Input) [
  AllFeetFirst( Parse( AntiLStressH( NonFinal( Lapse( AlignLeft( FootBin( MainStress( Clash( Input .o. Gen))))))))) ];
define AntiLStressH(X) Eval(X, SS & Light, "(", "." Heavy);

48 Result
(ér.go).(nò.mi).a
(íl.moit).(tàu.tu).mi.(sès.ta)
(íl.moit).(tàu.tu).(mì.nen)
(ón.nit).(tè.le).(mà.ni).kin
(ó.pis).(kè.li).ja
(ó.pet).ta.(màs.sa)
(vói.mis).te.(lùt.te).le.(màs.ta)
(strúk.tu).ra.(lìs.mi)
(rá.vin).(tò.lat)
(rá.kas).ta.(jàt.ta).ri.(àn.sa)
(ré.pe).(ä`.mä)
(pé.ri).jä
(pú.he).li.(mèl.la).ni
(pú.he).li.(mìs.ta).ni
(mä´.ki)
(má.te).ma.(tìik.ka)
(mér.ko).(nò.min)
(kái.nos).(tè.li).jat
(ká.las).te.(lèm.me)
(ká.las).te.(lè.mi).nen
(ká.las).(tè.let)
(kú.nin).gas
(jä´r.jes).(tèl.mäl).li.(sy`y.del).(lä`.ni)
(jä´r.jes).(tèl.mät).tö.(my`y.des).(tä`n.sä)
(jä´r.jes).(tèl.mäl).(lìs.tä).mä.(tö`n.tä)

49 Did She Know?
Six syllables (Appendix of Elenbaas thesis), pattern X X L L L L:
áterìanàni, áteriànani 'meal (Ess 1SG)'
érgonòmiàna 'ergonomics (Ess)'
káinostèlijàna 'shy person (Ess)'
káinostèlijàni 'shy person (Nom 1SG)'
kúnnallìsenàni 'council (Ess 1SG)'
kúnnallìsiàni 'councils (Part 1SG)'
kúnnallìsinàni 'councils (Ess 1SG)'
mérkonòmiàni 'degree in economics (Part 1SG)'
mérkonòminàni 'degree in economics (Ess 1SG)'
ópiskèlijàni 'student (Nom 1SG)'
púhelìmenàni 'telephone (Ess 1SG)'
púhelìmiàni 'telephone (Part 1SG)'
Missing pattern: X X L L L H

50 Conclusion
Can we get ternary feet in Finnish “for free”, from the interaction of independently motivated principles? We don’t know.
Optimality Prosody is computationally very difficult. The number of initial candidates is huge:
  kalasteleminen 70,653
  järjestelmällisyydelläni 21,767,579
Simple tableau methods do not work.
Finite-state implementation guards against errors made by a human GEN and EVAL.
But even when an error can be pinpointed, the fix is not obvious.
Debugging OT constraints is as hard as debugging two-level rules, and in practice more difficult than debugging rewrite systems.

51 Final Thoughts
Morphology is a regular relation. The composition of words (morphosyntax), morphological alternations, and prosody can be described in finite-state terms.
A complex relation can be decomposed in different ways. There are many flavors of finite-state morphology: Item-and-Arrangement, Rewrite rules, Two-level rules, Realizational Morphology, Classical optimality constraints.
Computing with finite-state tools is fun and easy. We have sophisticated formalisms for describing regular relations, efficient compilers, and runtime software.
‘Paper-and-pencil’ morphology badly needs computational support. It is difficult to get globally correct results relying on a handful of interesting words, rules, and constraints. Even the best experts make errors.

52 References
Kenneth R. Beesley & Lauri Karttunen. Finite State Morphology. CSLI Publications. March 2003. (Software included.) http://www.fsmbook.com/
Lauri Karttunen. "Computing with Realizational Morphology". In CICLing-2003, A. Gelbukh (ed.), Lecture Notes in Computer Science 2588, pages 205-216. Springer Verlag. 2003.

