ParaMor: Minimally Supervised Induction of Paradigm Structure and Morphological Analysis
Christian Monson, Jaime Carbonell, Alon Lavie, Lori Levin


Monolingual Text → Unsupervised Morphology Induction → Morphologically Analyzed Text

Paradigms Organize Inflectional Morphology

Cross-linguistically, languages inflect using paradigms: sets of mutually exclusive cells. In a surface word form, exactly one cell from each paradigm is filled (by an affix).

English
Paradigm cell    Inflection class 'eat'    Inflection class 'silent-e'
Unmarked         eat                       dance, erase, …
Present, 3rd     eats                      dances, erases, …
Past Tense       ate                       danced, erased, …
Progressive      eating                    dancing, erasing, …
Passive          eaten                     danced, erased, …

Spanish (inflection classes ar, er, ir)
Paradigm cell        ar      er      ir
1st, Sg, Present     o       o       o
2nd, Sg, Present     as      es      es
3rd, Sg, Present     a       e       e
1st, Pl, Present     amos    emos    imos
…                    …       …       …

Mapudungun (non-Indo-European, Central Chile)
Paradigm slots: Hab, Mode, Report, Pol/Mood, Tense, Obj Agr, Subj Agr/Mood, Loc, Asp.
Affixes shown for Hab through Obj Agr: ke, pe, (ü)rke, la, a, fi, ki, fu, nu, Ø
Affixes shown for Subj Agr/Mood: (ü)n, li, chi, yu, …
Affixes shown for Loc and Asp: pa, tu, pu, ka, Ø
(Ø marks the null affix.)

Paradigm Discovery in 3 Steps
1. Recall-Centric Search: a greedy bottom-up search through an empirical network of candidate partial paradigms.
2. Cluster Candidate Paradigms: hierarchical agglomerative clustering adapted to the peculiarities of partial paradigms.
3. Filter Unlikely Candidates: improve precision by removing unclustered and unlikely candidates.
Spanish data guided algorithm development and parameter adjustment.

1. Recall-Centric Search
Spanish search network. Each line gives a candidate's suffixes, then the number of stems it covers and a sample of those stems. In the original figure, red candidate paradigms are active in the search.
e.er.erá.ido.ieron.ió        28: deb, escog, ofrec, reconoc, vend, ...
e.ido.ieron.ir.irá.ió        28: asist, dirig, exig, ocurr, sufr, ...
e.erá.ido.ieron.ió           28: deb, escog, ...
e.er.ido.ieron.ió            46: deb, parec, recog, ...
e.ido.ieron.irá.ió           28: asist, dirig, ...
e.ido.ieron.ir.ió            39: asist, bat, sal, ...
e.er.erá.ieron.ió            32: deb, padec, romp, ...
e.ido.ieron.ió               86: asist, deb, hund, ...
e.erá.ieron.ió               32: deb, padec, ...
er.ido.ieron.ió              58: ascend, ejerc, recog, ...
ido.ieron.ir.ió              44: interrump, sal, ...
azar.e.ido.ieron.ir.ió        1: sal

2. Cluster Candidate Paradigms
Example candidates from the clustering figure (Spanish ar inflection class):
16: a.aba.ada.adas.ado.ados.an.ando.ar.ara.aron.arse.ará.arán.aría.ó
15: a.aba.ada.adas.ado.ados.an.ando.ar.ara.aron.arse.ará.arán.ó
15: a.aba.ada.adas.ado.ados.an.ando.ar.aron.arse.ará.arán.aría.ó
15: a.aba.aban.ada.adas.ado.ados.an.ando.ar.aron.arse.ará.arán.ó
17: a.aba.aban.ada.adas.ado.ados.an.ando.ar.ara.aron.arse.ará.arán.aría.ó

3. Filter Unlikely Candidates
Error analysis identified two major categories of incorrect candidates:
- Small: candidates contain few affixes and cover few types. Examples: Ø.ipo covers 8 words; Ø.e.iu covers 12 words.
- Incorrect morpheme boundary: candidates segment too far to the left. Examples: iza.izado.izan.izar.izaron.izarán.izó; der.derá.dido.diendo.dieron.dió.día.

Segmentation
1. Match the word to segment against the clustered affixes.
2. Replace any matched affix with a new affix from the cluster.
3. Segment the original word if the corpus contains the hypothesized word form.
Example: llega → llegaba, llegaban, llegada, … → lleg +a

Evaluation Methodology
1. Sample pairs of words that share morphemes.
   Precision: sample pairs that share a morpheme in the automatic analyses.
   Recall: sample pairs from an answer key of morphologically analyzed words.
2. Examine the corresponding analyses.
   Precision: count the sampled pairs that share a morpheme in the answer key.
   Recall: count the sampled pairs that share a morpheme in the automatic analyses.

Results
Morpho Challenge 2007, a competition for unsupervised morphology induction algorithms:
- English: 3rd place overall. ParaMor bested Morfessor (Creutz, 2006), a state-of-the-art unsupervised morphology induction algorithm.
- German: 1st place with a combined ParaMor-Morfessor system.

A Closer Look at ParaMor vs. Morfessor
P = precision, R = recall, F1 = F-measure (all in %); σ(F1) = standard deviation of F1.
Morphology evaluated           Language  System     P     R     F1    σ(F1)
Inflectional & Derivational    English   ParaMor    48.9  53.6  51.1  0.8
Inflectional & Derivational    English   Morfessor  73.6  34.0  46.5  1.1
Inflectional & Derivational    German    ParaMor    60.0  33.5  43.0  0.7
Inflectional & Derivational    German    Morfessor  66.9  37.1  47.7  0.7
Inflectional only              English   ParaMor    33.0  81.4  47.0  0.9
Inflectional only              English   Morfessor  53.3  47.0  49.9  1.3
Inflectional only              German    ParaMor    42.8  68.6  52.7  0.8
Inflectional only              German    Morfessor  38.7  44.2  41.2  0.8

The Next Steps
- Extend ParaMor to hypothesize more than one morpheme boundary per analysis.
- Expand beyond suffixation to other morphological phenomena: prefixes, etc.
- Merge inflection classes of the same paradigm.
- Identify morphophonemic changes.
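The poster's recall-centric search step can be illustrated with a minimal sketch. It assumes a candidate paradigm is a set of suffixes paired with every stem that forms an attested word with all of those suffixes, and it climbs bottom-up from one seed suffix by greedily adding whichever suffix keeps the most stems. The function names (`stems_for`, `greedy_search`) and the stopping rule (`min_stems`) are hypothetical illustrations, not ParaMor's published search criteria.

```python
from typing import FrozenSet, List, Optional, Set, Tuple

# A candidate paradigm: a set of suffixes ("" would stand for the null suffix Ø).
Scheme = FrozenSet[str]


def stems_for(suffixes: Scheme, vocab: Set[str]) -> Set[str]:
    """Stems that form an attested word with *every* suffix in the candidate."""
    common: Optional[Set[str]] = None
    for suffix in suffixes:
        # Strip the suffix from each word that ends with it, keeping a non-empty stem.
        stems = {w[: len(w) - len(suffix)] for w in vocab
                 if w.endswith(suffix) and len(w) > len(suffix)}
        common = stems if common is None else common & stems
    return common if common is not None else set()


def greedy_search(seed: str, all_suffixes: Set[str], vocab: Set[str],
                  min_stems: int = 2) -> List[Tuple[Scheme, int]]:
    """Grow one candidate bottom-up, one suffix at a time (hypothetical criteria)."""
    scheme: Scheme = frozenset([seed])
    path = [(scheme, len(stems_for(scheme, vocab)))]
    while True:
        best: Optional[Tuple[str, Set[str]]] = None
        for suffix in sorted(all_suffixes - scheme):
            stems = stems_for(scheme | {suffix}, vocab)
            # Prefer the extension that keeps the most stems; ties go to the
            # alphabetically first suffix.
            if len(stems) >= min_stems and (best is None or len(stems) > len(best[1])):
                best = (suffix, stems)
        if best is None:  # no extension keeps enough stems: stop climbing
            break
        scheme = scheme | {best[0]}
        path.append((scheme, len(best[1])))
    return path


if __name__ == "__main__":
    # Toy vocabulary built from stems that appear in the poster's network.
    vocab = {"debe", "deber", "debido", "debió", "vende", "vender", "vendido",
             "vendió", "asiste", "asistir", "asistido", "asistió",
             "sale", "salir", "salido", "salió"}
    for scheme, n in greedy_search("ió", {"e", "er", "ir", "ido", "ió"}, vocab):
        print(".".join(sorted(scheme)), n)
```

Each step of the returned path is one node in a network like the poster's: adding a suffix can only keep or shrink the stem set, which is why larger candidates such as e.er.erá.ido.ieron.ió cover fewer stems than e.ido.ieron.ió.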

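The poster describes its Cluster Candidate Paradigms step only as hierarchical agglomerative clustering "adapted to the peculiarities of partial paradigms", so the sketch below shows plain agglomerative clustering, not ParaMor's adaptation. The similarity measure (Jaccard overlap of affix sets), the single-linkage rule, and the `threshold` value are assumptions chosen for illustration; the input candidates are copied from the poster's clustering figure and search network.

```python
from itertools import combinations
from typing import FrozenSet, List

Candidate = FrozenSet[str]

# Candidates taken from the poster: five ar-class candidates from the
# clustering figure and two er/ir candidates from the search network.
POSTER_CANDIDATES = [
    "a.aba.ada.adas.ado.ados.an.ando.ar.ara.aron.arse.ará.arán.aría.ó",
    "a.aba.ada.adas.ado.ados.an.ando.ar.ara.aron.arse.ará.arán.ó",
    "a.aba.ada.adas.ado.ados.an.ando.ar.aron.arse.ará.arán.aría.ó",
    "a.aba.aban.ada.adas.ado.ados.an.ando.ar.aron.arse.ará.arán.ó",
    "a.aba.aban.ada.adas.ado.ados.an.ando.ar.ara.aron.arse.ará.arán.aría.ó",
    "e.er.ido.ieron.ió",
    "e.ido.ieron.ir.ió",
]


def parse(dotted: str) -> Candidate:
    """Turn the poster's dotted notation into a set of affixes."""
    return frozenset(dotted.split("."))


def jaccard(a: Candidate, b: Candidate) -> float:
    """Overlap of two affix sets (an assumed similarity measure)."""
    return len(a & b) / len(a | b)


def agglomerative_cluster(candidates: List[Candidate],
                          threshold: float = 0.75) -> List[List[Candidate]]:
    """Bottom-up clustering with single linkage, stopping below `threshold`."""
    clusters = [[c] for c in candidates]  # start with one cluster per candidate
    while len(clusters) > 1:
        best_sim, best_pair = -1.0, (0, 1)
        for i, j in combinations(range(len(clusters)), 2):
            # Single linkage: similarity of the closest pair of members.
            sim = max(jaccard(a, b) for a in clusters[i] for b in clusters[j])
            if sim > best_sim:
                best_sim, best_pair = sim, (i, j)
        if best_sim < threshold:  # nothing similar enough is left to merge
            break
        i, j = best_pair  # i < j, so deleting j leaves index i valid
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


if __name__ == "__main__":
    for cluster in agglomerative_cluster([parse(s) for s in POSTER_CANDIDATES]):
        print(len(cluster), [".".join(sorted(c)) for c in cluster])
```

With this threshold the five ar-class candidates merge into one cluster, while e.er.ido.ieron.ió and e.ido.ieron.ir.ió (overlap 4/6) stay apart, which matches the poster listing "merge inflection classes of the same paradigm" as future work.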

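The poster's Filter Unlikely Candidates step removes unclustered candidates and "small" ones (few affixes, few word types covered). A minimal sketch of that idea follows. The `Candidate` type, the rule "drop a cluster if every member is small", and the thresholds `MIN_AFFIXES` and `MIN_TYPES` are hypothetical; the poster does not state ParaMor's actual cut-offs, and this sketch does not attempt the second error category (incorrect morpheme boundaries).

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Candidate:
    affixes: FrozenSet[str]  # "Ø" marks the null affix
    types: int               # number of word types the candidate covers


# Hypothetical thresholds, for illustration only.
MIN_AFFIXES = 3
MIN_TYPES = 20


def is_small(c: Candidate) -> bool:
    """A 'small' candidate has few affixes or covers few word types."""
    return len(c.affixes) < MIN_AFFIXES or c.types < MIN_TYPES


def filter_clusters(clusters: List[List[Candidate]]) -> List[List[Candidate]]:
    kept = []
    for cluster in clusters:
        if len(cluster) < 2:
            continue  # unclustered candidate: drop it
        if all(is_small(c) for c in cluster):
            continue  # every member is small: drop the whole cluster
        kept.append(cluster)
    return kept


if __name__ == "__main__":
    # The two small candidates and their coverage come from the poster.
    ipo = Candidate(frozenset({"Ø", "ipo"}), 8)
    eiu = Candidate(frozenset({"Ø", "e", "iu"}), 12)
    print(is_small(ipo), is_small(eiu))
```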

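The poster's three-step Segmentation procedure (match against clustered affixes, swap in another affix from the cluster, and segment only if the hypothesized form is attested) maps directly onto a small function. This is a sketch under stated assumptions: `segment` is a hypothetical name, clusters are plain sets of affix strings, "Ø" denotes the null affix, and the toy corpus holds only the forms shown in the poster's llega example.

```python
from typing import Iterable, List, Set, Tuple

NULL = "Ø"  # the null affix


def segment(word: str, clusters: Iterable[Set[str]],
            corpus: Set[str]) -> List[Tuple[str, str]]:
    """Return (stem, affix) analyses licensed by the clusters and the corpus."""
    analyses = set()
    for cluster in clusters:
        for affix in cluster:
            surface = "" if affix == NULL else affix
            # Step 1: does this clustered affix match the end of the word?
            if not word.endswith(surface) or len(word) <= len(surface):
                continue
            stem = word[: len(word) - len(surface)]
            # Step 2: replace the matched affix with each other affix in the cluster.
            for other in cluster:
                if other == affix:
                    continue
                other_surface = "" if other == NULL else other
                # Step 3: segment only if the hypothesized form is in the corpus.
                if stem + other_surface in corpus:
                    analyses.add((stem, affix))
                    break
    return sorted(analyses)


if __name__ == "__main__":
    ar_cluster = set("a.aba.aban.ada.adas.ado.ados.an.ando.ar.ara.aron"
                     ".arse.ará.arán.aría.ó".split("."))
    corpus = {"llega", "llegaba", "llegaban", "llegada"}
    for stem, affix in segment("llega", [ar_cluster], corpus):
        print(f"{stem} +{affix}")
```

Requiring an attested sibling form (step 3) is what keeps the method from cutting every word that merely ends in a clustered affix.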

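The poster's Evaluation Methodology scores analyses by word pairs: precision checks whether pairs that share a morpheme in the automatic analyses also share one in the answer key, and recall checks the reverse. The sketch below is a simplified version under stated assumptions: it enumerates all pairs instead of sampling them, gives each pair full credit, and uses hypothetical morpheme labels; official Morpho Challenge scoring differs in detail. F1 is the harmonic mean 2PR / (P + R).

```python
from itertools import combinations
from typing import Dict, Set, Tuple

Analyses = Dict[str, Set[str]]  # word -> set of morpheme labels


def sharing_pairs(analyses: Analyses) -> Set[Tuple[str, str]]:
    """All unordered word pairs whose analyses share at least one morpheme."""
    return {(w1, w2) for w1, w2 in combinations(sorted(analyses), 2)
            if analyses[w1] & analyses[w2]}


def pair_precision_recall(auto: Analyses, gold: Analyses) -> Tuple[float, float]:
    auto_pairs = sharing_pairs(auto)
    gold_pairs = sharing_pairs(gold)
    both = auto_pairs & gold_pairs
    # Precision: share a morpheme automatically AND in the answer key.
    precision = len(both) / len(auto_pairs) if auto_pairs else 0.0
    # Recall: share a morpheme in the answer key AND automatically.
    recall = len(both) / len(gold_pairs) if gold_pairs else 0.0
    return precision, recall


def f1(p: float, r: float) -> float:
    return 2 * p * r / (p + r) if p + r else 0.0


if __name__ == "__main__":
    # Hypothetical answer key and system output for four Spanish verb forms.
    gold = {"llega": {"llegar", "+3SG"}, "llegaba": {"llegar", "+IMPF"},
            "habla": {"hablar", "+3SG"}, "hablaba": {"hablar", "+IMPF"}}
    auto = {"llega": {"lleg", "a"}, "llegaba": {"lleg", "aba"},
            "habla": {"habla"}, "hablaba": {"hablaba"}}
    p, r = pair_precision_recall(auto, gold)
    print(p, r, f1(p, r))
```

The same `f1` helper reproduces the poster's table: for ParaMor on English (inflectional and derivational), 2 × 48.9 × 53.6 / (48.9 + 53.6) rounds to 51.1.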
