A Joint Model of Orthography and Morphological Segmentation


1 A Joint Model of Orthography and Morphological Segmentation
Ryan Cotterell, Tim Vieira, Hinrich Schütze

2 Morphology matters! To Work on Morphology!
Morphologically rich languages are not the exception; they are the rule! Facts from WALS: 85% of all languages make use of affixation; 80% mark verb tense through morphology; 65% mark grammatical case through morphology. NAACL 2016 proceedings: ≈3% of papers involve morphology. I'm going to start this talk by making a small point about computational morphology. Personally, I think computational morphology is an exciting area of NLP, and I've always been a bit disappointed that more people don't work on it. Morphology is the rule, not the exception! What I mean by this is that if we want NLP systems that work for the majority of the world's languages, this is a problem we need to solve. To show this, I extracted some simple statistics from the WALS database of typological properties: 85% of the world's languages make use of affixation in some form, 80% mark verb tense through morphology, and two thirds mark grammatical case. That's a lot of languages. By way of comparison, I compiled my own statistics from the NAACL proceedings: ≈3% (6 of 181) of the papers, as far as I could tell, dealt with morphology.

3 Surface Morphological Segmentation
We are going to give examples in English, but other languages are far more complex!

4 unachievability un achiev abil ity Segment PREFIX STEM SUFFIX SUFFIX
One common way of processing morphology is what we are going to call *surface* morphological segmentation. The goal, roughly speaking, is to separate the surface form of a word into its sequence of morphemes, perhaps with a labeling. This task has attracted a lot of attention over the years, with a number of supervised and unsupervised methods being proposed. PREFIX STEM SUFFIX SUFFIX
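As a concrete illustration, a labeled surface segmentation can be represented as a list of (morph, label) pairs whose concatenation recovers the observed word exactly. This is a hypothetical data structure for exposition, not the paper's code:

```python
# Hypothetical representation of a labeled surface segmentation:
# the morphs must concatenate back to the observed word exactly.
def join_segments(segments):
    """Concatenate the morphs of a labeled segmentation."""
    return "".join(morph for morph, _label in segments)

surface = [("un", "PREFIX"), ("achiev", "STEM"),
           ("abil", "SUFFIX"), ("ity", "SUFFIX")]

assert join_segments(surface) == "unachievability"
```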

5 Canonical Morphological Segmentation

6 unachievability unachieveableity un achieve able ity Restore Segment
DEFINE UNDERLYING FORM This work focuses on a different formulation of the task: canonical segmentation. The goal here is to map the surface form into an underlying form and *then* segment it. To point out the differences, compared to the last slide, we have added an "e" to "achieve" and mapped "abil" to "able". un achieve able ity PREFIX STEM SUFFIX SUFFIX
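The key property, sketched below, is that canonical segments concatenate to the restored underlying form rather than to the surface form (illustrative example only, using the slide's running word):

```python
word = "unachievability"           # surface form
underlying = "unachieveableity"    # after restoring orthographic changes
canonical = ["un", "achieve", "able", "ity"]

# Canonical segments concatenate to the *underlying* form...
assert "".join(canonical) == underlying
# ...which differs from the surface form we started with.
assert "".join(canonical) != word
```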

7 Why is this useful? Here's why you should care about this problem. Segmenting words alone is not enough. We eventually need to reason about the relationships between words. When we perform canonical segmentation, it becomes immediately clear which words share morphemes.

8 unachievability achievement underachiever achieves
Segmentation does not happen in isolation. Ideally, we would like to analyze all the words in a language's lexicon. achieves

9 un achiev abil ity achieve ment under achiev er achieve s

10 Are they the same morpheme???
un achiev abil ity achieve ment under achiev er achieve s

11 unachievability achievement underachiever achieves
Segmentation does not happen in isolation. Ideally, we would like to analyze all the words in a language's lexicon. achieves

12 unachieveableity achievement underachieveer achieves

13 un achieve able ity achieve ment under achieve er achieve s

14 Canonical segmentations are standardized across words
un achieve able ity achieve ment under achieve er Better preprocessing, e.g., more meaningful reduction in sparsity and reasoning about compositionality achieve s

15 unachievability thinkable accessible untouchable
Segmentation does not happen in isolation. Ideally, we would like to analyze all the words in a language's lexicon. untouchable

16 unachieveableity thinkable accessable untouchable
Segmentation does not happen in isolation. Ideally, we would like to analyze all the words in a language's lexicon. untouchable

17 un achieve able ity think able access able un touch able

18 un achieve able ity think able access able un touch able

19 A Joint Model To the best of our knowledge, the fully supervised version of this task has never been considered before in the literature, so we introduce a novel joint probability model.

20 unachievability unachieveableity un achieve able ity
We model the probability of a canonical segmentation – CLICK – and an underlying form – CLICK – given the surface form of a word – CLICK. CLICK The first factor scores a canonical segmentation–underlying form pair. Basically, it asks: how good is this pair? For example, un - achieve - able - ity and unachieveableity. This is a structured factor and can be seen as the score of a semi-Markov model. CLICK The second factor scores an underlying form–surface word pair. Basically, it asks: how good is this pair? Now, this notation belies a bit of the complexity. This factor is, again, structured. In fact, in general we have to encode all possible alignments between the two strings. Luckily, we can encode this as a weighted finite-state machine. The paper explains this in detail. CLICK We put them all together and we get our model. The remaining details, such as the feature templates, can be found in the paper. PAUSE CLICK Canonical Segmentation Underlying Form Word (Surface Form)

21 (s=un achieve able ity, u=unachieveableity)
How good is the segmentation–underlying form pair? (s=un achieve able ity, u=unachieveableity) How good is the underlying form–word pair? We define this model as being proportional to the exponential of a linear model. We can see this as being composed of two different factors. (u=unachieveableity, w=unachievability)
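Spelled out, the factorization described above might be written as follows (the notation here is illustrative; see the paper for the exact feature functions and parameterization):

```latex
p(s, u \mid w) \;=\; \frac{1}{Z(w)}
  \exp\Big( \boldsymbol{\theta}^\top \mathbf{f}(s, u)
          \;+\; \boldsymbol{\theta}^\top \mathbf{g}(u, w) \Big),
\qquad
Z(w) \;=\; \sum_{s',\, u'}
  \exp\Big( \boldsymbol{\theta}^\top \mathbf{f}(s', u')
          + \boldsymbol{\theta}^\top \mathbf{g}(u', w) \Big)
```

Here \(\mathbf{f}\) is the semi-Markov factor over the segmentation–underlying form pair and \(\mathbf{g}\) is the finite-state factor over the underlying form–word pair; the global normalizer \(Z(w)\) sums over all candidate pairs, which is what makes exact inference hard.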

22 Inference and Learning
Inference is intractable! Approximate inference with importance sampling. Decoding also with importance sampling. Learning: AdaGrad (Duchi et al. 2011). Unfortunately, marginal inference in our model is intractable! We explain why in the paper. As the model is globally normalized, even computing a gradient requires inference. To solve this, we rely on an approximation known as importance sampling. At a high level, importance sampling takes samples from an easy proposal distribution and lets the model rescore them. Decoding, a.k.a. MAP inference, is also intractable, but, again, we can approximately solve it with importance sampling. Once we have our approximate gradient via importance sampling, we train the model with AdaGrad. CLICK
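At a very high level, the importance-sampling trick works like this: sample from a tractable proposal distribution q and reweight each sample by exp(score)/q to estimate the intractable normalizer. The toy sketch below is purely illustrative; the paper's proposal distribution and structured model are far richer:

```python
import math
import random

def importance_estimate_Z(score, sample_q, q_prob, n=10000, seed=0):
    """Estimate Z = sum_x exp(score(x)) by sampling x ~ q and
    averaging the importance weights exp(score(x)) / q(x)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = sample_q(rng)
        total += math.exp(score(x)) / q_prob(x)
    return total / n

# Toy example: a 3-element space with a uniform proposal q(x) = 1/3.
scores = {0: 0.0, 1: 1.0, 2: -1.0}
Z_hat = importance_estimate_Z(lambda x: scores[x],
                              lambda rng: rng.randrange(3),
                              lambda x: 1.0 / 3.0)
Z_true = sum(math.exp(s) for s in scores.values())
assert abs(Z_hat - Z_true) / Z_true < 0.05  # close to the exact sum
```

In the actual model the space of (segmentation, underlying form) pairs is exponentially large, so the same reweighting idea is applied to structured samples rather than to three integers.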

23 Baselines Pipeline semi-Markov CRF Weighted Finite-state Machine
Pipeline: same model, but trained sequentially. Does joint modeling help? Semi-Markov CRF: segmentation model without orthography. How different are canonical segmentations from surface segmentations (Cotterell et al. 2015)? Weighted finite-state machine: do segment-level features matter?

24 Results (Error Rate) Languages: English German Indonesian
Say that it’s 1-best error rate. Say WFST = weighted finite-state transducer. Remove the languages after mentioning them. Mention future work. Baselines: Pipeline, SemiCRF, WFST

25 Results (Edit Distance)
Say that it’s 1-best error rate. Say WFST = weighted finite-state transducer. Remove the languages after mentioning them. Mention future work.
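The edit-distance metric here is standard Levenshtein distance between the predicted and gold strings; a minimal reference implementation (ours, not the paper's evaluation script) looks like this:

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of insertions,
    deletions, and substitutions turning string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute
        prev = cur
    return prev[-1]

# e.g. restoring the orthography of the talk's running example:
assert edit_distance("achiev", "achieve") == 1   # one insertion
assert edit_distance("abil", "able") == 2        # two substitutions
```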

26 Results (Morpheme F1) Say that it’s 1-best error rate.
Say WFST = weighted finite-state transducer. Remove the languages after mentioning them. Mention future work.
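Morpheme F1 compares the predicted morphemes against the gold morphemes. One common convention is set overlap, sketched below; the paper's exact counting may differ (e.g., in how duplicate morphemes are handled):

```python
def morpheme_f1(pred, gold):
    """F1 over the sets of predicted and gold morphemes
    (one convention among several; duplicates are collapsed)."""
    pred_set, gold_set = set(pred), set(gold)
    tp = len(pred_set & gold_set)
    if tp == 0:
        return 0.0
    precision = tp / len(pred_set)
    recall = tp / len(gold_set)
    return 2 * precision * recall / (precision + recall)

# Surface prediction vs. gold canonical segments: only "un" and
# "ity" match, so precision = recall = 2/4 and F1 = 0.5.
pred = ["un", "achiev", "abil", "ity"]
gold = ["un", "achieve", "able", "ity"]
assert morpheme_f1(pred, gold) == 0.5
```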

27 Fin. Thank You!

28 unachieveableity unachievability un achieve able ity
We model the probability of a canonical segmentation – CLICK – and an underlying form – CLICK – given the surface form of a word – CLICK. CLICK The first factor scores a canonical segmentation–underlying form pair. Basically, it asks: how good is this pair? For example, un - achieve - able - ity and unachieveableity. This is a structured factor and can be seen as the score of a semi-Markov model. CLICK The second factor scores an underlying form–surface word pair. Basically, it asks: how good is this pair? Now, this notation belies a bit of the complexity. This factor is, again, structured. In fact, in general we have to encode all possible alignments between the two strings. Luckily, we can encode this as a weighted finite-state machine. The paper explains this in detail. CLICK We put them all together and we get our model. The remaining details, such as the feature templates, can be found in the paper. PAUSE CLICK

