Morphological Segmentation Inside-Out


1 Morphological Segmentation Inside-Out
Ryan Cotterell, Arun Kumar, Hinrich Schütze

2 Old Idea: Surface Morphological Segmentation
We are going to give examples in English, but other languages are far more complex!

3 unachievability un achiev abil ity Segment PREFIX STEM SUFFIX SUFFIX
One common way of processing morphology is what we are going to call *surface* morphological segmentation. The goal, roughly speaking, is to separate the surface form of a word into its sequence of morphemes, perhaps with a labeling. This task has attracted a lot of attention over the years, with a number of supervised and unsupervised methods being proposed. PREFIX STEM SUFFIX SUFFIX
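A surface segmentation with a labeling, as on the slide, can be represented as a sequence of (morph, label) pairs. The following is a minimal illustrative sketch; the representation and the helper function are assumptions, not anything from the paper.

```python
# Sketch: a surface segmentation as a labeled morph sequence.
# The representation and helper are illustrative, not from the paper.

def surface_segment(word, morphs):
    """Return the labeled morphs, checking they concatenate to the word."""
    assert "".join(m for m, _ in morphs) == word
    return morphs

analysis = surface_segment(
    "unachievability",
    [("un", "PREFIX"), ("achiev", "STEM"), ("abil", "SUFFIX"), ("ity", "SUFFIX")],
)
```

Note that the morphs concatenate exactly back to the surface form; this is the defining property of *surface* (as opposed to canonical) segmentation.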

4 Semi-New Idea: Canonical Morphological Segmentation

5 unachievability unachieveableity un achieve able ity Restore Segment
DEFINE UNDERLYING FORM This work focuses on a different formulation of the task: canonical segmentation. The goal here is to map the surface form to an underlying form and *then* segment it. To point out the differences compared to the last slide: we have added an "e" to "achieve" and mapped "abil" to "able". un achieve able ity PREFIX STEM SUFFIX SUFFIX
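The two-step view on the slide (restore the underlying form, then segment it) can be sketched as follows. The rewrite table here is a toy stand-in for a learned string transduction, purely for illustration.

```python
# Sketch of canonical segmentation as restore-then-segment.
# The rewrite table is a toy stand-in for a learned transduction.

def restore(word, rewrites):
    """Apply toy orthographic rewrites to recover an underlying form."""
    for surface, underlying in rewrites:
        word = word.replace(surface, underlying)
    return word

rewrites = [("achiev", "achieve"), ("abil", "able")]
underlying = restore("unachievability", rewrites)
segments = ["un", "achieve", "able", "ity"]
```

Unlike a surface segmentation, the canonical segments concatenate to the *underlying* form, not to the word as written.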

6 Why is canonicalization useful?
Here's why you should care about this problem. Segmenting words alone is not enough. We eventually need to reason about the relationships between words. When we perform canonical segmentation, it becomes immediately clear which words share morphemes.

7 unachievability achievement underachiever achieves
Segmentation does not happen in isolation. Ideally, we would like to analyze all the words in a language's lexicon. achieves

8 un achiev abil ity achieve ment under achiev er achieve s

9 Are they the same morpheme???
un achiev abil ity achieve ment under achiev er achieve s

10 unachievability achievement underachiever achieves
Segmentation does not happen in isolation. Ideally, we would like to analyze all the words in a language's lexicon. achieves

11 unachieveableity achievement underachieveer achieves

12 un achieve able ity achieve ment under achieve er achieve s

13 Canonical segmentations are standardized across words
un achieve able ity achieve ment under achieve er Better preprocessing, e.g., more meaningful reduction in sparsity and reasoning about compositionality achieve s
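The sparsity-reduction point on this slide can be made concrete: because canonical segments are standardized across words, the morph vocabulary shrinks relative to surface segments. The segmentations below are the ones from the slides; the counting code is illustrative.

```python
# Sketch: standardized canonical segments shrink the morph vocabulary
# relative to surface segments. Segmentations are from the slides.

surface = [["un", "achiev", "abil", "ity"], ["achieve", "ment"],
           ["under", "achiev", "er"], ["achieve", "s"]]
canonical = [["un", "achieve", "able", "ity"], ["achieve", "ment"],
             ["under", "achieve", "er"], ["achieve", "s"]]

surface_types = {m for seg in surface for m in seg}
canonical_types = {m for seg in canonical for m in seg}
```

Under the canonical analysis, every word shares the single morph "achieve"; under the surface analysis, it splinters into "achiev" and "achieve".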

14 unachievability thinkable accessible untouchable
Segmentation does not happen in isolation. Ideally, we would like to analyze all the words in a language's lexicon. untouchable

15 unachieveableity thinkable accessable untouchable
Segmentation does not happen in isolation. Ideally, we would like to analyze all the words in a language's lexicon. untouchable

16 un achieve able ity think able access able un touch able

17 un achieve able ity think able access able un touch able

18 New Idea: Morphology as Parsing

19 unachievability achievement underachiever achieves
Segmentation does not happen in isolation. Ideally, we would like to analyze all the words in a language's lexicon. achieves

20 unachieveableity achievement underachieveer achieves

21 un achieve able ity achieve ment under achieve er achieve s

22 un achieve able ity achieve ment under achieve er achieve s

23 under achieve er

24 under achieve er

25 under achieve er

26 under achieve er

27 PREFIX STEM SUFFIX under achieve er

28 Why are trees useful? Here's why you should care about this problem. Segmenting words alone is not enough. We eventually need to reason about the relationships between words. When we build morphological trees, it becomes immediately clear which words share morphemes.

29 Reason 1: Words are ambiguous!
Tree Captures Ambiguity! SUFFIX PREFIX STEM SUFFIX PREFIX STEM un lock able un lock able “capable of being unlocked” “incapable of being locked” PREFIX STEM SUFFIX un lock able “???” Flat Segmentation Doesn’t!
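The ambiguity argument on this slide can be sketched in code: a flat segmentation of "unlockable" collapses the two readings, while trees keep them apart. Trees are encoded here as nested (label, children...) tuples; the encoding is an illustrative assumption, not the paper's representation.

```python
# Sketch: one flat segmentation of "unlockable", but two distinct trees.
# The (label, children...) tuple encoding is illustrative.

flat = [("un", "PREFIX"), ("lock", "STEM"), ("able", "SUFFIX")]

# "capable of being unlocked": ((un lock) able)
reading1 = ("WORD", ("WORD", ("PREFIX", "un"), ("STEM", "lock")),
            ("SUFFIX", "able"))
# "incapable of being locked": (un (lock able))
reading2 = ("WORD", ("PREFIX", "un"),
            ("WORD", ("STEM", "lock"), ("SUFFIX", "able")))

def leaves(tree):
    """Read the morphs off the leaves, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [m for child in tree[1:] for m in leaves(child)]
```

Both trees yield the same leaf sequence as the flat segmentation, yet they are distinct objects: exactly the distinction the flat representation cannot make.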

30 Reason 2: Model Order of Affixation
Path of Derivation achieve underachieve underachiever Encoded As Tree More Features PREFIX STEM SUFFIX under achieve er

31 New Resource To the best of our knowledge, the fully supervised version of this task has never been considered before in the literature, so we introduce a novel joint probability model.

32 Morphological Tree Bank
English Size

33 A Joint Model To the best of our knowledge, the fully supervised version of this task has never been considered before in the literature, so we introduce a novel joint probability model.

34 Canonical Segmentation Parse Tree
unachieveableity Underlying Form unachievability un achieve able ity Canonical Segmentation Parse Tree unachieveableity unachievability un achieve able ity We model the probability of a canonical segmentation – CLICK – and an underlying form – CLICK – given the surface form of a word – CLICK. CLICK The first factor scores a canonical segmentation–underlying form pair. Basically, it asks: how good is this pair? For example, un-achieve-able-ity and unachieveableity. This is a structured factor and can be seen as the score of a semi-Markov model. CLICK The second factor scores an underlying form–surface word pair. Basically, it asks: how good is this pair? Now, this notation belies a bit of the complexity. This factor is, again, structured: in general, we have to encode all possible alignments between the two strings. Luckily, we can encode this as a weighted finite-state machine. The paper explains this in detail. CLICK We put them all together and we get our model. The remaining details, such as the feature templates, can be found in the paper. PAUSE CLICK Word (Surface Form)

35 (s=un achieve able ity, u=unachieveableity)
How good is the tree–underlying form pair? (s=un achieve able ity, u=unachieveableity) How good is the underlying form–word pair? We define this model as being proportional to the exponential of a linear model. We can see this as being composed of two different factors. (u=unachieveableity, w=unachievability)
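The "proportional to the exponential of a linear model" idea can be sketched as follows. The feature templates and the candidate set are toy stand-ins; the paper's actual factors are structured (semi-Markov and finite-state), which this sketch does not capture.

```python
import math

# Sketch of a globally normalized log-linear model: the score of a
# (segmentation s, underlying form u) pair given word w is linear in
# the features, and p(s, u | w) is proportional to exp(score).
# Feature templates and candidates are toy stand-ins for the paper's.

def score(s, u, w, weights):
    """Linear score: sum the weights of the features that fire."""
    feats = {f"morph={m}" for m in s} | {f"pair={u}->{w}"}
    return sum(weights.get(f, 0.0) for f in feats)

def prob(s, u, w, candidates, weights):
    """Normalize exp(score) over an enumerable candidate set."""
    z = sum(math.exp(score(s2, u2, w, weights)) for s2, u2 in candidates)
    return math.exp(score(s, u, w, weights)) / z
```

In the real model the candidate set is exponentially large, which is why the normalizer cannot be enumerated like this; that is the motivation for the next slide.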

36 Inference and Learning
Inference is intractable! Approximate inference with importance sampling Decoding also with importance sampling Learning AdaGrad (Duchi et al. 2011) Unfortunately, marginal inference in our model is intractable! We explain why in the paper. As the model is globally normalized, even computing a gradient requires inference. To solve this, we rely on an approximation known as importance sampling. At a high level, importance sampling takes samples from an easy distribution and lets the model rescore them. Decoding, a.k.a. MAP inference, is also intractable, but, again, we can approximately solve this with importance sampling. Once we get our approximate gradient using importance sampling, we train the model with AdaGrad. CLICK
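The importance-sampling idea on this slide (sample from an easy proposal, let the model rescore) can be sketched generically. The target and proposal below are toy stand-ins for the segmentation model, not the paper's setup.

```python
import math
import random

# Sketch of self-normalized importance sampling: draw samples from an
# easy proposal q, reweight each by the unnormalized model score over q,
# and normalize the weights. Target and proposal are toy stand-ins.

def importance_estimate(f, samples, log_p_tilde, log_q):
    """Self-normalized importance-sampling estimate of E_p[f]."""
    logw = [log_p_tilde(x) - log_q(x) for x in samples]
    shift = max(logw)                      # stabilize the exponentials
    w = [math.exp(lw - shift) for lw in logw]
    return sum(wi * f(x) for wi, x in zip(w, samples)) / sum(w)

# Toy target with p(1) = 0.8, p(0) = 0.2 (unnormalized scores 4 and 1),
# estimated through a uniform proposal.
rng = random.Random(0)
samples = [rng.choice([0, 1]) for _ in range(20000)]
estimate = importance_estimate(
    lambda x: x, samples,
    log_p_tilde=lambda x: math.log(4.0) if x == 1 else 0.0,
    log_q=lambda x: math.log(0.5),
)
```

Self-normalization is what lets this work with an *unnormalized* model score, which matters here because the globally normalized model's partition function is itself intractable.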

37 Experimental Results Key Point: Do trees help segmentation accuracy?
Baseline: flat segmentation model New Task:

38 Results

39 Fin. Thank You!

