A Joint Model of Orthography and Morphological Segmentation

Presentation transcript:

A Joint Model of Orthography and Morphological Segmentation Ryan Cotterell, Tim Vieira, Hinrich Schütze

Morphology matters! Morphologically rich languages are not the exception; they are the rule! Facts from WALS: 85% of all languages make use of affixation; 80% mark verb tense through morphology; 65% mark grammatical case through morphology. NAACL 2016 proceedings: ≈ 3% of papers involve morphology. I'm going to start this talk by making a small point about computational morphology. Personally, I think computational morphology is an exciting area of NLP, and I've always been a bit disappointed that more people don't work on it. Morphology is the rule -- not the exception! What I mean by this is that if we want NLP systems that work for the majority of the world's languages, this is a problem we need to solve. To show this, I extracted some simple statistics from the WALS database of typological properties: 85% of the world's languages make use of affixation in some form; 80% mark verb tense through morphology; two thirds mark grammatical case. That's a lot of languages. By way of comparison, I compiled my own statistics from the NAACL proceedings: 3% (6 of 181) of the papers, as far as I could tell, dealt with morphology.

Surface Morphological Segmentation We are going to give examples in English, but other languages are far more complex!

unachievability un achiev abil ity Segment PREFIX STEM SUFFIX SUFFIX One common way of processing morphology is what we are going to call *surface* morphological segmentation. The goal, roughly speaking, is to separate the surface form of a word into its sequence of morphemes, perhaps with a labeling. This task has attracted a lot of attention over the years, with a number of supervised and unsupervised methods being proposed.
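As a concrete illustration (ours, not code from the paper), a labeled surface segmentation can be represented as a list of (segment, label) pairs whose segments concatenate back to the surface form:

```python
# Illustrative representation of a labeled surface segmentation (not from the paper).
segmentation = [("un", "PREFIX"), ("achiev", "STEM"),
                ("abil", "SUFFIX"), ("ity", "SUFFIX")]

# Surface segments concatenate back to the word exactly as written.
assert "".join(segment for segment, label in segmentation) == "unachievability"
```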

Canonical Morphological Segmentation

unachievability unachieveableity un achieve able ity Restore, then Segment (defining the underlying form in between). This work focuses on a different formulation of the task: canonical segmentation. The goal here is to map the surface form to an underlying form and *then* segment it. To point out the differences: compared to the last slide, we have added an "e" to "achieve" and mapped "abil" to "able". The labels are again PREFIX STEM SUFFIX SUFFIX: un achieve able ity.
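Concretely (an illustrative sketch in our own notation, not the paper's code), canonical segmentation is a two-step mapping, and the segments concatenate to the underlying form rather than to the surface form:

```python
# Illustrative two-step view of canonical segmentation (not from the paper):
# step 1 restores the underlying form, step 2 segments it.
surface = "unachievability"
underlying = "unachieveableity"               # "e" restored, "abil" -> "able"
segments = ["un", "achieve", "able", "ity"]

# Canonical segments concatenate to the underlying form, not the surface form.
assert "".join(segments) == underlying
assert "".join(segments) != surface
```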

Why is this useful? Here's why you should care about this problem. Segmenting words alone is not enough. We eventually need to reason about the relationships between words. When we perform canonical segmentation, it becomes immediately clear which words share morphemes.

unachievability achievement underachiever achieves Segmentation does not happen in isolation. Ideally, we would like to analyze all the words in a language's lexicon.

un achiev abil ity achieve ment under achiev er achieve s

Are they the same morpheme??? un achiev abil ity achieve ment under achiev er achieve s

unachieveableity achievement underachieveer achieves

un achieve able ity achieve ment under achieve er achieve s

Canonical segmentations are standardized across words: un achieve able ity, achieve ment, under achieve er, achieve s. This gives better preprocessing, e.g., a more meaningful reduction in sparsity, and supports reasoning about compositionality.
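To make the sparsity point concrete, here is a toy illustration of our own (not from the paper): under surface segmentation the stem appears as both "achiev" and "achieve", while the canonical segmentation shares a single stem across all four words:

```python
from collections import Counter

# Toy illustration of the sparsity claim (ours, not from the paper).
surface   = [["un", "achiev", "abil", "ity"], ["achieve", "ment"],
             ["under", "achiev", "er"], ["achieve", "s"]]
canonical = [["un", "achieve", "able", "ity"], ["achieve", "ment"],
             ["under", "achieve", "er"], ["achieve", "s"]]

print(Counter(m for word in surface for m in word))    # stem split: "achiev" x2 vs. "achieve" x2
print(Counter(m for word in canonical for m in word))  # one shared stem: "achieve" x4
```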

unachievability thinkable accessible untouchable Segmentation does not happen in isolation. Ideally, we would like to analyze all the words in a language's lexicon.

unachieveableity thinkable accessable untouchable Segmentation does not happen in isolation. Ideally, we would like to analyze all the words in a language's lexicon.

un achieve able ity think able access able un touch able

A Joint Model To the best of our knowledge, the fully supervised version of this task has never been considered before in the literature, so we introduce a novel joint probabilistic model.

unachievability unachieveableity un achieve able ity We model the probability of a canonical segmentation – CLICK – and an underlying form – CLICK – given the surface form of a word – CLICK. CLICK The first factor scores a canonical segmentation and underlying form pair. Basically, it asks: how good is this pair? For example, un achieve able ity and unachieveableity. This is a structured factor and can be seen as the score of a semi-Markov model. CLICK The second factor scores an underlying form and surface word pair. Basically, it asks: how good is this pair? Now, this notation belies a bit of the complexity. This factor is, again, structured. In fact, in general we have to encode all possible alignments between the two strings. Luckily, we can encode this as a weighted finite-state machine. The paper explains this in detail. CLICK We put them all together and we get our model. The remaining details, such as the feature templates, can be found in the paper. PAUSE CLICK Canonical Segmentation Underlying Form Word (Surface Form)

(s = un achieve able ity, u = unachieveableity) How good is the segmentation-underlying form pair? (u = unachieveableity, w = unachievability) How good is the underlying form-word pair? We define this model as being proportional to the exponential of a linear model, which we can see as being composed of two different factors.
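In symbols, a plausible reconstruction of the model from this description (the notation f, g, theta, phi is ours; the paper's exact feature functions and parameterization may differ) is a globally normalized log-linear model with two factors:

```latex
p(s, u \mid w) \;=\; \frac{1}{Z(w)}
  \exp\big( \theta^\top f(s, u) + \phi^\top g(u, w) \big),
\qquad
Z(w) \;=\; \sum_{s', u'} \exp\big( \theta^\top f(s', u') + \phi^\top g(u', w) \big)
```

Here f is the semi-Markov factor scoring the segmentation-underlying form pair, and g is the finite-state factor scoring the underlying form-word pair; the sum in Z(w) ranges over all candidate underlying forms and their segmentations, which is why inference is hard (next slide).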

Inference and Learning Inference is intractable! Approximate inference with importance sampling; decoding also with importance sampling; learning with AdaGrad (Duchi et al., 2011). Unfortunately, marginal inference in our model is intractable! We explain why in the paper. As the model is globally normalized, even computing a gradient requires inference. To solve this, we rely on an approximation known as importance sampling. At a high level, importance sampling takes samples from an easy distribution and lets the model rescore them. Decoding, a.k.a. MAP inference, is also intractable, but, again, we can approximately solve it with importance sampling. Once we have our approximate gradient via importance sampling, we train the model with AdaGrad. CLICK
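As a minimal sketch of the idea (ours, not the authors' implementation; `score`, `sample_q`, and `logprob_q` are hypothetical stand-ins for the model's unnormalized score and the easy proposal distribution):

```python
import math

def estimate_log_partition(score, sample_q, logprob_q, n_samples=1000):
    """Importance-sampling estimate of log Z = log sum_y exp(score(y)).

    Draw y from an easy proposal q and reweight by exp(score(y)) / q(y);
    the mean of these weights is an unbiased estimate of Z.
    """
    log_weights = []
    for _ in range(n_samples):
        y = sample_q()                           # sample from the easy distribution
        log_weights.append(score(y) - logprob_q(y))
    m = max(log_weights)                         # numerically stable log-mean-exp
    return m + math.log(sum(math.exp(lw - m) for lw in log_weights) / n_samples)

def decode_approx(score, sample_q, n_samples=1000):
    """Approximate MAP decoding: let the model rescore proposal samples."""
    return max((sample_q() for _ in range(n_samples)), key=score)
```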

Baselines. (1) Pipeline (semi-Markov CRF, then weighted finite-state machine): the same model, but trained sequentially. Does joint modeling help? (2) Semi-Markov CRF alone: a segmentation model without orthography. How different are canonical segmentations from surface segmentations (Cotterell et al., 2015)? (3) Weighted finite-state machine alone: do segment-level features matter?

Results (Error Rate). Languages: English, German, Indonesian. Baselines: Pipeline, SemiCRF, WFST (weighted finite-state transducer). Results are reported as 1-best error rate. [The results table itself is not reproduced in this transcript.]

Results (Edit Distance). [The results table itself is not reproduced in this transcript.]

Results (Morpheme F1). [The results table itself is not reproduced in this transcript.]

Fin. Thank You!
