Universal Morphological Analysis using Structured Nearest Neighbor Prediction
Young-Bum Kim, João V. Graça, and Benjamin Snyder
University of Wisconsin-Madison
28 July 2011

Unsupervised NLP. Unsupervised learning in NLP has become popular: 27 papers in this year's ACL+EMNLP. It relies on an inductive bias, encoded in the model structure or the learning algorithm. Example: an HMM for POS induction encodes transitional regularity. (Diagram: four unknown tags above the words "I like to read".)

Inductive Biases. These biases are usually formulated with weak empirical grounding (or left implicit). A single, simple bias for all languages leads to low performance, complicated models, fragility, and language dependence. Our approach: learn a complex, universal bias from labeled languages, i.e. empirically learn what the space of plausible human languages looks like and use it to guide unsupervised learning.

Key Idea. 1) Collect labeled corpora (non-parallel) for several training languages. (Diagram: several training languages and one test language.)

Key Idea. 2) Map each (x, y) pair into a "universal feature space", allowing cross-lingual generalization.

Key Idea. 3) Train a scoring function score(·) over the universal feature space, i.e. treat each annotated language as a single data point in a structured prediction problem.

Key Idea. 4) Predict the test labels y which yield the highest score, argmax_y score(·).

Test Case: Nominal Morphology. Languages differ in morphological complexity: only 4 English noun tags in the Penn Treebank, versus many more noun tags in the Hungarian corpus (suffixes encode case, number, and gender). Our analysis breaks each noun into a stem, a phonological deletion rule, and a suffix: utiskom [stem = utisak, del = (..ak# → ..k#), suffix = om]. Question: can we use morphologically annotated languages to train a universal morphological analyzer?
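
As a concrete illustration of this representation, here is a minimal Python sketch (the Analysis type, the (old, new) encoding of the deletion rule, and the helper names are illustrative assumptions, not the authors' code):

    from collections import namedtuple

    # One morphological analysis: stem + word-final phonological deletion rule + suffix.
    # A deletion rule is encoded here as (old_ending, new_ending); ("", "") means no deletion.
    Analysis = namedtuple("Analysis", ["stem", "deletion", "suffix"])

    def apply_deletion(stem, deletion):
        """Apply a word-final deletion rule to a stem."""
        old, new = deletion
        return stem[:len(stem) - len(old)] + new if stem.endswith(old) else stem

    def realize(analysis):
        """Reassemble the surface word from its analysis."""
        return apply_deletion(analysis.stem, analysis.deletion) + analysis.suffix

    # The slide's example: utiskom = stem utisak + deletion (..ak# -> ..k#) + suffix om
    a = Analysis(stem="utisak", deletion=("ak", "k"), suffix="om")
    assert realize(a) == "utiskom"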

Our Method. Universal feature space (8 features): sizes of the stem, suffix, and deletion rule lexicons; entropies of the stem, suffix, and deletion rule distributions; percentage of suffix-free words and of words with phonological deletions. Learning algorithm: broad characteristics of morphology are often similar across select language pairs, which motivates a nearest neighbor approach; in the structured scenario, learning becomes a search problem over the label space.
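
A sketch of how these eight features might be computed from a corpus-wide analysis, reusing the Analysis representation above (Python; whether the lexicon sizes are normalized or log-scaled in the paper is not stated here, so the exact scaling is an assumption):

    import math
    from collections import Counter

    def entropy(counts):
        """Shannon entropy (in bits) of an empirical distribution given as a Counter."""
        total = sum(counts.values())
        return -sum((c / total) * math.log2(c / total) for c in counts.values())

    def universal_features(analyses):
        """Map a list of per-word-type analyses into the 8-dimensional universal space."""
        stems = Counter(a.stem for a in analyses)
        suffixes = Counter(a.suffix for a in analyses)
        deletions = Counter(a.deletion for a in analyses)
        n = len(analyses)
        return [
            len(stems), len(suffixes), len(deletions),               # lexicon sizes
            entropy(stems), entropy(suffixes), entropy(deletions),   # distribution entropies
            sum(a.suffix == "" for a in analyses) / n,               # % suffix-free words
            sum(a.deletion != ("", "") for a in analyses) / n,       # % words with a deletion
        ]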

Structured Nearest Neighbor. Main idea: predict the analysis for the test language which brings us closest in feature space to a training language. 1) Initialize the analysis of the test language. 2) For each training language, iteratively and greedily update the test-language analysis to bring it closer in feature space to that training language. 3) After T iterations, choose the training language that is closest in feature space. 4) Predict the associated analysis.
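
The outer loop of this procedure could look roughly as follows (Python sketch; initialize, improve, and features stand in for the Stage-0 heuristic, one greedy search pass, and the universal feature map from the sketches above, and the Euclidean metric is an assumption about the distance used):

    def euclidean(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

    def structured_nearest_neighbor(test_words, train_vectors, initialize, improve,
                                    features, T=50):
        """train_vectors maps each training language to its gold feature vector."""
        candidates = {}
        for lang, target in train_vectors.items():
            analysis = initialize(test_words)             # Stage 0
            for _ in range(T):
                analysis = improve(analysis, target)      # greedy move toward this language
            candidates[lang] = analysis
        # Choose the training language whose feature vector we ended up closest to ...
        best = min(train_vectors,
                   key=lambda l: euclidean(features(candidates[l]), train_vectors[l]))
        # ... and predict the analysis produced while targeting that language.
        return candidates[best]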

Structured Nearest Neighbor (illustration). The training languages are plotted in the universal feature space and the test-language labels are initialized; the iterative search then moves the test analysis toward each training language in turn; finally, the analysis associated with the nearest training language is predicted.

Morphology Search Algorithm. Based on Goldsmith (2005): he minimizes description length, whereas we minimize distance to a training language. Stage 0: initialization; Stage 1: reanalyze each word; Stage 2: find new stems; Stage 3: find new suffixes. At each stage, candidate analyses are generated, scored against the training language, and the best one is selected.

Iterative Search Algorithm. Stage 0: using "character successor frequency", initialize the stem set T, the suffix set F, and the deletion rule set D.
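
Successor frequency itself is easy to sketch (Python; the spike threshold and the choice of split point are illustrative assumptions, since the heuristics actually used by the paper and by Linguistica are more involved):

    from collections import defaultdict

    def successor_frequencies(words):
        """For every prefix in the vocabulary, count how many distinct characters follow it."""
        succ = defaultdict(set)
        for w in words:
            for i in range(len(w)):
                succ[w[:i]].add(w[i])
        return {prefix: len(chars) for prefix, chars in succ.items()}

    def initial_splits(words, threshold=3):
        """Rough Stage-0 heuristic: cut each word after its last prefix whose successor
        frequency reaches the threshold, yielding candidate (stem, suffix) pairs."""
        freqs = successor_frequencies(words)
        splits = {}
        for w in words:
            cuts = [i for i in range(2, len(w)) if freqs[w[:i]] >= threshold]
            splits[w] = (w[:cuts[-1]], w[cuts[-1]:]) if cuts else (w, "")
        return splits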

Iterative Search Algorithm. Stage 1 (reanalyze each word): greedily reanalyze each word, keeping T and F fixed.

Iterative Search Algorithm. Stage 2 (find new stems): greedily analyze unsegmented words, keeping F fixed.

Iterative Search Algorithm. Stage 3 (find new suffixes): greedily analyze unsegmented words, keeping T fixed.
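
A single greedy pass in the style of Stage 1 might look like this (Python sketch reusing euclidean and universal_features from the earlier sketches; candidate_analyses is a placeholder for enumerating the legal reanalyses of a word under the current T, F, and D, and no incremental feature updates are attempted, so this is deliberately unoptimized):

    def greedy_pass(analyses, target, candidate_analyses):
        """Reanalyze each word, keeping a change only if it moves the corpus-level
        feature vector closer to the training language's vector `target`."""
        for word, current in list(analyses.items()):
            best = current
            best_dist = euclidean(universal_features(list(analyses.values())), target)
            for cand in candidate_analyses(word):
                analyses[word] = cand
                dist = euclidean(universal_features(list(analyses.values())), target)
                if dist < best_dist:
                    best, best_dist = cand, dist
            analyses[word] = best      # keep the best analysis found for this word
        return analyses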

Experimental Setup. Corpus: Orwell's Nineteen Eighty-Four (MULTEXT-East V3). Languages: Bulgarian, Czech, English, Estonian, Hungarian, Romanian, Slovene, Serbian; 94,725 tokens (English); all words tagged with a morpho-syntactic analysis. Slight confound: the data is parallel, but the method does not assume or exploit this fact. Baseline: the Linguistica model (Goldsmith 2005), which uses the same search procedure but greedily minimizes description length. Upper bound: a supervised model in the structured perceptron framework (Collins 2002).

Aggregate Results. Accuracy: fraction of word types with the correct analysis. (Bar chart comparing the baseline, at 64.6, with our model, an oracle, and the supervised upper bound.) Our model (train with 7 languages, test on 1): an average absolute increase in accuracy that reduces error by 42%. Oracle: each language guided using its own gold-standard feature values; its accuracy is still below the supervised bound because of (1) search errors and (2) the coarseness of the feature space.

Results by Language (Linguistica baseline). Best accuracy: English; lowest accuracy: Estonian.

Results by Language (our model, train with 7, test on 1). Biggest improvements for Serbian (15 points) and Slovene (22 points). For all languages other than English, there is an improvement over the baseline.

Visualization of Feature Space. The feature space is reduced to 2D using MDS (panels: Linguistica, gold standard, our method).
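
The 2-D picture can be reproduced with standard multidimensional scaling, e.g. scikit-learn's MDS (a sketch; which MDS variant and random seed the authors used is not stated, so those settings are assumptions):

    import numpy as np
    from sklearn.manifold import MDS

    def embed_2d(lang_vectors):
        """lang_vectors: {language: 8-dim feature vector}. Returns {language: (x, y)}."""
        names = list(lang_vectors)
        X = np.array([lang_vectors[name] for name in names])
        coords = MDS(n_components=2, random_state=0).fit_transform(X)
        return dict(zip(names, map(tuple, coords)))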

Visualization of Feature Space. Serbian and Slovene: closely related Slavic languages; nearest neighbors under our model's analysis; essentially they "swap places".

Visualization of Feature Space. Estonian and Hungarian: highly inflected Uralic languages; they also "swap places".

Visualization of Feature Space. English: failed to find a good neighbor; pulled towards Bulgarian (the second least inflected language in the dataset).

Accuracy as Training Languages Are Added. Averaged over all language combinations of various sizes: accuracy climbs as training languages are added; the method is worse than the baseline when only one training language is available, and better than the baseline when two or more training languages are available.

Why does accuracy improve with more languages? Plotting the resulting distance vs. accuracy for all 56 train-test pairs shows that more training languages ⇒ a closer neighbor is found, and a closer neighbor ⇒ higher accuracy.

Summary. Main idea: recast unsupervised learning as cross-lingual structured prediction. Test case: morphological analysis of 8 languages. We formulated a universal feature space for morphology, developed a novel structured nearest neighbor approach, and our method yields substantial accuracy gains.

Future Work. Shortcoming: uniform weighting of the dimensions of the universal feature space, even though some features may be more important than others. Future work: learn a distance metric on the universal feature space.

Thank You