1 Universal Morphological Analysis using Structured Nearest Neighbor Prediction
Young-Bum Kim, João V. Graça, and Benjamin Snyder
University of Wisconsin-Madison, 28 July 2011

2 Unsupervised NLP
 Unsupervised learning in NLP has become popular: 27 papers at this year's ACL+EMNLP.
 It relies on an inductive bias, encoded in the model structure or learning algorithm. Example: an HMM for POS induction encodes transitional regularity.
[Figure: hidden tags (? ? ? ?) above the words "I like to read"]

3 Inductive Biases
 Typically formulated with weak empirical grounding (or left implicit).
 A single, simple bias for all languages leads to low performance, complicated models, fragility, and language dependence.
Our approach: learn a complex, universal bias using labeled languages, i.e. empirically learn what the space of plausible human languages looks like, and use it to guide unsupervised learning.

4 Key Idea
1) Collect labeled corpora (non-parallel) for several training languages.
[Figure: training languages and a held-out test language]

5 Key Idea
2) Map each (x, y) pair into a "universal feature space", to allow cross-lingual generalization.
[Figure: training and test languages mapped into the universal feature space]

6 Key Idea
3) Train a scoring function score(·) over the universal feature space, i.e. treat each annotated language as a single data point in a structured prediction problem.
[Figure: training languages scored in the universal feature space]

7 Key Idea
4) Predict the test labels which yield the highest score: argmax_y score(·). The prediction rule is written out below.
[Figure: candidate test analyses scored against the training languages]
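Written out, steps 2)-4) amount to the following prediction rule; the notation here is assumed, with f mapping a corpus x together with a candidate labeling y into the universal feature space:

    \hat{y} \;=\; \operatorname*{argmax}_{y} \; \mathrm{score}\big(f(x, y)\big)

In the nearest neighbor instantiation of the later slides, score(·) is the negative distance to the closest training language in that space.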

8 Test Case: Nominal Morphology
 Languages differ in morphological complexity:
- only 4 English noun tags in the Penn Treebank
- 154 noun tags in the Hungarian corpus (suffixes encode case, number, and gender)
 Our analysis breaks each noun into a stem, a phonological deletion rule, and a suffix, as sketched below:
- utiskom [stem = utisak, del = (..ak# → ..k#), suffix = om]
Question: can we use morphologically annotated languages to train a universal morphological analyzer?
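A minimal sketch of this three-part analysis as a data structure, assuming Python; the class and field names are illustrative, not from the paper's code. The deletion rule lets plain stem + suffix concatenation account for spelling changes at the boundary:

    # Illustrative sketch, not the paper's implementation.
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Analysis:
        stem: str          # e.g. "utisak"
        deletion: str      # phonological deletion rule, e.g. "..ak# -> ..k#"
        suffix: str        # e.g. "om"

        def surface(self) -> str:
            """Reconstruct the surface form: apply the deletion rule to
            the stem, then attach the suffix."""
            stem = self.stem
            if self.deletion:
                old, new = [s.strip(".# ") for s in self.deletion.split("->")]
                if stem.endswith(old):
                    stem = stem[: -len(old)] + new
            return stem + self.suffix

    # Serbian example from the slide: utiskom = utisak + (..ak# -> ..k#) + om
    word = Analysis(stem="utisak", deletion="..ak# -> ..k#", suffix="om")
    assert word.surface() == "utiskom"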

9 Our Method
 Universal feature space (8 features, computed as in the sketch below):
- size of the stem, suffix, and deletion rule lexicons
- entropy of the stem, suffix, and deletion rule distributions
- percentage of suffix-free words, and of words with phonological deletions
 Learning algorithm:
- broad characteristics of morphology are often similar across select language pairs
- this motivates a nearest neighbor approach
- in the structured scenario, learning becomes a search problem over the label space
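A sketch of how the 8 features might be computed from a corpus-wide list of Analysis objects. The exact counting and normalization conventions are assumptions (for instance, empty suffixes and empty deletion rules are counted as lexicon entries here for simplicity):

    import math
    from collections import Counter

    def entropy(counter: Counter) -> float:
        """Shannon entropy (in bits) of the empirical distribution."""
        total = sum(counter.values())
        return -sum(c / total * math.log2(c / total) for c in counter.values())

    def universal_features(analyses: list) -> list:
        """Map a list of Analysis objects (one per word type) into the
        8-dimensional universal feature space."""
        stems = Counter(a.stem for a in analyses)
        suffixes = Counter(a.suffix for a in analyses)
        deletions = Counter(a.deletion for a in analyses)
        n = len(analyses)
        return [
            len(stems), len(suffixes), len(deletions),           # lexicon sizes
            entropy(stems), entropy(suffixes), entropy(deletions),
            sum(a.suffix == "" for a in analyses) / n,           # % suffix-free words
            sum(a.deletion != "" for a in analyses) / n,         # % words with deletion
        ]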

10 Structured Nearest Neighbor
 Main idea: predict the analysis for the test language which brings us closest in feature space to a training language.
1) Initialize the analysis of the test language.
2) For each training language: iteratively and greedily update the test language analysis to bring it closer in feature space to that training language.
3) After T iterations, choose the training language closest in feature space.
4) Predict the associated analysis (see the sketch below).
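A sketch of this outer loop, with `initialize` standing in for the Stage 0 procedure and `greedy_step` for the Stage 1-3 search of the following slides (both are sketched later); the helper names and the value of T are assumptions:

    import math

    def distance(u, v):
        """Euclidean distance in the universal feature space."""
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

    def structured_nearest_neighbor(test_vocab, train_points, T=50):
        """train_points maps each training language to the feature vector
        of its gold-standard analysis."""
        candidates = {}
        for lang, target in train_points.items():
            analysis = initialize(test_vocab)            # Stage 0 (sketched below)
            for _ in range(T):
                # Greedily move the test analysis toward this language.
                analysis = greedy_step(analysis, target)
            candidates[lang] = analysis
        # Keep the training language we ended up closest to ...
        best = min(train_points,
                   key=lambda l: distance(universal_features(candidates[l]),
                                          train_points[l]))
        # ... and predict the analysis produced while chasing it.
        return candidates[best]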

11 Structured Nearest Neighbor
[Figure: training languages plotted in feature space; test language labels initialized]

12 Structured Nearest Neighbor
[Figure: iterative search, step 1: the test analysis moves toward each training language in feature space]

13 Structured Nearest Neighbor
[Figure: iterative search, continued]

14 Structured Nearest Neighbor
[Figure: iterative search, continued]

15 Structured Nearest Neighbor
[Figure: predict the analysis associated with the nearest training language]

16 Morphology Search Algorithm
Based on (Goldsmith 2005): he minimizes description length; we minimize distance to the training language.
Stage 0: Initialization
Stage 1: Reanalyze Each Word
Stage 2: Find New Stems
Stage 3: Find New Suffixes
[Flowchart: each stage generates candidate analyses, from which the search selects against the training language]

17 Iterative Search Algorithm
 Stage 0: using "character successor frequency," initialize the stem set T, suffix set F, and deletion rule set D (a sketch follows below).
[Figure: example contents of stem set T, suffix set F, and deletion rule set D]
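A sketch of the successor frequency heuristic: split a word where the number of distinct characters that can follow its prefix (over the whole vocabulary) is high. The threshold, minimum stem length, and tie-breaking are assumptions; the paper's exact initialization may differ:

    from collections import defaultdict

    def successor_counts(vocab):
        """For every prefix occurring in the vocabulary, collect the set
        of characters that can follow it."""
        succ = defaultdict(set)
        for w in vocab:
            for i in range(1, len(w)):
                succ[w[:i]].add(w[i])
        return succ

    def initial_split(word, succ, threshold=3):
        """Split at the prefix with the highest successor count, if any
        position clears the (assumed) threshold."""
        best_i, best_c = None, threshold
        for i in range(2, len(word)):          # insist on a stem of length >= 2
            c = len(succ.get(word[:i], ()))
            if c >= best_c:
                best_i, best_c = i, c
        if best_i is None:
            return word, ""                    # leave the word unsegmented
        return word[:best_i], word[best_i:]    # (stem, suffix)

    def initialize(vocab):
        """Stage 0: build the initial analysis list (no deletion rules yet)."""
        succ = successor_counts(vocab)
        return [Analysis(stem=t, deletion="", suffix=f)
                for w in vocab
                for t, f in [initial_split(w, succ)]]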

18 Iterative Search Algorithm
 Stage 1: Reanalyze Each Word
- greedily reanalyze each word, keeping T and F fixed
[Figure: updated contents of T, F, and D]

19 Iterative Search Algorithm
 Stage 2: Find New Stems
- greedily analyze unsegmented words, keeping F fixed
[Figure: updated contents of T, F, and D]

20 Iterative Search Algorithm
 Stage 3: Find New Suffixes
- greedily analyze unsegmented words, keeping T fixed
[Figure: updated contents of T, F, and D]
The accept/reject rule shared by Stages 1-3 is sketched below.
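A sketch of that shared rule: propose a candidate reanalysis of a word and keep it only if it moves the corpus-level feature vector closer to the current training language. The `candidates` generator is hypothetical shorthand for the stage-specific proposal sets (re-splits in Stage 1, new stems in Stage 2, new suffixes in Stage 3):

    def greedy_step(analysis, target):
        """One greedy pass of Stages 1-3 over a list of Analysis objects."""
        for i, cand in candidates(analysis):   # hypothetical: yields (index,
            proposed = list(analysis)          # Analysis) proposals for the
            proposed[i] = cand                 # active stage
            if (distance(universal_features(proposed), target)
                    < distance(universal_features(analysis), target)):
                analysis = proposed            # keep only improving moves
        return analysis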

21 Experimental Setup
 Corpus: Orwell's Nineteen Eighty-Four (Multext-East V3)
- languages: Bulgarian, Czech, English, Estonian, Hungarian, Romanian, Slovene, Serbian
- 94,725 tokens (English); slight confound: the data is parallel, though the method does not assume or exploit this fact
- all words tagged with a morpho-syntactic analysis
 Baseline: Linguistica model (Goldsmith 2005)
- same search procedure, but greedily minimizes description length
 Upper bound: supervised model
- structured perceptron framework (Collins 2002)

22 Aggregate Results
 Accuracy: fraction of word types with the correct analysis.
[Chart: Linguistica baseline at 64.6]

23 Aggregate Results
 Accuracy: fraction of word types with the correct analysis.
[Chart: Linguistica 64.6; Supervised 92.8]

24 Aggregate Results
 Accuracy: fraction of word types with the correct analysis.
 Our Model: train with 7 languages, test on 1
- average absolute increase of 11.8 points
- reduces error by 42%
[Chart: Linguistica 64.6; Our Model 76.4; Supervised 92.8]

25 Aggregate Results
 Accuracy: fraction of word types with the correct analysis.
 Our Model: train with 7 languages, test on 1
- average absolute increase of 11.8 points
- reduces error by 42%
 Oracle: each language guided using its own gold standard feature values.
Accuracy still below supervised, due to (1) search errors and (2) the coarseness of the feature space.
[Chart: Linguistica 64.6; Our Model 76.4; Oracle 81.1; Supervised 92.8]

26 Results By Language
 Best accuracy: English. Lowest accuracy: Estonian.
[Chart: Linguistica accuracy by language; values 61, 64, 69, 60, 51, 81, 65, 66]

27 Results By Language
 Biggest improvements for Serbian (15 points) and Slovene (22 points).
 For all languages other than English, improvement over the baseline.
[Chart: accuracy by language, Our Model (train with 7, test on 1) vs. Linguistica]

28 Visualization of Feature Space
 Feature space reduced to 2D using MDS.
[Figure: languages plotted under the Linguistica, Gold Standard, and Our Method analyses]

29 Visualization of Feature Space
 Serbian and Slovene:
- closely related Slavic languages
- nearest neighbors under our model's analysis
- essentially they "swap places"
[Figure: Linguistica, Gold Standard, and Our Method plots]

30 Visualization of Feature Space
 Estonian and Hungarian:
- highly inflected Uralic languages
- they "swap places"
[Figure: Linguistica, Gold Standard, and Our Method plots]

31 Visualization of Feature Space
 English:
- failed to find a good neighbor
- pulled towards Bulgarian (the second least inflected language in the dataset)
[Figure: Linguistica, Gold Standard, and Our Method plots]

32 Accuracy as Training Languages Added
 Averaged over all language combinations of various sizes:
- accuracy climbs as training languages are added
- worse than the baseline when only one training language is available
- better than the baseline when two or more training languages are available
[Chart: accuracy vs. number of training languages]

33 Why does accuracy improve with more languages?
 Resulting distance vs. accuracy for all 56 train-test pairs:
- more training languages ⇒ find a closer neighbor
- a closer neighbor ⇒ higher accuracy
[Scatter plot: distance to the chosen neighbor vs. accuracy]

34 Summary
Main idea: recast unsupervised learning as cross-lingual structured prediction.
Test case: morphological analysis of 8 languages.
 Formulated a universal feature space for morphology
 Developed a novel structured nearest neighbor approach
 Our method yields substantial accuracy gains

35 Future Work
 Shortcoming:
- uniform weighting of dimensions in the universal feature space
- some features may be more important than others
 Future work: learn a distance metric on the universal feature space, e.g. a weighted distance as sketched below.
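One way such a metric could look, as a sketch: a weighted Euclidean distance over the 8 feature dimensions, with weights w to be learned on the training languages (the learning procedure itself is left open by the slide):

    import math

    def weighted_distance(u, v, w):
        """Weighted Euclidean distance over the universal features;
        w holds one (learned) non-negative weight per dimension."""
        return math.sqrt(sum(wi * (a - b) ** 2 for wi, a, b in zip(w, u, v)))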

36 Thank You

