Presentation is loading. Please wait.

Presentation is loading. Please wait.

Unsupervised Structure Prediction with Non-Parallel Multilingual Guidance July 27 EMNLP 2011 Shay B. Cohen Dipanjan Das Noah A. Smith Carnegie Mellon University.

Similar presentations


Presentation on theme: "Unsupervised Structure Prediction with Non-Parallel Multilingual Guidance July 27 EMNLP 2011 Shay B. Cohen Dipanjan Das Noah A. Smith Carnegie Mellon University."— Presentation transcript:

1 Unsupervised Structure Prediction with Non-Parallel Multilingual Guidance July 27 EMNLP 2011 Shay B. Cohen Dipanjan Das Noah A. Smith Carnegie Mellon University

2 Goal: 2 Learn linguistic structure for a language without any labeled data in that language Part-of-Speech Tagging DET NOUN VERB ADJ ADP. The Skibo Castle is close by. Dependency Parsing EMNLP 2011Cohen, Das and Smith (2011)

3 This work! Multilingual Unsupervised Learning 3 using parallel data no parallel data (hard) supervision in source language(s) joint learning for multiple languages Snyder et al. (2009) Naseem et al. (2010) supervision in source language(s) Smith and Eisner (2009) Das and Petrov (2011) McDonald et al. (2011) joint learning for multiple languages Cohen and Smith (2009) Berg-Kirkpatrick and Klein (2010) EMNLP 2011Cohen, Das and Smith (2011) Yarowsky and Ngai (2001) Xi and Hwa (2005)

4 Annotated data In a Nutshell 4 Unlabeled data in Portuguese + = SpanishItalian Coarse, universal parameters Interpolation (unsupervised training) coarse parameters of Portuguese Monolingual unsupervised training in Portuguese Coarse-to-fine expansion and initialization Cohen, Das and Smith (2011) Portuguese parameters EMNLP 2011

5 5 Assumptions for a given problem: 1. Underlying model is generative HMM TheSkibo is closeby Castle Merialdo (1994) EMNLP 2011Cohen, Das and Smith (2011)

6 6 6 1. Underlying model is generative DET NOUN VERBADJ ADP ROOT DMV Klein and Manning (2004) Assumptions for a given problem: EMNLP 2011Cohen, Das and Smith (2011)

7 7 7 Composed of multinomial distributions HMM TheSkibo is closeby Castle Merialdo (1994) Assumptions for a given problem: 1. Underlying model is generative EMNLP 2011Cohen, Das and Smith (2011)

8 8 8 DET NOUN VERBADJ ADP ROOT DMV Klein and Manning (2004) Composed of multinomial distributions Assumptions for a given problem: 1. Underlying model is generative EMNLP 2011Cohen, Das and Smith (2011)

9 9 9 In general, unlexicalized parameters look like: k th multinomial in the model i th event in the multinomial Assumptions for a given problem: 1. Underlying model is generative e.g. transition from ADJ ( ) to NOUN ( ) EMNLP 2011Cohen, Das and Smith (2011)

10 10 The lexicalized parameters take a similar form (No lexicalized parameters for the DMV) Assumptions for a given problem: 1. Underlying model is generative EMNLP 2011Cohen, Das and Smith (2011)

11 11 unlexicalizedlexicalized number of times event i of multinomial k fires in the derivation Assumptions for a given problem: 1. Underlying model is generative EMNLP 2011Cohen, Das and Smith (2011)

12 12 2. Coarse, universal part-of-speech tags VERBDET NOUNCONJ PRONNUM ADJPRT ADV. ADPX Assumptions for a given problem: EMNLP 2011Cohen, Das and Smith (2011)

13 13 Assumptions for a given problem: 2. Coarse, universal part-of-speech tags VERBDET NOUNCONJ PRONNUM ADJPRT ADV. ADPX Treebank tagset For each language, there is a mapping EMNLP 2011Cohen, Das and Smith (2011)

14 Coarse treebank coarse conversion 3. helper languages 14 Assumptions for a given problem: Treebank unlexicalized parameters MLE EMNLP 2011Cohen, Das and Smith (2011) For each:

15 15 Multilingual Modeling EMNLP 2011Cohen, Das and Smith (2011)

16 16 Multilingual Modeling For a target language, unlexicalized parameters: k th multinomial in the model (say, the transitions from the ADJ tag in an HMM) mixture weight for k th multinomial for the th helper language EMNLP 2011Cohen, Das and Smith (2011)

17 ADJ →. 0.70.3 NOUN VERB ADJ ADV PRON DET ADP NUM CONJ PRT. X NOUN VERB ADJ ADV PRON DET ADP NUM CONJ PRT. X 17 Multilingual Modeling e.g., two helper languages: Spanish and Italian NOUN VERB ADJ ADV PRON DET ADP NUM CONJ PRT. X 0.26 0.12 0.03 0.04 0.05 0.03 0.25 0.01 0.13 0.04 0.01 0.04 0.27 0.12 0.03 0.04 0.03 0.25 0.01 0.10 0.05 0.01 0.05 0.25 0.11 0.04 0.06 0.04 0.26 0.0 0.20 0.0 0.00 EMNLP 2011Cohen, Das and Smith (2011)

18 ?? NOUN VERB ADJ ADV PRON DET ADP NUM CONJ PRT. X NOUN VERB ADJ ADV PRON DET ADP NUM CONJ PRT. X 18 Multilingual Modeling e.g., two helper languages: Spanish and Italian 0.27 0.12 0.03 0.04 0.03 0.25 0.01 0.10 0.05 0.01 0.05 0.25 0.11 0.04 0.06 0.04 0.26 0.0 0.20 0.0 0.00 unknown ADJ →. EMNLP 2011Cohen, Das and Smith (2011)

19 19 Learning and Inference EMNLP 2011Cohen, Das and Smith (2011)

20 20 Learning and Inference normal learning EMNLP 2011Cohen, Das and Smith (2011)

21 21 Learning and Inference multilingual learning are fixed! EMNLP 2011Cohen, Das and Smith (2011)

22 22 Learning and Inference Multilingual learning learning with EM: Number of times is used in a derivation M-step: EMNLP 2011Cohen, Das and Smith (2011)

23 23 Learning and Inference Multilingual learning What about feature-rich generative models? Berg-Kirkpatrick et al. (2010) Locally normalized log-linear model EMNLP 2011Cohen, Das and Smith (2011)

24 ?? NOUN VERB ADJ ADV PRON DET ADP NUM CONJ PRT. X NOUN VERB ADJ ADV PRON DET ADP NUM CONJ PRT. X 24 Multilingual Modeling e.g., two helper languages: Spanish and Italian 0.27 0.12 0.03 0.04 0.03 0.25 0.01 0.10 0.05 0.01 0.05 0.25 0.11 0.04 0.06 0.04 0.26 0.0 0.20 0.0 0.00 unknown ADJ →. EMNLP 2011Cohen, Das and Smith (2011)

25 0.62370.3763 NOUN VERB ADJ ADV PRON DET ADP NUM CONJ PRT. X NOUN VERB ADJ ADV PRON DET ADP NUM CONJ PRT. X 25 Multilingual Modeling e.g., two helper languages: Spanish and Italian 0.27 0.12 0.03 0.04 0.03 0.25 0.01 0.10 0.05 0.01 0.05 0.25 0.11 0.04 0.06 0.04 0.26 0.0 0.20 0.0 0.00 learned ADJ →. NOUN VERB ADJ ADV PRON DET ADP NUM CONJ PRT. X 0.26 0.12 0.03 0.04 0.05 0.03 0.25 0.01 0.13 0.04 0.01 0.04 EMNLP 2011Cohen, Das and Smith (2011)

26 JJS →. JJ →. JJR →. 26 Coarse-to-fine expansion NOUN VERB ADJ ADV PRON DET ADP NUM CONJ PRT. X 0.26 0.12 0.03 0.04 0.05 0.03 0.25 0.01 0.13 0.04 0.01 0.04 Learning and Inference (for English) NOUN VERB ADJ ADV PRON DET ADP NUM CONJ PRT. X 0.26 0.12 0.03 0.04 0.05 0.03 0.25 0.01 0.13 0.04 0.01 0.04 NOUN VERB ADJ ADV PRON DET ADP NUM CONJ PRT. X 0.26 0.12 0.03 0.04 0.05 0.03 0.25 0.01 0.13 0.04 0.01 0.04 NOUN VERB ADJ ADV PRON DET ADP NUM CONJ PRT. X 0.26 0.12 0.03 0.04 0.05 0.03 0.25 0.01 0.13 0.04 0.01 0.04 identical copies Step 1 ADJ →. EMNLP 2011Cohen, Das and Smith (2011)

27 27 Coarse-to-fine expansion Learning and Inference (for English) NOUN VERB ADJ ADV PRON DET ADP NUM CONJ PRT. X 0.26 0.12 0.03 0.04 0.05 0.03 0.25 0.01 0.13 0.04 0.01 0.04 JJ →. EMNLP 2011Cohen, Das and Smith (2011)

28 28 NOUN VERB ADJ ADV PRON DET ADP NUM CONJ PRT. X 0.26 0.12 0.03 0.04 0.05 0.03 0.25 0.01 0.13 0.04 0.01 0.04 JJ →. Coarse-to-fine expansion Learning and Inference (for English) VB VBD VBG VBN VBP VBZ 0.065 0.02 NN NNS NNP NNPS........................ Equal division Monolingual unsupervised training Initializer new, fine JJ →. Step 2 EMNLP 2011Cohen, Das and Smith (2011)

29 29 Experiments EMNLP 2011Cohen, Das and Smith (2011)

30 30 Two Problems Unsupervised Part-of-Speech Tagging Model: feature-based HMM (Berg-Kirkpatrick et al., 2010) Learning: L-BFGS Unsupervised Dependency Parsing Model: DMV (Klein and Manning, 2004) Learning: EM EMNLP 2011Cohen, Das and Smith (2011)

31 31 Languages Target Languages: Bulgarian, Danish, Dutch, Greek, Japanese, Portuguese, Slovene, Spanish, Swedish, and Turkish Helper Languages: English, German, Italian and Czech (CoNLL Treebanks from 2006 and 2007) EMNLP 2011Cohen, Das and Smith (2011)

32 Direct Gradient (DG) Uniform + DG Mixture + DG Number of Languages with Best Results Average Accuracy 32 Results: POS Tagging (without tag dictionary) EMNLP 2011Cohen, Das and Smith (2011) Monolingual baseline (Berg-Kirkpatrick et al., 2010) Uniform mixture parameters (no learning) Full model

33 Direct Gradient (DG) Uniform + DG Mixture + DG Number of Languages with Best Results 2 (Portuguese, Danish) 2 (Turkish, Bulgarian) 6 Average Accuracy 40.641.043.3 33 Results: POS Tagging (without tag dictionary) EMNLP 2011Cohen, Das and Smith (2011)

34 EMPRPGI Number of Languages with Best Results Average Accuracy 34 Results: Dependency Parsing EMNLP 2011Cohen, Das and Smith (2011) Monolingual EM (Klein and Manning, 2004) Posterior Regularization (Gillenwater et al, 2010) Phylogenetic Grammar Induction (Berg-Kirkpatrick and Klein, 2010)

35 EMPRPGI Number of Languages with Best Results Average Accuracy UniformMixtureUniform + EM Mixture + EM 1. Uniform mixture parameters 2. No coarse-to-fine expansion (no learning) 35 Results: Dependency Parsing EMNLP 2011Cohen, Das and Smith (2011) 1.Learned mixture parameters 2.No coarse-to-fine expansion 1.Uniform mixture parameters 2.Coarse-to-fine expansion → monolingual learning 1.Learned mixture parameters 2.Coarse-to-fine expansion → monolingual learning

36 UniformMixtureUniform + EM Mixture + EM PRPGI Number of Languages with Best Results 02 (Turkish, Slovene) 0 Average Accuracy 41.450.2 * 53.6* 36 Results: Dependency Parsing EMNLP 2011Cohen, Das and Smith (2011)

37 UniformMixtureUniform + EM Mixture + EM 3 (Bulgarian, Swedish, Dutch) 1 (Danish) 1 (Greek) 3 (Portuguese, Japanese, Spanish) 61.662.261.562.1 EMPRPGI Number of Languages with Best Results 02 (Turkish, Slovene) 0 Average Accuracy 41.450.2 * 53.6* 37 Results: Dependency Parsing EMNLP 2011Cohen, Das and Smith (2011)

38 EMNLP 201138 Analyzing with Principal Component Analysis Two principal components

39 39 From Words to Dependencies EMNLP 2011Cohen, Das and Smith (2011)

40 40 From Words to Dependencies Use induced tags to induce dependencies 1.In a pipeline 2.Using the posteriors over tags in a sausage lattice (Cohen and Smith, 2007) EMNLP 2011Cohen, Das and Smith (2011)

41 EMNLP 201141 From Words to Dependencies Joint Decoding: 1 23 4 The Skibo Castle DET : 0.95 ADJ: 0.03 NOUN: 0.02 DET : 0.0 ADJ: 0.3 NOUN: 0.7 DET : 0.01 ADJ: 0.1 NOUN: 0.89 DMV Parsing a lattice

42 42 Results: Words to Dependencies PipelineJoint DGMixture + DG Mixture + DG Number of Languages with Best Results Average EMNLP 2011Cohen, Das and Smith (2011)

43 43 Results: Words to Dependencies PipelineJoint DGMixture + DG Mixture + DG Number of Languages with Best Results 1 (Greek) 0 5 (Portuguese, Turkish, Swedish, Slovebe, Danish) 4 (Bulgarian, Japanese, Spanish, Dutch) Average 56.954.057.955.6 EMNLP 2011Cohen, Das and Smith (2011)

44 44 Results: Words to Dependencies PipelineJoint DGMixture + DG Mixture + DG Number of Languages with Best Results 1 (Greek) 0 5 (Portuguese, Turkish, Swedish, Slovebe, Danish) 4 (Bulgarian, Japanese, Spanish, Dutch) Average 56.954.057.955.6 EMNLP 2011Cohen, Das and Smith (2011) Best average result with gold tags: 62.2 Interesting result: Auto tags perform better for Turkish and Slovene

45 45 Conclusions EMNLP 2011Cohen, Das and Smith (2011)

46 46 Conclusions Improvements for two major tasks using non- parallel multilingual guidance In general grammar induction results better than POS tagging Joint POS and dependency parsing performs surprisingly well For a few languages, results are better than using gold tags Joint decoding performs better than a pipeline EMNLP 2011Cohen, Das and Smith (2011)

47 47 Questions? EMNLP 2011Cohen, Das and Smith (2011)

48 48 Results: POS Tagging Direct Gradient (DG) Uniform + DG Mixture + DG Bulgarian34.738.035.8 Danish48.836.239.9 Dutch45.443.750.2 Greek35.336.738.9 Japanese52.360.461.7 Portuguese53.545.751.5 Slovene33.435.936.0 Spanish40.031.840.5 Swedish34.437.739.9 Turkish27.943.638.6 Average40.641.043.3 (without tag dictionary) EMNLP 2011Cohen, Das and Smith (2011)

49 49 Results: POS Tagging Direct Gradient (DG) Uniform + DG Mixture + DG Bulgarian34.738.035.8 Danish48.836.239.9 Dutch45.443.750.2 Greek35.336.738.9 Japanese52.360.461.7 Portuguese53.545.751.5 Slovene33.435.936.0 Spanish40.031.840.5 Swedish34.437.739.9 Turkish27.943.638.6 Average40.641.043.3 (without tag dictionary) EMNLP 2011Cohen, Das and Smith (2011)

50 50 Results: POS Tagging Direct Gradient (DG) Uniform + DG Mixture + DG Bulgarian80.781.382.6 Danish82.382.0 Dutch79.279.380.0 Greek88.080.3 Japanese83.477.979.9 Portuguese75.483.884.7 Slovene75.682.8 Spanish82.3 83.3 Swedish61.569.067.0 Turkish50.4 Average75.976.977.3 (with tag dictionary) EMNLP 2011Cohen, Das and Smith (2011)

51 51 Results: POS Tagging Direct Gradient (DG) Uniform + DG Mixture + DG Bulgarian80.781.382.6 Danish82.382.0 Dutch79.279.380.0 Greek88.080.3 Japanese83.477.979.9 Portuguese75.483.884.7 Slovene75.682.8 Spanish82.3 83.3 Swedish61.569.067.0 Turkish50.4 50.0 Average75.976.977.3 (with tag dictionary) EMNLP 2011Cohen, Das and Smith (2011)

52 UniformMixtureUniform + EMMixture + EM 75.675.574.772.8 59.259.951.355.2 50.751.145.946.0 57.059.573.072.3 56.358.359.863.9 78.676.878.779.8 46.146.041.341.0 73.275.975.576.7 74.073.270.568.7 45.045.343.944.1 61.662.261.562.1 EMPRPGI Bulgarian54.354.0- Danish41.444.041.6 Dutch38.637.945.1 Greek41.0-- Japanese43.060.2- Portuguese42.547.863.1 Slovene37.050.349.6 Spanish38.162.463.8 Swedish42.342.258.3 Turkish36.353.4- Average41.4-- 52 Results: Dependency Parsing EMNLP 2011Cohen, Das and Smith (2011)

53 UniformMixtureUniform + EMMixture + EM 75.675.574.772.8 59.259.951.355.2 50.751.145.946.0 57.059.573.072.3 56.358.359.863.9 78.676.878.779.8 46.146.041.341.0 73.275.975.576.7 74.073.270.568.7 45.045.343.944.1 61.662.261.562.1 EMPRPGI Bulgarian54.354.0- Danish41.444.041.6 Dutch38.637.945.1 Greek41.0-- Japanese43.060.2- Portuguese42.547.863.1 Slovene37.050.349.6 Spanish38.162.463.8 Swedish42.342.258.3 Turkish36.353.4- Average41.4-- 53 Results: Dependency Parsing EMNLP 2011Cohen, Das and Smith (2011)

54 54 Results: Words to Dependencies JointPipelineGold Tags DGMixture + DGDGMixture + DG Bulgarian62.467.057.762.975.6 Danish50.450.148.948.359.9 Dutch48.352.249.951.250.7 Greek63.552.268.250.073.0 Japanese61.469.564.268.663.9 Portuguese68.462.260.059.879.8 Slovene47.236.845.836.446.1 Spanish67.769.365.868.176.7 Swedish58.249.157.947.674.0 Turkish52.447.450.847.145.3 Average57.955.056.954.064.5 EMNLP 2011Cohen, Das and Smith (2011)


Download ppt "Unsupervised Structure Prediction with Non-Parallel Multilingual Guidance July 27 EMNLP 2011 Shay B. Cohen Dipanjan Das Noah A. Smith Carnegie Mellon University."

Similar presentations


Ads by Google