Inductive Dependency Parsing Joakim Nivre

1 Inductive Dependency Parsing
Machine Learning 2
Joakim Nivre
Uppsala University, Department of Linguistics and Philology
Växjö University, School of Mathematics and Systems Engineering

2 Inductive Dependency Parsing
Dependency-based representations …
- have restricted expressivity but provide a transparent encoding of semantic structure.
- have restricted complexity in parsing.
Inductive machine learning …
- is necessary for accurate disambiguation.
- is beneficial for robustness.
- makes (formal) grammars superfluous.

3 Dependency Graph
[Figure: labeled dependency graph for the sentence "Economic news had little effect on financial markets .", with part-of-speech tags (JJ, NN, VBD, IN, NNS, …) on the tokens and the dependency labels NMOD, SBJ, ROOT, OBJ, PMOD on the arcs.]
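For reference, one way such a graph could be encoded in code is as parallel lists of head indices and arc labels. This is an illustrative encoding, not taken from the slides; the label of the final period is not shown above and is left unset.

```python
# One possible in-memory encoding of the dependency graph above (illustrative).
# heads[i] is the index of token i's head; index 0 is an artificial ROOT node.
tokens  = ["<ROOT>", "Economic", "news", "had", "little", "effect",
           "on", "financial", "markets", "."]
heads   = [None, 2, 3, 0, 5, 3, 5, 8, 6, 3]
deprels = [None, "NMOD", "SBJ", "ROOT", "NMOD", "OBJ",
           "NMOD", "NMOD", "PMOD", None]   # period's label not shown on the slide

# Print each token together with its head and dependency label.
for i in range(1, len(tokens)):
    head = heads[i]
    print(f"{i}\t{tokens[i]:10}\t-> {head} ({tokens[head]})\t{deprels[i]}")
```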

4 Key Ideas
Deterministic: deterministic algorithms for building dependency graphs (Kudo and Matsumoto 2002, Yamada and Matsumoto 2003, Nivre 2003)
History-based: history-based models for predicting the next parser action (Black et al. 1992, Magerman 1995, Ratnaparkhi 1997, Collins 1997)
Discriminative: discriminative machine learning to map histories to actions (Veenstra and Daelemans 2000, Kudo and Matsumoto 2002, Yamada and Matsumoto 2003, Nivre et al. 2004)

5 Guided Parsing
Deterministic parsing:
- Greedy algorithm for disambiguation
- Optimal strategy given an oracle
Guided deterministic parsing:
- Guide = approximation of the oracle
- Desiderata: high prediction accuracy, efficient implementation (constant time)
- Solution: discriminative classifier induced from treebank data (see the sketch below)
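As an illustration of what such a guided deterministic parser looks like, here is a minimal sketch of an arc-eager loop in Python. It is not MaltParser's actual code; `guide` and `extract_features` are hypothetical stand-ins for the induced classifier and the feature model.

```python
def parse(words, guide, extract_features):
    """Guided deterministic parsing with arc-eager transitions (sketch).

    `guide(features)` returns "SHIFT", "REDUCE", "LEFT-ARC:<label>" or
    "RIGHT-ARC:<label>" (hypothetical interface).
    """
    stack = [0]                                # token indices; 0 = artificial root
    buffer = list(range(1, len(words) + 1))
    heads, labels = {}, {}                     # dependent -> head / label

    while buffer:
        action = guide(extract_features(stack, buffer, heads, labels, words))
        top, nxt = stack[-1], buffer[0]
        if action.startswith("LEFT-ARC") and top != 0 and top not in heads:
            heads[top] = nxt                   # top becomes a dependent of next
            labels[top] = action.split(":", 1)[1]
            stack.pop()
        elif action.startswith("RIGHT-ARC"):
            heads[nxt] = top                   # next becomes a dependent of top
            labels[nxt] = action.split(":", 1)[1]
            stack.append(buffer.pop(0))
        elif action == "REDUCE" and top in heads:
            stack.pop()
        else:                                  # SHIFT (also the fallback action)
            stack.append(buffer.pop(0))
    return heads, labels
```

Each word is shifted onto the stack at most once and popped at most once, so the loop is linear in sentence length; this is why the guide itself must predict in (near-)constant time.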

6 Learning
Classification problem (S → T):
- Parser states: S = { s | s = (φ1, …, φp) }
- Parser actions: T = { t1, …, tm }
Training data:
- D = { (si-1, ti) | ti(si-1) = si in the gold-standard derivation s1, …, sn }
Learning methods:
- Memory-based learning
- Support vector machines
- Maximum entropy modeling
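In code, the construction of D and the induction of a classifier might look like the sketch below. The oracle and state helpers are hypothetical, and scikit-learn's SVC is used merely as a stand-in for the MBL/SVM/MaxEnt learners listed above.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.svm import SVC

def build_training_data(treebank, oracle, extract_features):
    """Collect the instance set D = {(s_{i-1}, t_i)} from gold derivations.

    `oracle(state, gold_graph)` returns the action t_i such that
    t_i(s_{i-1}) = s_i in the gold-standard derivation; `initial_state`,
    `is_terminal` and `apply_action` are hypothetical helpers.
    """
    X, y = [], []
    for sentence, gold_graph in treebank:
        state = initial_state(sentence)
        while not is_terminal(state):
            action = oracle(state, gold_graph)
            X.append(extract_features(state))    # dict: feature name -> value
            y.append(action)
            state = apply_action(state, action)
    return X, y

def train_guide(X, y):
    """Induce a discriminative guide; SVC stands in for the SVM learner."""
    vec = DictVectorizer()
    clf = SVC(kernel="poly", degree=2)           # one common kernel choice
    clf.fit(vec.fit_transform(X), y)
    return lambda features: clf.predict(vec.transform([features]))[0]
```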

7 Feature Models
[Figure: parser state with the stack (…, top) and the remaining input (next, n1, n2, n3), plus top's head (hd) and its leftmost/rightmost dependents (ld, rd).]
Model P: PoS: t1, top, next, n1, n2
Model D: P + DepTypes: t.hd, t.ld, t.rd, n.ld
Model L2: D + Words: top, next
Model L4: L2 + Words: top.hd, n1
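The four models could be realized roughly as below. This is a sketch under assumptions: the state accessors and the exact reading of the slide's abbreviations (e.g. t.hd) are hypothetical, and this is not MaltParser's feature specification format.

```python
# Illustrative realization of the feature models. The state accessors
# (stack_top, next_input, lookahead, leftmost_dep, rightmost_dep, head_of, t1)
# and the lookup helpers (pos_of, dep_of, word_of) are hypothetical.

def features(state, model="L4"):
    top, nxt = state.stack_top(), state.next_input()
    f = {                                   # Model P: part-of-speech features
        "t1.pos":   pos_of(state, state.t1()),        # context token t1 from the diagram
        "top.pos":  pos_of(state, top),
        "next.pos": pos_of(state, nxt),
        "n1.pos":   pos_of(state, state.lookahead(1)),
        "n2.pos":   pos_of(state, state.lookahead(2)),
    }
    if model in ("D", "L2", "L4"):          # + dependency-type features
        f["t.hd.dep"] = dep_of(state, top)  # arc linking top to its head (assumed reading)
        f["t.ld.dep"] = dep_of(state, state.leftmost_dep(top))
        f["t.rd.dep"] = dep_of(state, state.rightmost_dep(top))
        f["n.ld.dep"] = dep_of(state, state.leftmost_dep(nxt))
    if model in ("L2", "L4"):               # + word forms of top and next
        f["top.word"], f["next.word"] = word_of(state, top), word_of(state, nxt)
    if model == "L4":                       # + word forms of top's head and n1
        f["top.hd.word"] = word_of(state, state.head_of(top))
        f["n1.word"] = word_of(state, state.lookahead(1))
    return f
```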

8 Experimental Results (MBL)
Attachment score (AS) and exact match (EM), unlabeled (U) and labeled (L):

          Swedish                       English
Model     ASU   ASL   EMU   EML         ASU   ASL   EMU   EML
P         77.4  70.1  26.6  17.8        79.0  76.1  14.4  10.0
D         82.5  75.1  33.5  22.2        83.4  80.5  21.9  17.0
L2        85.6  81.5  39.1  30.2        86.6  84.8  29.9  26.2
L4        85.9  81.6  39.8  30.4        87.3   –    31.1  27.7

– Dependency features help
– Lexicalisation helps …
– … up to a point (?)
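AS (per token) and EM (per sentence) can be computed as in the sketch below, each in an unlabeled and a labeled variant. This is an illustrative implementation, not the evaluation script behind the table.

```python
def evaluate(gold_sents, pred_sents):
    """Attachment score and exact match, unlabeled (U) and labeled (L).

    Each sentence is a list of (head, label) pairs, one per token.
    """
    tokens = corr_u = corr_l = 0
    sents = exact_u = exact_l = 0
    for gold, pred in zip(gold_sents, pred_sents):
        u_ok = [g[0] == p[0] for g, p in zip(gold, pred)]   # head correct
        l_ok = [g == p for g, p in zip(gold, pred)]         # head and label correct
        tokens += len(gold)
        corr_u += sum(u_ok)
        corr_l += sum(l_ok)
        sents += 1
        exact_u += all(u_ok)
        exact_l += all(l_ok)
    return {"AS_U": 100.0 * corr_u / tokens, "AS_L": 100.0 * corr_l / tokens,
            "EM_U": 100.0 * exact_u / sents, "EM_L": 100.0 * exact_l / sents}
```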

9 Parameter Optimization
Learning algorithm parameter optimization: manual (Nivre 2005) vs. paramsearch (van den Bosch 2003)
Model = L4 + PoS of n3

                                        Swedish           English
Parameter                               Manual  Param     Manual  Param
Number of neighbors (-k)                5       11        7       19
Distance metric (-m)                    MVDM
Switching threshold (-L)                3       2
Feature weighting (-w)                  None    GR
Distance weighted class voting (-d)     ID      IL
Unlabeled attachment score (ASU)        86.2    86.0      87.7    86.8
Labeled attachment score (ASL)          81.9    82.0      85.9    84.9
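For illustration, an automatic search of the same flavour as paramsearch could be set up as below, using scikit-learn's k-NN classifier as a rough analogue of TiMBL (so the parameter names differ from TiMBL's -k/-m/-L/-w/-d options).

```python
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Illustrative grid over the kind of parameters tuned on the slide:
# number of neighbors and how the neighbors' votes are weighted.
param_grid = {
    "n_neighbors": [1, 3, 5, 7, 11, 19],
    "weights": ["uniform", "distance"],
}

def tune(X, y):
    """Return the best parameter setting found by cross-validated grid search."""
    search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
    search.fit(X, y)
    return search.best_params_, search.best_score_
```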

10 Learning Curves
[Figure: learning curves of unlabeled and labeled attachment score for models D and L2; Swedish training data added in 10K-token sections, English in 100K-token sections.]
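Such curves are obtained by retraining on successively larger portions of the treebank and evaluating each model on held-out data; a minimal sketch, assuming hypothetical `train_model` and `score` helpers:

```python
def learning_curve(treebank_sents, test_sents, section_size):
    """Attachment scores for models trained on growing prefixes of the treebank."""
    points = []
    for end in range(section_size, len(treebank_sents) + 1, section_size):
        model = train_model(treebank_sents[:end])        # hypothetical training wrapper
        points.append((end, score(model, test_sents)))   # hypothetical evaluation
    return points
```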

11 Dependency Types: Swedish
High accuracy (labeled F ≥ 84%):
- IM (marker → infinitive) 98.5%
- PR (preposition → noun) 90.6%
- UK (complementizer → verb) 86.4%
- VC (auxiliary verb → main verb) 86.1%
- DET (noun → determiner) 89.5%
- ROOT
- SUB (verb → subject) 84.5%
Medium accuracy (76% ≤ labeled F ≤ 80%):
- ATT (noun modifier) 79.2%
- CC (coordination)
- OBJ (verb → object) 77.7%
- PRD (verb → predicative) 76.8%
- ADV (adverbial)
Low accuracy (labeled F ≤ 70%): INF, APP, XX, ID
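The per-type figures on this slide and the next are labeled F scores; they can be computed from gold and predicted arc sets as sketched here (illustrative code, not the original evaluation).

```python
from collections import Counter

def per_label_f(gold_arcs, pred_arcs):
    """Labeled F score per dependency type.

    Arcs are (dependent, head, label) triples for one parsed corpus.
    """
    gold_n = Counter(label for _, _, label in gold_arcs)
    pred_n = Counter(label for _, _, label in pred_arcs)
    tp = Counter(label for _, _, label in set(gold_arcs) & set(pred_arcs))
    scores = {}
    for label in set(gold_n) | set(pred_n):
        p = tp[label] / pred_n[label] if pred_n[label] else 0.0
        r = tp[label] / gold_n[label] if gold_n[label] else 0.0
        scores[label] = 2 * p * r / (p + r) if (p + r) else 0.0
    return scores
```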

12 Dependency Types: English
High accuracy (labeled F ≥ 86%):
- VC (auxiliary verb → main verb) 95.0%
- NMOD (noun modifier) 91.0%
- SBJ (verb → subject) 89.3%
- PMOD (preposition modifier) 88.6%
- SBAR (complementizer → verb) 86.1%
Medium accuracy (73% ≤ labeled F ≤ 83%):
- ROOT
- OBJ (verb → object) 81.1%
- VMOD (verb modifier) 76.8%
- AMOD (adjective/adverb modifier) 76.7%
- PRD (predicative)
Low accuracy (labeled F ≤ 70%): DEP (null label)

13 MaltParser
Software for inductive dependency parsing:
- Freely available for research and education (http//…)
Version 0.3:
- Parsing algorithms: Nivre (2003) (arc-eager, arc-standard), Covington (2001) (projective, non-projective)
- Learning algorithms: MBL (TiMBL), SVM (LIBSVM)
- Feature models: arbitrary combinations of part-of-speech features, dependency type features and lexical features
- Auxiliary tools: MaltEval, MaltConverter, Proj

14 CoNLL-X Shared Task

Language     #Tokens   #DTypes   ASU    ASL
Japanese     150K      8         92.2   90.3
English*     1000K     12        89.7   88.3
Bulgarian    200K      19        88.0   82.5
Chinese      350K      134       –      82.2
Swedish      –         64        87.9   81.3
Danish       100K      53        86.9   82.0
Portuguese   –         55        86.0   81.5
German       700K      46        85.0   –
Italian*     40K       17        82.9   75.7
Czech        1250K     82        80.1   72.8
Spanish      90K       21        79.0   74.3
Dutch        –         26        76.0   71.7
Arabic       50K       27        74.0   61.7
Turkish      60K       –         73.8   63.0
Slovene      30K       –         73.3   62.2
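The shared task data is distributed in the tab-separated CoNLL-X format (columns ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL, with blank lines between sentences). A minimal reader, shown here for illustration:

```python
def read_conllx(path):
    """Yield sentences as lists of (form, cpostag, head, deprel) tuples."""
    sentence = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:                      # blank line terminates a sentence
                if sentence:
                    yield sentence
                    sentence = []
                continue
            # ID FORM LEMMA CPOSTAG POSTAG FEATS HEAD DEPREL PHEAD PDEPREL
            cols = line.split("\t")
            sentence.append((cols[1], cols[3], int(cols[6]), cols[7]))
    if sentence:                              # file may not end with a blank line
        yield sentence
```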

15 Possible Projects
CoNLL Shared Task:
- Work on one or more languages
- With or without MaltParser
- Data sets available
Parsing spoken language:
- Talbanken05: Swedish treebank with written and spoken data; cross-training experiments
- GSLC: 1.2M-word corpus of spoken Swedish

