Inductive Dependency Parsing Joakim Nivre


Inductive Dependency Parsing
Joakim Nivre
Uppsala University, Department of Linguistics and Philology
Växjö University, School of Mathematics and Systems Engineering

Inductive Dependency Parsing
Dependency-based representations …
- have restricted expressivity but provide a transparent encoding of semantic structure.
- have restricted complexity in parsing.
Inductive machine learning …
- is necessary for accurate disambiguation.
- is beneficial for robustness.
- makes (formal) grammars superfluous.

Dependency Graph
[Figure: dependency graph for the tagged sentence "Economic/JJ news/NN had/VBD little/JJ effect/NN on/IN financial/JJ markets/NNS ./.", with labeled arcs NMOD, SBJ, OBJ, PMOD and ROOT.]
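Such a graph is conveniently encoded as one head index and one dependency type per token. A minimal sketch in Python (the Token layout is illustrative, and the attachment of the final period is an assumption, since its arc is not legible in the slide):

```python
from dataclasses import dataclass

@dataclass
class Token:
    idx: int      # 1-based position in the sentence
    form: str     # word form
    pos: str      # part-of-speech tag
    head: int     # index of the head token (0 = artificial root)
    deprel: str   # dependency type of the arc from the head

# The example sentence as (form, pos, head, deprel) tuples.
rows = [
    ("Economic", "JJ", 2, "NMOD"), ("news", "NN", 3, "SBJ"),
    ("had", "VBD", 0, "ROOT"),     ("little", "JJ", 5, "NMOD"),
    ("effect", "NN", 3, "OBJ"),    ("on", "IN", 5, "NMOD"),
    ("financial", "JJ", 8, "NMOD"),("markets", "NNS", 6, "PMOD"),
    (".", ".", 3, "P"),  # punctuation attachment assumed, not shown on the slide
]
sent = [Token(i + 1, *row) for i, row in enumerate(rows)]

for t in sent:
    head = sent[t.head - 1].form if t.head else "ROOT"
    print(f"{t.form:>10} --{t.deprel}--> {head}")
```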

Key Ideas
- Deterministic: deterministic algorithms for building dependency graphs (Kudo and Matsumoto 2002, Yamada and Matsumoto 2003, Nivre 2003)
- History-based: history-based models for predicting the next parser action (Black et al. 1992, Magerman 1995, Ratnaparkhi 1997, Collins 1997)
- Discriminative: discriminative machine learning to map histories to actions (Veenstra and Daelemans 2000, Kudo and Matsumoto 2002, Yamada and Matsumoto 2003, Nivre et al. 2004)

Guided Parsing
Deterministic parsing:
- Greedy algorithm for disambiguation
- Optimal strategy given an oracle
Guided deterministic parsing:
- Guide = approximation of the oracle
- Desiderata: high prediction accuracy, efficient implementation (constant time)
- Solution: discriminative classifier induced from treebank data (sketched below)
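The guided loop can be made concrete with a small sketch. It follows the arc-eager transition system of Nivre (2003), with unlabeled arcs for brevity; the `guide` is any classifier from parser states to actions, and all names are illustrative rather than taken from the slides:

```python
def parse(words, guide):
    """Greedy, classifier-guided arc-eager parsing (illustrative sketch)."""
    stack, buffer = [0], list(range(1, len(words) + 1))  # 0 = artificial root
    heads = {}                                           # token -> head
    while buffer:
        state = (stack, buffer, heads)
        action = guide(state)            # one constant-time prediction per step
        if action == "SHIFT":
            stack.append(buffer.pop(0))
        elif action == "REDUCE" and stack[-1] in heads:
            stack.pop()                  # top already has a head
        elif action == "LEFT-ARC" and stack[-1] != 0 and stack[-1] not in heads:
            heads[stack.pop()] = buffer[0]   # arc: next -> top
        elif action == "RIGHT-ARC":
            heads[buffer[0]] = stack[-1]     # arc: top -> next
            stack.append(buffer.pop(0))
        else:                            # illegal prediction: fall back to SHIFT
            stack.append(buffer.pop(0))
    return heads
```

Because every step either consumes an input token or pops the stack, the loop runs in time linear in sentence length, which is what makes the constant-time guide the only efficiency bottleneck.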

Learning
Classification problem (S → T):
- Parser states: S = { s | s = (φ1, …, φp) }
- Parser actions: T = { t1, …, tm }
Training data (collected as in the sketch below):
- D = { (si−1, ti) | ti(si−1) = si in the gold standard derivation s1, …, sn }
Learning methods:
- Memory-based learning
- Support vector machines
- Maximum entropy modeling
- …
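A minimal sketch of how D can be collected, assuming an `oracle` that reads off the correct action from the gold-standard graph, a `featurize` function like the models on the next slide, and `apply_action` standing for the transition logic of the parsing sketch above (all three names are assumptions, not from the slides):

```python
def collect_training_data(treebank, oracle, featurize, apply_action):
    """Record (state features, action) pairs along gold derivations (sketch)."""
    data = []
    for words, gold_heads in treebank:
        stack, buffer, heads = [0], list(range(1, len(words) + 1)), {}
        while buffer:
            state = (stack, buffer, heads)
            action = oracle(state, gold_heads)  # t_i such that t_i(s_(i-1)) = s_i
            data.append((featurize(state, words), action))
            apply_action(state, action)         # advance to s_i (mutates state)
    return data
    # `data` can then be fed to any classifier: MBL, SVM, maximum entropy, ...
```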

Feature Models
[Figure: parser configuration showing the stack (…, top) and the input (next, n1, n2, n3, …), with hd, ld and rd marking a token's head, leftmost dependent and rightmost dependent.]
- Model P: PoS: t1, top, next, n1, n2
- Model D: P + DepTypes: t.hd, t.ld, t.rd, n.ld
- Model L2: D + Words: top, next
- Model L4: L2 + Words: top.hd, n1
(see the sketch below)
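These models might be realized as a feature-extraction function along the following lines. This is a sketch under stated assumptions: t1 is read as the token below the stack top, words/tags are indexed by token id (with a dummy root at index 0), and heads/deprels record the arcs built so far:

```python
def extract_features(state, words, tags, model="L4"):
    """Feature vector for models P, D, L2, L4 (illustrative layout)."""
    stack, buffer, heads, deprels = state
    def tag(i):  return tags[i] if i is not None else None
    def word(i): return words[i] if i is not None else None
    def dep(i):  return deprels.get(i) if i is not None else None
    def nth(seq, k): return seq[k] if len(seq) > k else None
    def ld(i):   # leftmost dependent of token i, if any
        ds = [j for j, h in heads.items() if h == i]
        return min(ds) if ds else None
    def rd(i):   # rightmost dependent of token i, if any
        ds = [j for j, h in heads.items() if h == i]
        return max(ds) if ds else None

    top, nxt = nth(stack[::-1], 0), nth(buffer, 0)
    feats = [tag(nth(stack[::-1], 1)),                 # t1 (assumed: below top)
             tag(top), tag(nxt),
             tag(nth(buffer, 1)), tag(nth(buffer, 2))]  # model P
    if model in ("D", "L2", "L4"):
        # dependency types of top's arc and of selected dependents' arcs
        feats += [dep(top), dep(ld(top)), dep(rd(top)), dep(ld(nxt))]
    if model in ("L2", "L4"):
        feats += [word(top), word(nxt)]
    if model == "L4":
        feats += [word(heads.get(top)), word(nth(buffer, 1))]  # top.hd, n1
    return feats
```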

Experimental Results (MBL)
- Dependency features help
- Lexicalisation helps …
- … up to a point (?)

          Swedish                     English
Model   ASU   ASL   EMU   EML      ASU   ASL   EMU   EML
P       77.4  70.1  26.6  17.8     79.0  76.1  14.4  10.0
D       82.5  75.1  33.5  22.2     83.4  80.5  21.9  17.0
L2      85.6  81.5  39.1  30.2     86.6  84.8  29.9  26.2
L4      85.9  81.6  39.8  30.4     87.3  –     31.1  27.7

(AS = attachment score, EM = exact match; U = unlabeled, L = labeled)
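For reference, the metrics in the table can be computed as follows: attachment score (AS) is the proportion of tokens assigned the correct head (and, for the labeled variant, the correct dependency type), and exact match (EM) is the proportion of sentences parsed entirely correctly. A minimal sketch:

```python
def evaluate(gold_sents, pred_sents):
    """Unlabeled/labeled attachment score and exact match (sketch).
    Each sentence is a list of (head, deprel) pairs, one per token."""
    tok_u = tok_l = tokens = 0
    em_u = em_l = 0
    for gold, pred in zip(gold_sents, pred_sents):
        u = sum(g[0] == p[0] for g, p in zip(gold, pred))  # correct heads
        l = sum(g == p for g, p in zip(gold, pred))        # correct heads+labels
        tok_u, tok_l, tokens = tok_u + u, tok_l + l, tokens + len(gold)
        em_u += (u == len(gold))
        em_l += (l == len(gold))
    n = len(gold_sents)
    return {"ASU": tok_u / tokens, "ASL": tok_l / tokens,
            "EMU": em_u / n, "EML": em_l / n}
```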

Parameter Optimization
Learning algorithm parameter optimization: manual (Nivre 2005) vs. paramsearch (van den Bosch 2003)
Model = L4 + PoS of n3

                                     Swedish          English
Parameter                            Manual  Param    Manual  Param
Number of neighbors (-k)             5       11       7       19
Distance metric (-m)                 MVDM    MVDM     MVDM    MVDM
Switching threshold (-L)             3       2        3       2
Feature weighting (-w)               None    GR       None    GR
Distance weighted class voting (-d)  ID      IL       ID      IL
Unlabeled attachment score (ASU)     86.2    86.0     87.7    86.8
Labeled attachment score (ASL)       81.9    82.0     85.9    84.9

Learning Curves
[Figure: learning curves, attachment score (U/L), for models D and L2; Swedish training data added in 10K-token sections, English in 100K-token sections.]

Dependency Types: Swedish
High accuracy (labeled F ≥ 84%):
- IM (marker → infinitive) 98.5%
- PR (preposition → noun) 90.6%
- UK (complementizer → verb) 86.4%
- VC (auxiliary verb → main verb) 86.1%
- DET (noun → determiner) 89.5%
- ROOT 87.8%
- SUB (verb → subject) 84.5%
Medium accuracy (76% ≤ labeled F ≤ 80%):
- ATT (noun modifier) 79.2%
- CC (coordination) 78.9%
- OBJ (verb → object) 77.7%
- PRD (verb → predicative) 76.8%
- ADV (adverbial) 76.3%
Low accuracy (labeled F ≤ 70%):
- INF, APP, XX, ID

Dependency Types: English
High accuracy (labeled F ≥ 86%):
- VC (auxiliary verb → main verb) 95.0%
- NMOD (noun modifier) 91.0%
- SBJ (verb → subject) 89.3%
- PMOD (preposition modifier) 88.6%
- SBAR (complementizer → verb) 86.1%
Medium accuracy (73% ≤ labeled F ≤ 83%):
- ROOT 82.4%
- OBJ (verb → object) 81.1%
- VMOD (verb modifier) 76.8%
- AMOD (adjective/adverb modifier) 76.7%
- PRD (predicative) 73.8%
Low accuracy (labeled F ≤ 70%):
- DEP (null label)
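The per-type figures on these two slides are labeled F-scores, treating each dependency type as its own task: precision over predicted arcs carrying that label, recall over gold arcs carrying it. A brief sketch, with sentences encoded as (head, deprel) pairs per token as in the evaluation sketch above:

```python
from collections import Counter

def labeled_f_per_type(gold_sents, pred_sents):
    """Labeled F1 per dependency type: an arc counts as correct only if
    both head and label match (sketch)."""
    correct, in_gold, in_pred = Counter(), Counter(), Counter()
    for gold, pred in zip(gold_sents, pred_sents):
        for (gh, gl), (ph, pl) in zip(gold, pred):
            in_gold[gl] += 1
            in_pred[pl] += 1
            if (gh, gl) == (ph, pl):
                correct[gl] += 1
    scores = {}
    for label in in_gold | in_pred:
        p = correct[label] / in_pred[label] if in_pred[label] else 0.0
        r = correct[label] / in_gold[label] if in_gold[label] else 0.0
        scores[label] = 2 * p * r / (p + r) if p + r else 0.0
    return scores
```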

MaltParser
Software for inductive dependency parsing:
- Freely available for research and education (http://www.msi.vxu.se/users/nivre/research/MaltParser.html)
Version 0.3:
- Parsing algorithms: Nivre (2003) (arc-eager, arc-standard), Covington (2001) (projective, non-projective)
- Learning algorithms: MBL (TiMBL), SVM (LIBSVM)
- Feature models: arbitrary combinations of part-of-speech features, dependency type features and lexical features
- Auxiliary tools: MaltEval, MaltConverter, Proj

CoNLL-X Shared Task

Language     #Tokens  #DTypes  ASU   ASL
Japanese     150K     8        92.2  90.3
English*     1000K    12       89.7  88.3
Bulgarian    200K     19       88.0  82.5
Chinese      350K     134      –     82.2
Swedish      –        64       87.9  81.3
Danish       100K     53       86.9  82.0
Portuguese   –        55       86.0  81.5
German       700K     46       85.0  –
Italian*     40K      17       82.9  75.7
Czech        1250K    82       80.1  72.8
Spanish      90K      21       79.0  74.3
Dutch        –        26       76.0  71.7
Arabic       50K      27       74.0  61.7
Turkish      60K      –        73.8  63.0
Slovene      30K      –        73.3  62.2

Possible Projects
CoNLL Shared Task:
- Work on one or more languages
- With or without MaltParser
- Data sets available
Parsing spoken language:
- Talbanken05: Swedish treebank with written and spoken data, cross-training experiments
- GSLC: 1.2M-word corpus of spoken Swedish