Treebanks as Training Data for Parsers
Joakim Nivre
Växjö University and Uppsala University

Q1: What do you really care about when you're building a parser?
For parsing unrestricted text, I care about the joint optimization of:
– Robustness
– Disambiguation
– Accuracy
– Efficiency
Requirement on syntactic annotation:
– Balance between expressivity and complexity

Example: Mildly Non-Projective Dependency Structures
Coverage of dependency structures in two treebanks (PDT = Prague Dependency Treebank, DDT = Danish Dependency Treebank):
– Strictly projective (efficiently parsable): PDT: 75%, DDT: 85%
– Unrestricted non-projective (often intractable): PDT: 100%, DDT: 100%
– Well-nested, gap degree ≤ 1: PDT: 99.5%, DDT: 99.7%
Design choice in treebank annotation?
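As a concrete illustration of the projectivity condition these numbers rest on: an arc (h, d) is projective iff every token strictly between h and d is dominated by h, and a tree is projective iff all of its arcs are. Below is a minimal Python sketch of that check; the 1-based indexing convention and the toy example are my assumptions, not taken from the slides.

```python
def is_projective(heads):
    """Check whether a dependency tree is projective.

    `heads` maps each token position (1-based) to the position of its
    head, with 0 denoting the artificial root. An arc (h, d) is
    projective iff every token strictly between h and d is dominated
    by h; the tree is projective iff all of its arcs are.
    """
    def dominated_by(h, t):
        # Follow head pointers upward from t until we reach h or the root.
        while t != 0:
            if t == h:
                return True
            t = heads[t]
        return h == 0

    for d, h in heads.items():
        if any(not dominated_by(h, t)
               for t in range(min(h, d) + 1, max(h, d))):
            return False
    return True

# Toy non-projective example, schematically "A hearing is scheduled on
# the issue today": the arc hearing -> on crosses the arc is -> scheduled.
heads = {1: 2, 2: 3, 3: 0, 4: 3, 5: 2, 6: 7, 7: 5, 8: 4}
print(is_projective(heads))  # False
```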

Q2: What works, what doesn't?
Anything works?
– Top systems in the CoNLL 2006 shared task:
  MSTParser: global, exhaustive, graph-based
  MaltParser: local, greedy, stack-based
– Features more important than parsers?
But not for all languages?
– Results from the CoNLL 2007 shared task:
  Configurational languages ≈ 85% LAS (Catalan, Chinese, English, Italian)
  Richly inflected languages ≈ 75% LAS (Arabic, Basque, Czech, Greek, Hungarian, Turkish)
Treebank problem or parser problem?
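For readers outside the shared-task community: LAS (labeled attachment score) is the percentage of tokens that receive both the correct head and the correct dependency label; UAS (unlabeled attachment score) requires only the correct head. A minimal sketch of the computation follows — it is not the official CoNLL evaluator, and punctuation handling, which varied across evaluations, is ignored here.

```python
def attachment_scores(gold, pred):
    """Compute labeled and unlabeled attachment scores (LAS/UAS).

    `gold` and `pred` are lists of (head, deprel) pairs, one per token,
    aligned by position. LAS counts tokens whose head AND label are
    both correct; UAS counts tokens whose head is correct.
    """
    assert len(gold) == len(pred)
    n = len(gold)
    las = sum(g == p for g, p in zip(gold, pred))
    uas = sum(gh == ph for (gh, _), (ph, _) in zip(gold, pred))
    return las / n, uas / n

gold = [(2, "det"), (0, "root"), (2, "dobj")]
pred = [(2, "det"), (0, "root"), (1, "dobj")]
print(attachment_scores(gold, pred))  # (0.666..., 0.666...)
```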

Q3: What information is useful, what is not?
Word level:
– Morphological analysis (lemma, derivation, inflection)
– Hierarchical parts-of-speech (incl. features)
Sentence level:
– Complete structural annotation (phrases, heads)
– Complete functional annotation (syntactic relations)
– Deep/non-local dependencies
Integrated morpho-syntactic annotation:
– The key to parsing richly inflected languages?
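These word-level layers are what the CoNLL-X format of the 2006/2007 shared tasks exposes per token: ten tab-separated columns (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL), one token per line, blank line between sentences. A minimal reader sketch, assuming '|'-separated FEATS values (the exact feature encoding differs per treebank):

```python
COLUMNS = ["ID", "FORM", "LEMMA", "CPOSTAG", "POSTAG",
           "FEATS", "HEAD", "DEPREL", "PHEAD", "PDEPREL"]

def read_conll(path):
    """Yield one sentence at a time as a list of {column: value} dicts."""
    sentence = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:                  # blank line = sentence boundary
                if sentence:
                    yield sentence
                    sentence = []
                continue
            token = dict(zip(COLUMNS, line.split("\t")))
            # FEATS packs the morphological analysis as a '|'-separated
            # list ('_' when empty); the feature inventory is treebank-specific.
            token["FEATS"] = ([] if token["FEATS"] == "_"
                              else token["FEATS"].split("|"))
            sentence.append(token)
    if sentence:
        yield sentence
```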

Skipping a few questions …
Q4: How does grammar writing interact with treebanking?
– No idea. Not my cup of tea.
Q5: What methodological lessons can be drawn for treebanking?
Q6: What are advantages and disadvantages of preprocessing the data to be treebanked with an automatic parser?
– Don't know. Never got funding to build a real treebank.

Q7: Advantages of a phrase structure and/or a dependency treebank?
Obvious answer:
– Phrase structure is good for phrase structure parsing.
– Dependency is good for dependency parsing.
Methodological point:
– Parsing treebanks obtained by lossy conversion can be questionable (see the conversion sketch below).
Remedy:
– Make annotations (just) rich enough to support both.
– Annotation scheme:
  Minimal source annotation
  Well-defined conversions to target annotations
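To make the lossy-conversion worry concrete, here is a hedged sketch of the standard head-rules conversion from phrase structure to dependencies. The head table and tree encoding are toy assumptions: real converters use full head-percolation tables in the style of Magerman or Collins, and track token positions rather than word strings.

```python
# Toy head table: for each phrase label, the child labels to search,
# in priority order. A hypothetical stand-in for a real head table.
HEAD_RULES = {"S": ["VP", "NP"], "VP": ["V", "VP"], "NP": ["N", "NP"]}

def head_word(tree):
    """Return the lexical head of a tree.

    A tree is either ("label", [children...]) or ("tag", "word")."""
    label, body = tree
    if isinstance(body, str):             # preterminal: its own head
        return body
    for cand in HEAD_RULES.get(label, []):
        for child in body:
            if child[0] == cand:
                return head_word(child)
    return head_word(body[0])             # fallback: leftmost child

def to_dependencies(tree, deps=None):
    """Collect (head_word, dependent_word) arcs from a phrase-structure tree."""
    deps = [] if deps is None else deps
    label, body = tree
    if isinstance(body, str):
        return deps
    h = head_word(tree)
    for child in body:
        ch = head_word(child)
        if ch != h:
            deps.append((h, ch))
        to_dependencies(child, deps)
    return deps

tree = ("S", [("NP", [("N", "he")]),
              ("VP", [("V", "likes"), ("NP", [("N", "fish")])])])
print(to_dependencies(tree))  # [('likes', 'he'), ('likes', 'fish')]
```

Whatever the head table cannot recover (function labels, non-local dependencies) is silently dropped or must be invented, which is precisely why parsing and evaluating on converted treebanks can be methodologically questionable.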