Parsing the NEGRA corpus
Greg Donaker
June 14, 2006

NEGRA Corpus
- German language tagged corpus
- 20,602 sentences (355,096 tokens)
- Significantly smaller than Penn Treebank
- Can be used similarly to Penn Treebank
- Similar annotations, much flatter trees [Dubey & Keller 2003]

Baseline error analysis
- Ran the corpus through the Stanford Parser using NEGRA-specific parameters
- 91.75% tagging accuracy
- PCFG F-score: (value missing from the transcript)
- Most frequently underproposed rule: NP -> ART NN (98 times)
- Most frequently underproposed category: NN (498 times, three times the next category)
- These error counts seem abnormally high given the structure of the German language
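
As a rough illustration of what an "underproposed" count measures, the sketch below compares rule counts from gold trees against rule counts from parser output. The definition used here (gold count minus predicted count, keeping positive differences) is an assumption for illustration; the Stanford Parser's own evaluation may compute it differently. The function name `underproposed` and the toy rules are hypothetical.

```python
from collections import Counter


def underproposed(gold_rules, predicted_rules):
    """Return rules that occur more often in the gold trees than in the parser output.

    Assumed definition: count in gold minus count in prediction, positive part only.
    """
    gold = Counter(gold_rules)
    pred = Counter(predicted_rules)
    # Counter subtraction drops zero and negative differences.
    return gold - pred


# Toy data (invented): the parser mistags two nouns as proper names (NE).
gold_rules = ["NP -> ART NN", "NP -> ART NN", "S -> NP VP", "PP -> APPR NP"]
predicted_rules = ["NP -> ART NE", "S -> NP VP", "PP -> APPR NP"]

missing = underproposed(gold_rules, predicted_rules)
print(missing.most_common())
```

A single tagging mistake on a noun is enough to make the parser miss the whole NP rule, which is why tagging errors and underproposed rules tend to move together.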

Approach
- Bug: the tag distribution of unknown words was modeled as the baseline tag distribution
- Reworked the unknown-word model to fit the specifics of the German language
- Model based on the first letter, the capitalization of the first letter, and the ending substring of each word
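
A minimal sketch of such an unknown-word model follows, assuming each word is reduced to a signature made of first-letter capitalization and a fixed-length suffix, and that tag distributions per signature are estimated from rare training words. This is not the Stanford Parser's implementation; the names `signature`, `train_unknown_model`, `tag_distribution`, the `rare_threshold` parameter, and the toy data are all hypothetical.

```python
from collections import Counter, defaultdict


def signature(word, suffix_len=2):
    """Map a word to (capitalization flag, lowercase suffix)."""
    cap = "C" if word[:1].isupper() else "c"
    return (cap, word[-suffix_len:].lower())


def train_unknown_model(tagged_words, rare_threshold=1):
    """Count tags per signature, using only rare words as stand-ins for unknown words."""
    word_counts = Counter(word for word, _ in tagged_words)
    sig_tag_counts = defaultdict(Counter)
    for word, tag in tagged_words:
        if word_counts[word] <= rare_threshold:
            sig_tag_counts[signature(word)][tag] += 1
    return sig_tag_counts


def tag_distribution(model, word):
    """Return the relative tag frequencies for an unseen word's signature."""
    counts = model.get(signature(word))
    if not counts:
        return {}  # a real model would back off to a smoothed distribution here
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}


# Toy training data (invented) with STTS-style tags.
training = [
    ("Hund", "NN"), ("Mund", "NN"), ("rund", "ADJD"),
    ("laufen", "VVINF"), ("kaufen", "VVINF"), ("Laufen", "NN"),
]
model = train_unknown_model(training)
print(tag_distribution(model, "Grund"))   # capitalized, ends in "nd"
print(tag_distribution(model, "taufen"))  # lowercase, ends in "en"
```

Capitalization is a strong cue in German because all nouns are capitalized, so separating "Laufen" from "laufen" lets the model keep noun and verb readings apart.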

Results
- The best-performing model (on both test and validation sets) matched intuition: capitalization of the first letter plus the last two characters of the word
- Improves tagging accuracy from 91.75% to 94.49%
- Improves PCFG F-score (values missing from the transcript)
- Reduces underproposed NP -> ART NN from 98 to 48
- Reduces underproposed NN from 498 to 73
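
To put these numbers on a common scale, the reported figures can be turned into relative reductions. The short calculation below uses only the values stated in the results; the helper names are hypothetical.

```python
def relative_error_reduction(before_acc, after_acc):
    """Fraction of the original error removed, given accuracies in percent."""
    before_err = 100.0 - before_acc
    after_err = 100.0 - after_acc
    return (before_err - after_err) / before_err


def relative_count_reduction(before, after):
    """Fraction by which an error count dropped."""
    return (before - after) / before


tagging = relative_error_reduction(91.75, 94.49)  # error falls from 8.25% to 5.51%
np_rule = relative_count_reduction(98, 48)        # underproposed NP -> ART NN
nn_cat = relative_count_reduction(498, 73)        # underproposed NN

print(f"Tagging error reduced by {tagging:.1%}")
print(f"Underproposed NP -> ART NN reduced by {np_rule:.1%}")
print(f"Underproposed NN reduced by {nn_cat:.1%}")
```

Under this reading, the 2.74-point accuracy gain removes roughly a third of the tagging errors, about half of the missed NP -> ART NN rules, and about 85% of the missed NN categories.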