Universität des Saarlandes Seminar: Recent Advances in Parsing Technology Winter Semester 2011-2012 Jesús Calvillo

• Introduction • Overview • Part of Speech Tagging • Lexical Ambiguity • HMM Tagger • Tagger Training • Results • Disambiguation Component • Parsing • Recovery of Best Parse • Accuracy • References

• What is Alpino? • A computational analyzer for Dutch. • It exploits knowledge-based (HPSG grammar and lexicon) and corpus-based technologies. • It aims at accurate, full parsing of unrestricted text, with coverage and accuracy comparable to state-of-the-art parsers for English.

• Grammar • Wide-coverage computational HPSG. • About 600 construction-specific rules, rather than general rule schemata and abstract linguistic principles. • Lexicon • About 100,000 entries and 200,000 named entities. • Lexical rules for dates, temporal expressions, etc. • A large variety of unknown-word heuristics. • A morphological constructor.

• Lexical ambiguity has an important negative effect on parsing efficiency. • In some cases, a category assigned by the dictionary is obviously wrong: • I called the man up • I called the man • In the second sentence, the particle-verb category for “called” cannot be right, because no “up” occurs, yet the dictionary still proposes it. • Hand-written rules to filter such categories rely on human experts and are bound to contain mistakes.

• The training corpus used by the tagger is labelled by the parser itself (unsupervised learning). • The tagger is not forced to disambiguate all words: it only removes about half of the tags assigned by the dictionary. • The resulting system can be much faster, while parsing accuracy actually increases slightly.

• Variant of a standard trigram HMM tagger. • To discard tags, compute a probability for each tag individually: P_i(t) = α_i(t) · β_i(t), where α and β are the forward and backward probabilities: α_i(t) is the total probability of all paths through the model that end at tag t at position i; β_i(t) is the total probability of all paths starting at tag t in position i and running to the end.

• After calculating the probabilities of all potential tags, a tag t at position i is removed if there is another tag t' at the same position such that log P_i(t') - log P_i(t) > τ, where τ is a constant threshold value.
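A minimal sketch of this filtering step in Python, assuming a bigram simplification of the trigram model, a dictionary-based model interface, and a log-difference form of the threshold test (all of these are illustrative assumptions, not Alpino's implementation):

import math
from collections import defaultdict

def filter_tags(candidate_tags, trans, emit, words, tau=4.25):
    """Remove unlikely tags from the dictionary's candidate tags per position.

    candidate_tags[i] : set of tags the dictionary allows for words[i]
    trans[(t_prev, t)]: bigram transition probability (a simplification;
                        the Alpino tagger uses a trigram model)
    emit[(t, w)]      : emission probability P(w | t)
    tau               : threshold on the log-probability difference
    "<s>" is an assumed start-of-sentence symbol in the transition table.
    """
    n = len(words)
    # forward pass: alpha[i][t] = total probability of all paths through
    # the model that end at tag t at position i
    alpha = [defaultdict(float) for _ in range(n)]
    for t in candidate_tags[0]:
        alpha[0][t] = trans.get(("<s>", t), 0.0) * emit.get((t, words[0]), 0.0)
    for i in range(1, n):
        for t in candidate_tags[i]:
            alpha[i][t] = emit.get((t, words[i]), 0.0) * sum(
                alpha[i - 1][tp] * trans.get((tp, t), 0.0)
                for tp in candidate_tags[i - 1])
    # backward pass: beta[i][t] = total probability of all paths starting
    # at tag t in position i and running to the end of the sentence
    beta = [defaultdict(float) for _ in range(n)]
    for t in candidate_tags[n - 1]:
        beta[n - 1][t] = 1.0
    for i in range(n - 2, -1, -1):
        for t in candidate_tags[i]:
            beta[i][t] = sum(
                trans.get((t, tn), 0.0) * emit.get((tn, words[i + 1]), 0.0) * beta[i + 1][tn]
                for tn in candidate_tags[i + 1])
    # a tag is kept only if no competing tag at the same position is more
    # than tau better in log-probability
    kept = []
    for i in range(n):
        post = {t: alpha[i][t] * beta[i][t] for t in candidate_tags[i]}
        best = max(post.values())
        kept.append({t for t, p in post.items()
                     if p > 0 and math.log(best) - math.log(p) <= tau})
    return kept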

• The training corpus is constructed by the parser itself: the parser is run on a large set of example sentences, and the sequences of lexical category classes used in what the parser believed to be the best parse are collected. • The corpus contains errors: the tagger does not learn the “correct” lexical category sequences, but rather which sequences are favoured by the parser. • Corpus: 4 years of Dutch daily newspaper text, using only “easy” sentences (sentences of fewer than 20 words, or sentences that take less than 20 seconds of CPU time).
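A rough sketch of how such a self-labelled corpus could be collected (the parse_best function, its lexical_categories attribute and the timing check are hypothetical stand-ins for the real Alpino pipeline):

import time

def build_tagger_corpus(sentences, parse_best, max_words=20, max_secs=20):
    """Collect lexical-category sequences from the parser's own best parses.

    parse_best(words) is assumed to return the parser's best parse (or None),
    exposing one lexical category per input word via .lexical_categories.
    Only "easy" sentences are used: short ones, or ones parsed quickly.
    """
    corpus = []
    for words in sentences:
        start = time.process_time()
        parse = parse_best(words)
        elapsed = time.process_time() - start
        if parse is None:
            continue
        if len(words) < max_words or elapsed < max_secs:
            # store the tag sequence the parser itself preferred; this may
            # contain errors, but it reflects the parser's preferences
            corpus.append(list(zip(words, parse.lexical_categories)))
    return corpus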

• Applied to the first 220 sentences of the Alpino Treebank; 4 sentences were removed. • Low threshold -> small number of tags -> fast parsing. • High threshold -> higher accuracy -> decreased efficiency. • If all lexical categories for a given sentence are allowed, the parser can almost always find a single (but sometimes bad) parse. • If the parser is limited to the more plausible lexical categories, it will more often come up with a robust parse containing two or more partial parses. • A modest decrease in coverage results in a modest increase in accuracy. • Best threshold: 4.25.

• Simple rule-frequency methods known from context-free parsing cannot be used directly for an HPSG-like formalism, since these methods rely crucially on the statistical independence of context-free rule applications. • Solution: maximum entropy models.

• A typically large set of parse features is identified; they distinguish “good” parses from “bad” parses. • Parses are represented as vectors: each cell contains the frequency of a particular feature (40,000 features in Alpino). • The features encode: rule names, local trees of rule names, pairs of words and their lexical category, lexical dependencies between words, etc. Among them are a variety of more global syntactic features: features that recognize whether coordinations are parallel in structure, features that recognize whether the dependency in a WH-question or a relative clause is local or not, etc.

• In training, a weight is established for each feature, indicating whether parses containing that feature should be preferred or dispreferred. • The parse evaluation function is the sum, over all features, of the frequency of the feature in the parse times the feature's weight. • The parse with the largest sum is the best parse. • Drawback: to train the model, we need access to all parses of each corpus sentence.
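In code, this evaluation function is just a weighted sum over sparse feature counts; a minimal sketch with made-up feature names and weights:

def parse_score(feature_counts, weights):
    """Score of a parse under the maximum entropy model: the sum over
    features of (frequency of the feature in the parse) * (feature weight)."""
    return sum(count * weights.get(feat, 0.0)
               for feat, count in feature_counts.items())

def best_parse(parses, weights):
    """Pick the parse with the largest score."""
    return max(parses, key=lambda p: parse_score(p["features"], weights))

# toy example: two candidate parses as sparse feature-count vectors
weights = {"rule:np_det_n": 0.7, "dep:called-up": 1.2, "nonlocal_wh_dep": -0.4}
parses = [
    {"id": 1, "features": {"rule:np_det_n": 2, "nonlocal_wh_dep": 1}},
    {"id": 2, "features": {"rule:np_det_n": 2, "dep:called-up": 1}},
]
print(best_parse(parses, weights)["id"])   # -> 2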

• It suffices to train on the basis of representative samples of parses for each training sentence (Osborne, 2000). • Any sub-sample of the parses in the training data which yields unbiased estimates of feature expectations should result in as accurate a model as the complete set of parses.

• Problem: the Alpino Treebank contains correct dependency structures. • Dependency structures abstract away from syntactic details, whereas the training data should contain the full parse as produced by the grammar. • Possible solution: use the grammar to parse a given sentence and then select the parse with the correct dependency structure. However, the parser will not always be able to produce a parse with the correct dependency structure.

• Mapping the accuracy of a parse to the frequency of that parse in the training data. • Rather than distinguishing correct and incorrect, we determine the “quality” of each parse, its concept accuracy (CA): CA_i = 1 - D_f^i / max(D_g^i, D_p^i), where D_p^i is the number of relations produced by the parser for sentence i, D_g^i is the number of relations in the treebank parse, and D_f^i is the number of incorrect and missing relations produced by the parser. • Thus, if a parse has a CA of 85%, we add the parse to the training data marked with a weight of 0.85.
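A small sketch of the CA computation and the resulting training weight (the dependency-triple representation used here is only illustrative):

def concept_accuracy(parser_rels, gold_rels):
    """CA_i = 1 - D_f / max(D_g, D_p): D_p and D_g are the numbers of
    relations produced by the parser and present in the treebank parse,
    and D_f is the number of incorrect plus missing relations."""
    parser_rels, gold_rels = set(parser_rels), set(gold_rels)
    incorrect = len(parser_rels - gold_rels)
    missing = len(gold_rels - parser_rels)
    denom = max(len(gold_rels), len(parser_rels))
    return 1.0 - (incorrect + missing) / denom if denom else 1.0

# a parse with CA 0.85 would enter the training data with weight 0.85
gold = {("hij", "su", "ziet"), ("ziet", "obj1", "haar")}
produced = {("hij", "su", "ziet"), ("ziet", "obj1", "haar"), ("ziet", "mod", "nu")}
print(concept_accuracy(produced, gold))   # -> 0.666..., the weight of this parse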

• The left-corner parser constructs all possible parses. • The parse forest is a tree substitution grammar, which derives exactly all derivation trees of the input sentence. • Each tree in the tree substitution grammar is a left-corner spine.

For each state in the search space, maintain only the b best candidates, where b is a small integer (the beam). If the beam is decreased, we run a larger risk of missing the best parse (the result will typically still be a “good” parse); if the beam is increased, the amount of computation increases.
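A generic beam-search skeleton illustrating the idea (expand and score are placeholder functions, not Alpino's actual parser interface):

def beam_search(initial_states, expand, score, b=4):
    """Keep only the b best candidates at each step of the search.

    expand(state) -> list of successor states (empty when state is complete)
    score(state)  -> higher is better
    A smaller b is faster but risks pruning the best parse; a larger b
    costs more computation.
    """
    frontier = sorted(initial_states, key=score, reverse=True)[:b]
    finished = []
    while frontier:
        successors = []
        for state in frontier:
            nexts = expand(state)
            if not nexts:
                finished.append(state)
            else:
                successors.extend(nexts)
        # prune: keep only the b highest-scoring candidates
        frontier = sorted(successors, key=score, reverse=True)[:b]
    return max(finished, key=score) if finished else None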

• Alpino: the development set on which the system was optimized. • CLEF: Dutch questions from the CLEF Question Answering competitions (2003, 2004 and 2005). • Trouw: the first 1400 sentences of the Trouw 2001 newspaper, from the Twente News Corpus.

• [Mal04] Robert Malouf and Gertjan van Noord. Wide coverage parsing with stochastic attribute value grammars. In Proceedings of the IJCNLP-04 Workshop: Beyond Shallow Analyses - Formalisms and Statistical Modeling for Deep Analyses, Hainan Island, China, 2004. • [van06] Gertjan van Noord. At Last Parsing Is Now Operational. In Actes de la 13e conférence sur le traitement automatique des langues naturelles (TALN 2006), pages 20–42, Leuven, Belgium, 2006.