Seven Lectures on Statistical Parsing Christopher Manning LSA Linguistic Institute 2007 LSA 354 Lecture 6

Treebanks and linguistic theory

Penn Chinese Treebank: Linguistic Characteristics [Xue, Xia, Chiou, & Palmer 2005]
- Source: Xinhua news service articles
- Segmented text: it's harder when you compose in errors from word segmentation as well...
- Nearly identical sentence length to the WSJ Treebank
- Annotated in a much more GB-like style (CP and IP)
- (Fairly) consistent differentiation of modifiers from complements

Headedness
- English: basically head-initial. PP modifiers follow NP; arguments and PP modifiers follow V.
- Chinese: mostly head-final, but V (and P) precede objects. Typologically unusual!

Syntactic sources of ambiguity
- English: PP attachment (well understood); coordination scoping (less well understood)
- Chinese: modifier attachment less of a problem, as verbal modifiers and direct objects aren't adjacent, and NP modifiers are overtly marked

Error tabulation [Levy and Manning 2003]

Tagging errors
- N/V tagging is a major source of parse error
- V-as-N errors outnumber N-as-V errors by 3.2:1; the corpus-wide N:V ratio is about 2.5:1
- N/V errors can cascade, as N and V project different phrase structures (NP is head-final, VP is not)
- Possible disambiguating factors:
  - derivational or inflectional morphology
  - function words in close proximity (cf. English the, to)
  - knowledge of the prior distribution over tag frequency
  - non-local context

Tagging errors
- Chinese has little to no morphological inflection
- As a result, the part-of-speech ambiguity problem tends to be greater than in English
- Function words are also much less frequent in Chinese
- Suggests that a large burden may be put on the prior distribution over the V/N tag
- Example: increase / increases / increased / increasing all correspond to the single form 增加

Tagging error experiment [Levy and Manning 2003]
- N/V error experiment: merge all N and V tags in the training data
- Results in a 5.1% F1 drop for the vanilla PCFG; a 1.7% drop for the enhanced model
- In English, with an equivalent-sized training set, the tag merge results in a 0.21% drop in recall and a 0.06% increase in precision for the vanilla PCFG
- Indicates a considerable burden on POS priors in Chinese
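A minimal sketch of the tag-merge manipulation described above, assuming Penn-Treebank-style bracketed trees loaded with NLTK; the tag prefixes and the merged label "NV" are illustrative assumptions, not necessarily those used in the paper:

```python
# Sketch: collapse all noun and verb preterminal tags to a single merged tag
# before training, to measure how much the parser relies on the N/V distinction.
# Tag prefixes and the merged label "NV" are illustrative assumptions.
from nltk.tree import Tree

def merge_nv_tags(tree, noun_prefixes=("NN",), verb_prefixes=("VV", "VB")):
    for subtree in tree.subtrees():
        if subtree.height() == 2:  # preterminal (POS tag over a word)
            tag = subtree.label()
            if tag.startswith(noun_prefixes) or tag.startswith(verb_prefixes):
                subtree.set_label("NV")
    return tree

# Toy usage on a bracketed tree: both preterminals become NV.
t = Tree.fromstring("(IP (NP (NN 增加)) (VP (VV 增加)))")
print(merge_nv_tags(t))
```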

Chinese lexicalized parser learning curve [Levy and Manning 2003]
[Learning curve plot over fractions of the Chinese Treebank 3.0 release; 100% ≈ 300,000 words.]

A hotly debated case: German
- Linguistic characteristics, relative to English:
  - ample derivational and inflectional morphology
  - freer word order
  - verb position differs in matrix/embedded clauses
- Main ambiguities similar to English
- Most used corpus: NEGRA
  - ~400,000 words of newswire text
  - flatter phrase structure annotations (few PPs!)
  - explicitly marked phrasal discontinuities
- Newer treebank: TüBa-D/Z
  - ~470,000 words of newswire text (27,000 sentences)
  - [not a replacement; different group; different style]

German results
- Dubey and Keller [ACL 2003] present an unlexicalized PCFG outperforming Collins on NEGRA, and then get small wins from a somewhat unusual sister-head model, but...
- [Table: LPrec / LRec / F1 for D&K PCFG baseline, D&K Collins, D&K sister-head all; and for Stanford PCFG baseline, Stanford lexicalized.]
- See also [Arun & Keller ACL 2005, Kübler et al. EMNLP 2006]

Prominent ambiguities: PP attachment

Prominent ambiguities: sentential complement vs. relative clause

Dependency parsing

Dependency Grammar/Parsing
- A sentence is parsed by relating each word to the other words in the sentence which depend on it
- The idea of dependency structure goes back a long way: to Pāṇini's grammar (c. 5th century BCE)
- Constituency is a new-fangled, 20th century invention
- Modern work often linked to the work of L. Tesnière (1959)
- Dominant approach in the "East" (Eastern bloc/East Asia)
- Among the earliest kinds of parsers in NLP, even in the US: David Hays, one of the founders of computational linguistics, built an early (first?) dependency parser (Hays 1962)

Dependency structure
- Words are linked from head (regent) to dependent
- Warning! Some people draw the arrows one way, some the other way (Tesnière has them point from head to dependent...)
- Usually add a fake ROOT (written $) so every word is a dependent
- Example: Shaw Publishing acquired 30% of American City in March
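One common way to store such a structure is as a single head index per word, with index 0 standing for the fake ROOT. A minimal sketch; the particular head indices chosen for the slide's example sentence are an illustrative guess, not taken from the slide:

```python
# Sketch: a dependency parse as a head-index array. Index 0 is the fake ROOT,
# so every word (including the main verb) has exactly one head.
# The analysis below is an illustrative guess for the slide's example sentence.
words = ["ROOT", "Shaw", "Publishing", "acquired", "30", "%",
         "of", "American", "City", "in", "March"]
heads = [None,   2,      3,            0,          5,    3,
         5,      8,      6,            3,          9]

for i in range(1, len(words)):
    print(f"{words[heads[i]]} -> {words[i]}")
```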

Relation between a CFG parse and a dependency parse
- A dependency grammar has a notion of a head; officially, CFGs don't
- But modern linguistic theory and all modern statistical parsers (Charniak, Collins, Stanford, ...) do, via hand-written phrasal "head rules":
  - the head of a Noun Phrase is a noun/number/adj/...
  - the head of a Verb Phrase is a verb/modal/...
- The head rules can be used to extract a dependency parse from a CFG parse (follow the heads)
- A phrase structure tree can be obtained from a dependency tree, but dependents are flat (no VP!)

Propagating head words
- A small set of rules propagates heads up the tree
- [Tree diagram for "John Smith, the president of IBM, announced his resignation yesterday": each phrase is annotated with its head word, e.g. S(announced), NP(Smith), NP(president), PP(of), VP(announced), NP(resignation).]
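A minimal sketch of head propagation and dependency extraction over a bracketed tree, assuming NLTK's Tree class; the tiny head-rule table is a toy stand-in for the full Collins/Stanford head rules:

```python
# Sketch: propagate lexical heads up a phrase-structure tree with a small
# head-rule table, then read off (head, dependent) word pairs.
# The rule table is a toy stand-in for real head rules (Collins, Stanford, ...).
from nltk.tree import Tree

HEAD_RULES = {   # phrase label -> (search direction, preferred head-child labels)
    "S":  ("left",  ["VP"]),
    "VP": ("left",  ["VBD", "VBZ", "VB"]),
    "NP": ("right", ["NN", "NNS", "NNP", "NP"]),
    "PP": ("left",  ["IN"]),
}

def find_head(tree, dependencies):
    """Return the lexical head word of `tree`, collecting (head, dep) pairs."""
    if tree.height() == 2:                   # preterminal: its word is the head
        return tree[0]
    child_heads = [find_head(child, dependencies) for child in tree]
    labels = [child.label() for child in tree]
    direction, candidates = HEAD_RULES.get(tree.label(), ("left", []))
    head_idx = 0 if direction == "left" else len(tree) - 1   # fallback
    for cand in candidates:
        positions = [i for i, lab in enumerate(labels) if lab == cand]
        if positions:
            head_idx = positions[0] if direction == "left" else positions[-1]
            break
    head = child_heads[head_idx]
    for i, dep in enumerate(child_heads):    # non-head children depend on the head
        if i != head_idx:
            dependencies.append((head, dep))
    return head

t = Tree.fromstring("(S (NP (NNP John) (NNP Smith)) "
                    "(VP (VBD announced) (NP (PRP$ his) (NN resignation))))")
deps = []
print(find_head(t, deps), deps)
# announced [('Smith', 'John'), ('resignation', 'his'),
#            ('announced', 'resignation'), ('announced', 'Smith')]
```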

Extracted structure
- NB: not all dependencies are shown here
- Dependencies are inherently untyped, though some work, like Collins (1996), types them using the phrasal categories
- [Diagram: dependencies extracted from the tree above, e.g. [John Smith], [the president] of [IBM], announced [his resignation] [yesterday], via the S → NP VP and VP → VBD NP configurations.]

Dependency Conditioning Preferences
Sources of information:
- bilexical dependencies
- distance of dependencies
- valency of heads (number of dependents)
A word's dependents (adjuncts, arguments) tend to fall near it in the string.
(The next 6 slides are based on slides by Jason Eisner and Noah Smith.)

Probabilistic dependency grammar: generative model
1. Start with the left wall $
2. Generate the root w_0
3. Generate left children w_{-1}, w_{-2}, ..., w_{-ℓ} from the FSA λ_{w_0}
4. Generate right children w_1, w_2, ..., w_r from the FSA ρ_{w_0}
5. Recurse on each w_i for i in {-ℓ, ..., -1, 1, ..., r}, sampling α_i (steps 2-4)
6. Return α_{-ℓ} ... α_{-1} w_0 α_1 ... α_r
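A toy sketch of this generative story: the per-head FSAs λ_w and ρ_w are replaced by a simple geometric stopping rule and a uniform child distribution, and the depth cap exists only to keep the toy finite; all of these choices are illustrative assumptions, not the model's actual parameterization.

```python
# Sketch of the generative story: generate a root, then left and right child
# sequences for each word, and recurse. The real model uses weighted FSAs
# (lambda_w, rho_w) per head word; here they are replaced by a toy geometric
# stopping rule and a uniform child distribution -- placeholder assumptions.
import random

VOCAB = ["takes", "it", "two", "to", "tango"]
P_STOP = 0.6          # probability of generating no further child on a side

def gen_children(head, side, depth):
    children = []
    while depth < 3 and random.random() > P_STOP:   # depth cap keeps the toy finite
        child = random.choice(VOCAB)                 # a real head automaton would
        children.append(gen_subtree(child, depth + 1))  # condition on its state here
    return children if side == "right" else children[::-1]

def gen_subtree(word, depth=0):
    left = gen_children(word, "left", depth)
    right = gen_children(word, "right", depth)
    return (left, word, right)

def flatten(node):
    left, word, right = node
    return [w for c in left for w in flatten(c)] + [word] + \
           [w for c in right for w in flatten(c)]

root = gen_subtree(random.choice(VOCAB))   # step 2: generate the root w_0
print(flatten(root))                       # step 6: the generated sentence
```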

Naïve Recognition/Parsing
- [Chart diagram over the example sentence "It takes two to tango", with a parent p, child c, and span indices i, j, k.]
- O(n^5) combinations
- O(n^5 N^3) if there are N nonterminals

Dependency Grammar Cubic Recognition/Parsing (Eisner & Satta, 1999)
- Triangles: span over words, where the tall side of the triangle is the head, the other side is a dependent, and no non-head word is expecting more dependents
- Trapezoids: span over words, where the larger side is the head, the smaller side is a dependent, and the smaller side is still looking for dependents on its side of the trapezoid

Dependency Grammar Cubic Recognition/Parsing (Eisner & Satta, 1999)
- [Chart diagram over "It takes two to tango", building up to the goal item.]
- One trapezoid per dependency
- A triangle is a head with some left (or right) subtrees

Cubic Recognition/Parsing (Eisner & Satta, 1999)
- [Deduction-rule diagrams: combining adjacent triangles and trapezoids over span indices i, j, k.]
- O(n^3) combinations for the span-combining steps; O(n) combinations for the goal item
- Gives O(n^3) dependency grammar parsing
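The triangles-and-trapezoids dynamic program is often implemented as the max-score Eisner algorithm. A minimal sketch over toy random arc scores; the scoring model and the sentence length are placeholders, and this basic version does not enforce a single ROOT attachment.

```python
# Sketch of Eisner/Satta-style O(n^3) projective dependency parsing (max-score
# variant). "Complete" items are the triangles and "incomplete" items the
# trapezoids described above. Arc scores here are arbitrary toy numbers.
import numpy as np

def eisner(scores):
    """scores[h, d] = score of arc h -> d (position 0 is the artificial ROOT).
    Returns the head index of each word in the best projective tree."""
    n = scores.shape[0]
    comp = np.full((n, n, 2), -np.inf)   # triangles: dir 1 = head at left end i
    inc = np.full((n, n, 2), -np.inf)    # trapezoids
    comp_bp = np.zeros((n, n, 2), dtype=int)
    inc_bp = np.zeros((n, n, 2), dtype=int)
    for i in range(n):
        comp[i, i, 0] = comp[i, i, 1] = 0.0
    for length in range(1, n):
        for i in range(n - length):
            j = i + length
            # Trapezoids: join two triangles and add exactly one arc.
            for r in range(i, j):
                base = comp[i, r, 1] + comp[r + 1, j, 0]
                if base + scores[j, i] > inc[i, j, 0]:     # arc j -> i
                    inc[i, j, 0], inc_bp[i, j, 0] = base + scores[j, i], r
                if base + scores[i, j] > inc[i, j, 1]:     # arc i -> j
                    inc[i, j, 1], inc_bp[i, j, 1] = base + scores[i, j], r
            # Triangles: one trapezoid plus one triangle, no new arc.
            for r in range(i, j):
                if comp[i, r, 0] + inc[r, j, 0] > comp[i, j, 0]:
                    comp[i, j, 0], comp_bp[i, j, 0] = comp[i, r, 0] + inc[r, j, 0], r
            for r in range(i + 1, j + 1):
                if inc[i, r, 1] + comp[r, j, 1] > comp[i, j, 1]:
                    comp[i, j, 1], comp_bp[i, j, 1] = inc[i, r, 1] + comp[r, j, 1], r

    heads = [0] * n
    def backtrack(i, j, d, complete):
        if i == j:
            return
        if complete:
            r = comp_bp[i, j, d]
            if d == 0:
                backtrack(i, r, 0, True); backtrack(r, j, 0, False)
            else:
                backtrack(i, r, 1, False); backtrack(r, j, 1, True)
        else:
            r = inc_bp[i, j, d]
            heads[i if d == 0 else j] = j if d == 0 else i
            backtrack(i, r, 1, True); backtrack(r + 1, j, 0, True)
    backtrack(0, n - 1, 1, True)
    return heads[1:]

# Toy usage: random arc scores for ROOT + 5 words ("It takes two to tango").
rng = np.random.default_rng(0)
print(eisner(rng.normal(size=(6, 6))))
```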

Evaluation of Dependency Parsing: simply use (labeled) dependency accuracy
- [Example: GOLD vs. PARSED dependency analyses of "We eat the cheese sandwich", agreeing on 2 of the 5 dependencies.]
- Accuracy = number of correct dependencies / total number of dependencies = 2/5 = 40%
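A minimal sketch of the metric, computing labeled and unlabeled dependency accuracy from per-word (head index, label) pairs; the toy gold/parsed analyses below are illustrative, not copied from the slide's figure.

```python
# Sketch: labeled and unlabeled dependency accuracy for one sentence.
# Each analysis is a list of (head_index, label), one entry per word.
def dep_accuracy(gold, parsed):
    assert len(gold) == len(parsed)
    las = sum(g == p for g, p in zip(gold, parsed)) / len(gold)       # head + label
    uas = sum(g[0] == p[0] for g, p in zip(gold, parsed)) / len(gold) # head only
    return las, uas

# Illustrative gold/parsed analyses that agree on 2 of 5 dependencies.
gold   = [(2, "SUBJ"), (0, "ROOT"), (5, "DET"), (5, "MOD"), (2, "OBJ")]
parsed = [(2, "SUBJ"), (0, "ROOT"), (4, "DET"), (2, "OBJ"), (3, "PRED")]
print(dep_accuracy(gold, parsed))   # labeled accuracy 2/5 = 0.4
```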

McDonald et al. (2005 ACL): Online Large-Margin Training of Dependency Parsers
- Builds a discriminative dependency parser
- Can condition on rich features in that context
- Best-known recent dependency parser
- Lots of recent dependency parsing activity connected with the CoNLL 2006/2007 shared tasks
- Doesn't/can't report constituent LP/LR, but evaluating dependencies correct:
  - accuracy is similar to, but a fraction below, dependencies extracted from Collins: 90.9% vs. 91.4% ... combining them gives 92.2% [all lengths]
  - Stanford parser on lengths up to 40:
    - pure generative dependency model: 85.0%
    - lexicalized factored parser: 91.0%

McDonald et al. (2005 ACL): Online Large-Margin Training of Dependency Parsers
- The score of a parse is the sum of the scores of its dependencies
- Each dependency score is a linear function of features times weights
- Feature weights are learned by MIRA, an online large-margin algorithm (but you could think of it as using a perceptron or maxent classifier)
- Features cover:
  - head and dependent word and POS separately
  - head and dependent word and POS bigram features
  - words between head and dependent
  - length and direction of the dependency
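A minimal sketch of the edge-factored idea: the parse score is the sum over arcs of a dot product between a weight vector and arc features, with the large-margin MIRA update replaced here by a plain perceptron step; the feature templates are illustrative stand-ins, not McDonald et al.'s exact feature set.

```python
# Sketch: edge-factored parse scoring with a linear model, plus one
# perceptron-style weight update (MIRA in the paper; plain perceptron here).
from collections import defaultdict

def arc_features(words, tags, h, d):
    direction = "R" if d > h else "L"
    return [
        f"hw={words[h]}|dw={words[d]}",                # head/dependent word bigram
        f"ht={tags[h]}|dt={tags[d]}|dir={direction}",  # POS bigram + direction
        f"dist={min(abs(h - d), 5)}",                  # bucketed dependency length
    ]

def parse_score(words, tags, heads, weights):
    return sum(weights[f]
               for d, h in enumerate(heads) if h is not None
               for f in arc_features(words, tags, h, d))

def perceptron_update(words, tags, gold_heads, pred_heads, weights, lr=1.0):
    for d, (g, p) in enumerate(zip(gold_heads, pred_heads)):
        if g != p and g is not None:
            for f in arc_features(words, tags, g, d):   # reward gold arc features
                weights[f] += lr
            for f in arc_features(words, tags, p, d):   # penalize predicted arc
                weights[f] -= lr

words = ["ROOT", "We", "eat", "sandwiches"]
tags  = ["ROOT", "PRP", "VBP", "NNS"]
gold  = [None, 2, 0, 2]        # head index per position; ROOT has no head
pred  = [None, 2, 0, 1]
weights = defaultdict(float)
perceptron_update(words, tags, gold, pred, weights)
print(parse_score(words, tags, gold, weights))
```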

Extracting grammatical relations from statistical constituency parsers [de Marneffe et al. LREC 2006]
- Exploit the high-quality syntactic analysis done by statistical constituency parsers to get the grammatical relations [typed dependencies]
- Dependencies are generated by pattern-matching rules
- [Example: constituency parse of "Bills on ports and immigration were submitted by Senator Brownback" and the extracted typed dependencies, e.g. nsubjpass(submitted, Bills), auxpass(submitted, were), agent(submitted, Brownback), nn(Brownback, Senator), prep_on(Bills, ports), cc_and(ports, immigration).]
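A toy sketch of the pattern-matching idea: one hand-written rule over an NLTK constituency tree proposing an nsubjpass-style dependency for a passive clause. This is far simpler than the actual rule set behind Stanford typed dependencies; the rule, auxiliary list, and example tree are illustrative assumptions.

```python
# Sketch: extract one typed dependency by pattern-matching on a constituency
# tree: an NP directly under S whose VP sister contains a passive auxiliary
# yields nsubjpass(main verb, NP head noun). A toy stand-in for the real rules.
from nltk.tree import Tree

def first_child_word(tree, labels):
    """Word of the first direct child whose label is in `labels`."""
    for child in tree:
        if isinstance(child, Tree) and child.label() in labels:
            return child[0]
    return None

def find_tag_word(tree, tag):
    """Word under the first subtree with preterminal label `tag`."""
    for sub in tree.subtrees(lambda t: t.label() == tag):
        return sub[0]
    return None

def nsubjpass(tree):
    deps = []
    for s in tree.subtrees(lambda t: t.label() == "S"):
        np = next((c for c in s if isinstance(c, Tree) and c.label() == "NP"), None)
        vp = next((c for c in s if isinstance(c, Tree) and c.label() == "VP"), None)
        if np is None or vp is None:
            continue
        passive_aux = any(w.lower() in {"was", "were", "is", "are", "be", "been"}
                          for w in vp.leaves())
        verb = find_tag_word(vp, "VBN")
        subj = first_child_word(np, {"NN", "NNS", "NNP"})
        if passive_aux and verb and subj:
            deps.append(("nsubjpass", verb, subj))
    return deps

t = Tree.fromstring(
    "(S (NP (NNS Bills) (PP (IN on) (NP (NNS ports)))) "
    "(VP (VBD were) (VP (VBN submitted) (PP (IN by) (NP (NNP Brownback))))))")
print(nsubjpass(t))   # [('nsubjpass', 'submitted', 'Bills')]
```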