PARSING 3
David Kauchak
CS159 – Spring 2011
some slides adapted from Dan Klein

Admin
- Assignment 3 out
- Due Friday at 6pm
- How are things going?
- Where we've been
- Where we're going

Parsing evaluation
- You've constructed a parser
- You want to know how good it is
- Ideas?

Parsing evaluation
- Learn a model using the training set
- Parse the test set without looking at the "correct" trees
- Compare our generated parse tree to the "correct" tree
[Diagram: the treebank is split into Train / Dev / Test sets]

Comparing trees
[Figure: two parse trees for "I eat sushi with tuna": the correct tree T and the computed tree P, which differ in where the PP "with tuna" attaches]
Ideas?

Comparing trees
- Idea 1: see if the trees match exactly
  - Problems? Will have a low number of matches (people often disagree); doesn't take into account getting it almost right
- Idea 2: compare the constituents

Comparing trees
[Figure: the same correct tree T and computed tree P for "I eat sushi with tuna"]
How many constituents match? How can we turn this into a score?

Evaluation measures
- Precision = (# of correct constituents) / (# of constituents in the computed tree)
- Recall = (# of correct constituents) / (# of constituents in the correct tree)
- F1 = (2 * Precision * Recall) / (Precision + Recall)

Comparing trees
[Figure: correct tree T and computed tree P for "I eat sushi with tuna"]
- # constituents in the correct tree T: 11
- # constituents in the computed tree P: 10
- # correct constituents: 9
- Precision: 9/10   Recall: 9/11   F1: ~0.86
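To make the metric concrete, here is a minimal sketch (not from the slides) that scores a computed tree against a gold tree, assuming each tree has already been reduced to a set of labeled (label, start, end) spans:

    # Minimal sketch: constituent-level precision/recall/F1, assuming each
    # tree is represented as a set of (label, start, end) spans.

    def parseval(correct_spans, computed_spans):
        """Compare two sets of labeled spans; return (precision, recall, f1)."""
        matched = len(correct_spans & computed_spans)  # constituents in both trees
        precision = matched / len(computed_spans)
        recall = matched / len(correct_spans)
        f1 = 2 * precision * recall / (precision + recall) if matched else 0.0
        return precision, recall, f1

    # The worked example above: 11 gold constituents, 10 computed, 9 shared
    # gives P = 9/10, R = 9/11, F1 ~ 0.857.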

Parsing evaluation
- Corpus: Penn Treebank, WSJ
- Parsing has been fairly standardized to allow for easy comparison between systems:
  - Training: sections 02-21
  - Development: section 22 (here, the first 20 files)
  - Test: section 23

Treebank PCFGs
- Use PCFGs for broad coverage parsing
- Can take a grammar right off the trees (doesn't work well):
  ROOT → S
  S → NP VP .
  NP → PRP
  VP → VBD ADJP
  ...

  Model      F1
  Baseline   72.0
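A sketch of reading a PCFG off treebank trees with maximum-likelihood estimates; the nested-tuple tree format and function names are illustrative, not from the slides:

    from collections import Counter

    # Sketch: take a PCFG "right off the trees" via MLE. Trees are assumed
    # to be nested tuples like ("S", ("NP", ("PRP", "I")), ("VP", ...)).

    def count_rules(tree, rule_counts, lhs_counts):
        label, children = tree[0], tree[1:]
        if len(children) == 1 and isinstance(children[0], str):
            rhs = (children[0],)                       # preterminal -> word
        else:
            rhs = tuple(child[0] for child in children)
            for child in children:
                count_rules(child, rule_counts, lhs_counts)
        rule_counts[(label, rhs)] += 1
        lhs_counts[label] += 1

    def estimate_pcfg(treebank):
        """P(lhs -> rhs) = Count(lhs -> rhs) / Count(lhs)."""
        rule_counts, lhs_counts = Counter(), Counter()
        for tree in treebank:
            count_rules(tree, rule_counts, lhs_counts)
        return {rule: c / lhs_counts[rule[0]] for rule, c in rule_counts.items()}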

Generic PCFG Limitations
- PCFGs do not use any information about where the current constituent is in the tree
- PCFGs do not rely on specific words or concepts; only general structural disambiguation is possible (e.g. preferring to attach PPs to Nominals)
- MLE estimates are not always the best

Conditional Independence?
- Not every NP expansion can fill every NP slot
- A grammar with symbols like "NP" won't be context-free
- Statistically, conditional independence is too strong

Non-Independence
- Independence assumptions are often too strong.
- Example: the expansion of an NP is highly dependent on the parent of the NP (i.e., subjects vs. objects).
- Also: the subject and object expansions are correlated.
[Chart: distribution of NP expansions for all NPs vs. NPs under S vs. NPs under VP]
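One could verify this non-independence on a treebank with a short script like the following sketch, which tallies NP expansions broken down by the NP's parent category (same hypothetical tuple-tree format as above):

    from collections import Counter, defaultdict

    # Sketch of the measurement behind the chart: how does NP expand,
    # conditioned on the category of the NP's parent?

    def np_expansions_by_parent(tree, parent, stats):
        label, children = tree[0], tree[1:]
        if isinstance(children[0], str):
            return  # preterminal; no phrasal expansion to record
        if label == "NP":
            rhs = tuple(child[0] for child in children)
            stats[parent][rhs] += 1
        for child in children:
            np_expansions_by_parent(child, label, stats)

    stats = defaultdict(Counter)
    # for tree in treebank: np_expansions_by_parent(tree, "ROOT", stats)
    # Comparing stats["S"] vs. stats["VP"] exposes the subject/object asymmetry.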

Grammar Refinement
- A PCFG would treat these two NPs the same... but they're not!
- We can't exchange them: "the noise heard she"
- Idea: expand/refine our grammar
- Challenges:
  - Must refine in ways that facilitate disambiguation
  - Too much refinement -> sparsity problems
  - Too little -> can't discriminate (plain PCFG)

Grammar Refinement
Ideas?

Grammar Refinement
- Structure annotation [Johnson '98, Klein & Manning '03]: differentiate constituents based on their local context
- Lexicalization [Collins '99, Charniak '00]: differentiate constituents based on the spanned words
- Constituent splitting [Matsuzaki et al. '05, Petrov et al. '06]: cluster/group words into sub-constituents

Less independence
[Figure: parse tree of "I eat sushi with tuna"]
S → NP VP
NP → PRP
PRP → I
VP → V NP
V → eat
NP → N PP
N → sushi
PP → IN N
IN → with
N → tuna
We're making a strong independence assumption here!

Markovization
- Except for the root node, every node in a parse tree has:
  - A vertical history/context
  - A horizontal history/context
[Figure: a small tree fragment (S over NP and VP, VP over VBD and NP) illustrating the two contexts]
- Traditional PCFGs use the full horizontal context and a vertical context of 1

Vertical Markovization
- Vertical Markov order: rewrites depend on the past k ancestor nodes.
- Order 1 is most common: aka parent annotation
[Figure: example trees at order 1 and order 2]

Allows us to make finer-grained distinctions
[Figure: tree with parent-annotated labels such as NP^S and NP^VP]
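A sketch of parent annotation (adding the parent's label to each phrasal node, as in the figure above); the tuple-tree format is the same illustrative one used earlier:

    # Sketch: parent annotation. Each non-root phrasal node's label is
    # suffixed with its parent's (original) label, e.g. NP under S
    # becomes NP^S, NP under VP becomes NP^VP.

    def parent_annotate(tree, parent=None):
        label, children = tree[0], tree[1:]
        if isinstance(children[0], str):
            return tree  # leave preterminals (POS tags) unannotated here
        new_label = f"{label}^{parent}" if parent else label
        return (new_label,) + tuple(parent_annotate(c, label) for c in children)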

Vertical Markovization
[Charts: F1 performance and number of non-terminals as the vertical Markov order increases]

Horizontal Markovization
- Horizontal Markov order: rewrites depend on the past k sibling nodes
- Order 1 is most common: condition on a single sibling
[Figure: binarized trees at order 1 and order ∞]
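A sketch of horizontal markovization as grammar binarization: the intermediate symbols remember only the most recent sibling (order 1). The @-symbol naming scheme is an assumption, loosely following common practice:

    # Sketch: binarize a long rule, keeping only the previous sibling in
    # the intermediate symbol names (horizontal order 1). Intermediate
    # symbols from different rules can then merge, which is the point.

    def binarize_h1(lhs, rhs):
        """Return binary (or short) rules replacing lhs -> rhs."""
        if len(rhs) <= 2:
            return [(lhs, tuple(rhs))]
        rules = []
        prev = f"@{lhs}->_{rhs[0]}"
        rules.append((lhs, (rhs[0], prev)))
        for i in range(1, len(rhs) - 2):
            nxt = f"@{lhs}->_{rhs[i]}"  # name records only the last sibling
            rules.append((prev, (rhs[i], nxt)))
            prev = nxt
        rules.append((prev, (rhs[-2], rhs[-1])))
        return rules

    # binarize_h1("NP", ["DT", "JJ", "JJ", "NN"]) yields:
    # NP -> DT @NP->_DT ; @NP->_DT -> JJ @NP->_JJ ; @NP->_JJ -> JJ NN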

Horizontal Markovization
[Charts: F1 performance and number of non-terminals as the horizontal Markov order increases]

Problems with PCFGs
- What's different between the basic PCFG scores here?

Example of Importance of Lexicalization
- A general preference for attaching PPs to NPs rather than VPs can be learned by a vanilla PCFG.
- But the desired preference can depend on specific words.
Grammar:
  S → NP VP
  S → VP
  NP → Det A N
  NP → NP PP
  NP → PropN
  A → ε
  A → Adj A
  PP → Prep NP
  VP → V NP
  VP → VP PP
[Figure: given "John put the dog in the pen.", the English PCFG parser produces the correct tree, with the PP "in the pen" attached to the VP]

Example of Importance of Lexicalization
- A general preference for attaching PPs to NPs rather than VPs can be learned by a vanilla PCFG.
- But the desired preference can depend on specific words.
[Figure: with the same grammar, the parser instead attaches "in the pen" to the NP "the dog", producing an incorrect tree (marked X) for "John put the dog in the pen."]

Lexicalized Trees
How could we lexicalize the grammar/tree?

Lexicalized Trees
- Add "headwords" to each phrasal node
- Syntactic vs. semantic heads
- Headship is not annotated in (most) treebanks
- Usually use head rules, e.g.:
  NP:
    1. Take the leftmost NP
    2. Take the rightmost N*
    3. Take the rightmost JJ
    4. Take the right child
  VP:
    1. Take the leftmost VB*
    2. Take the leftmost VP
    3. Take the left child
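A sketch of head-finding with rules like those above; the HEAD_RULES table is a hypothetical, abridged stand-in for a real head-percolation table such as Collins':

    # Sketch: pick the head child of a rule using ordered direction/target
    # rules. Targets are matched as label prefixes (N matches N, NN, NNS...).

    HEAD_RULES = {
        "NP": [("leftmost", "NP"), ("rightmost", "N"), ("rightmost", "JJ")],
        "VP": [("leftmost", "VB"), ("leftmost", "VP")],
    }

    def find_head(label, child_labels):
        """Return the index of the head child for label -> child_labels."""
        for direction, target in HEAD_RULES.get(label, []):
            indices = range(len(child_labels))
            if direction == "rightmost":
                indices = reversed(indices)
            for i in indices:
                if child_labels[i].startswith(target):
                    return i
        # fallback from the table: right child for NP, left child otherwise
        return len(child_labels) - 1 if label == "NP" else 0

    # find_head("VP", ["VBD", "NP", "PP"]) -> 0 (the VBD)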

Lexicalized PCFGs?
- Problem: we now have to estimate probabilities for rules like
  VP(put) → VBD(put) NP(dog) PP(in)
- How would we estimate the probability of this rule?
  P(VP(put) → VBD(put) NP(dog) PP(in)) = Count(VP(put) → VBD(put) NP(dog) PP(in)) / Count(VP(put))
- We're never going to get these automatically off of a treebank
- Ideas?

One approach
- Combine this with some of the markovization techniques we saw
- Collins' (1999) parser
- Models productions based on context to the left and the right of the head daughter:
  LHS → L_n L_{n-1} ... L_1 H R_1 ... R_{m-1} R_m
- First generate the head (H), then repeatedly generate left (L_i) and right (R_i) context symbols until the symbol STOP is generated.

Sample Production Generation
VP(put) → VBD(put) NP(dog) PP(in)
Generated as: STOP (L_1), VBD(put) (H), NP(dog) (R_1), PP(in) (R_2), STOP (R_3)
P_L(STOP | VP(put)) * P_H(VBD | VP(put)) * P_R(NP(dog) | VP(put)) * P_R(PP(in) | VP(put)) * P_R(STOP | VP(put))
Note: the Penn Treebank tends to have fairly flat parse trees, which produce long productions.

Estimating Production Generation Parameters
- Estimate the P_H, P_L, and P_R parameters from treebank data:
  P_R(PP(in) | VP(put)) = Count(PP(in) right of the head in a VP(put) production) / Count(any symbol right of the head in a VP(put) production)
  P_R(NP(dog) | VP(put)) = Count(NP(dog) right of the head in a VP(put) production) / Count(any symbol right of the head in a VP(put) production)
- Smooth estimates by combining with simpler models conditioned on just the POS tag, or on no lexical info at all:
  smP_R(PP(in) | VP(put)) = λ1 P_R(PP(in) | VP(put)) + (1 − λ1) (λ2 P_R(PP(in) | VP(VBD)) + (1 − λ2) P_R(PP(in) | VP))
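A sketch of the three-level interpolation above; the counts data structure and the lambda values are illustrative assumptions, not Collins' actual implementation:

    # Sketch: smoothed right-context probability via linear interpolation.
    # `counts` is assumed to map a context key to a dict of
    # {right-of-head symbol: count} for that context.

    def smoothed_p_right(sym, parent, head_word, head_tag, counts,
                         lam1=0.8, lam2=0.7):  # illustrative lambdas
        def p(context):
            c = counts.get(context, {})
            total = sum(c.values())
            return c.get(sym, 0) / total if total else 0.0

        p_word = p((parent, head_word))  # e.g. P_R(PP(in) | VP(put))
        p_tag = p((parent, head_tag))    # e.g. P_R(PP(in) | VP(VBD))
        p_unlex = p((parent,))           # e.g. P_R(PP(in) | VP)
        return lam1 * p_word + (1 - lam1) * (lam2 * p_tag + (1 - lam2) * p_unlex)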

Problems with lexicalization
- We've solved the estimation problem
- There's also the issue of performance
- Lexicalization causes the number of grammar rules to explode!
- Our parsing algorithms take too long to finish
- Ideas?

Pruning during search
- We can no longer keep all possible parses around
- We can no longer guarantee that we actually return the most likely parse
- Beam search [Collins 99]
  - In each cell, only keep the K most likely hypotheses
  - Disregard constituents over certain spans (e.g. punctuation)
  - F1 of 88.6!
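A sketch of the per-cell beam, assuming each chart cell maps non-terminals to scores (e.g. log probabilities); the function name and K are illustrative:

    import heapq

    # Sketch: keep only the K highest-scoring constituents in one chart
    # cell, as in beam-search pruning.

    def prune_cell(cell, k=10):
        """cell: dict of non-terminal -> score; return the top-k entries."""
        best = heapq.nlargest(k, cell.items(), key=lambda kv: kv[1])
        return dict(best)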

Pruning with a PCFG
- The Charniak parser prunes using a two-pass approach [Charniak 97+]
- First, parse with the base grammar
  - For each X:[i,j], calculate P(X | i, j, s)
  - This isn't trivial, and there are clever speed-ups
- Second, do the full O(n^5) CKY
  - Skip any X:[i,j] which had a low posterior (below some small threshold)
  - Avoids almost all work in the second phase!
- F1 of 89.7!
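A sketch of the posterior filter, assuming inside and outside scores for each labeled span are available from the first-pass parse; the threshold value and data layout are illustrative assumptions:

    # Sketch: P(X spans i..j | s) = inside(X,i,j) * outside(X,i,j) / P(s).
    # `inside` and `outside` are assumed to be dicts keyed by (X, i, j).

    def allowed_spans(inside, outside, sentence_prob, threshold=1e-4):
        """Return the set of (X, i, j) whose posterior clears the threshold."""
        keep = set()
        for key, beta in inside.items():
            posterior = beta * outside[key] / sentence_prob
            if posterior >= threshold:
                keep.add(key)
        return keep

    # The second, lexicalized pass then only builds constituents in `keep`.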

Tag splitting
- Lexicalization is an extreme case of splitting the tags to allow for better discrimination
- Idea: what if, rather than doing it for all words, we just split some of the tags?

Tag Splits
- Problem: Treebank tags are too coarse
- Example: sentential, PP, and other prepositions are all marked IN
- Partial solution: subdivide the IN tag
[Table: F1 and grammar size (in thousands of symbols) for the previous model vs. SPLIT-IN; SPLIT-IN improves F1]

Other Tag Splits
- UNARY-DT: mark demonstratives as DT^U ("the X" vs. "those")
- UNARY-RB: mark phrasal adverbs as RB^U ("quickly" vs. "very")
- TAG-PA: mark tags with non-canonical parents ("not" is an RB^VP)
- SPLIT-AUX: mark auxiliary verbs with -AUX [cf. Charniak 97]
- SPLIT-CC: separate "but" and "&" from other conjunctions
- SPLIT-%: "%" gets its own tag
[Table: cumulative F1 and grammar size after each split]

Learning good splits: Latent Variable Grammars
[Figure: a sentence, its observed parse tree, the grammar parameters, and the latent derivations over split subcategories]