Probabilistic and Lexicalized Parsing

Probabilistic CFGs

Weighted CFGs
–Attach weights to the rules of a CFG
–Compute weights of derivations
–Use weights to pick preferred parses

Utility: pruning and ordering the search space, disambiguation, and language modeling for ASR.

Parsing with weighted grammars (analogous to weighted FAs):
–T* = argmax_T W(T, S)

Probabilistic CFGs are one form of weighted CFG.

Probability Model

Rule probability:
–Attach probabilities to grammar rules
–Expansions for a given non-terminal sum to 1
  R1: VP → V         .55
  R2: VP → V NP      .40
  R3: VP → V NP NP   .05
–Estimate the probabilities from annotated corpora, e.g. P(R1) = count(R1) / count(VP)

Derivation probability:
–Derivation T = {R_1 … R_n}
–Probability of a derivation: P(T) = ∏_{i=1..n} P(R_i)
–Most likely parse: T* = argmax_T P(T)
–Probability of a sentence: P(S) = Σ_T P(T, S), summing over all possible derivations of the sentence

Note the independence assumption: a rule's probability does not change based on where in the derivation the rule is expanded.
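To make the estimation step concrete, here is a minimal sketch of maximum-likelihood PCFG estimation in Python. The treebank representation (a flat list of (LHS, RHS) rule uses) and all names are illustrative assumptions, not something given on the slides.

from collections import Counter

def estimate_pcfg(rule_occurrences):
    # Maximum-likelihood estimate: P(A -> beta) = count(A -> beta) / count(A).
    # rule_occurrences: one (lhs, rhs) pair per rule use in a treebank,
    # e.g. ("VP", ("V", "NP")). Returns {(lhs, rhs): probability}.
    rule_occurrences = list(rule_occurrences)
    rule_counts = Counter(rule_occurrences)
    lhs_counts = Counter(lhs for lhs, _ in rule_occurrences)
    return {(lhs, rhs): c / lhs_counts[lhs]
            for (lhs, rhs), c in rule_counts.items()}

# Toy usage mirroring the slide's VP expansions:
uses = ([("VP", ("V",))] * 55
        + [("VP", ("V", "NP"))] * 40
        + [("VP", ("V", "NP", "NP"))] * 5)
probs = estimate_pcfg(uses)
print(probs[("VP", ("V", "NP"))])  # 0.40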

Structural ambiguity

S → NP VP        VP → V NP
NP → NP PP       VP → VP PP
PP → P NP        NP → John | Mary | Denver
V → called       P → from

John called Mary from Denver

[Two parse trees: one attaches the PP "from Denver" to the VP (the calling was done from Denver); the other attaches it to the NP (Mary is from Denver).]

Cocke-Younger-Kasami Parser

Bottom-up parser with top-down filtering
Start state(s): (A, i, i+1) for each A → w_{i+1}
End state: (S, 0, n), where n is the input size
Next-state rule:
–(B, i, k), (C, k, j) ⇒ (A, i, j) if A → BC
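A minimal CKY recognizer sketch, assuming the grammar is already in Chomsky normal form and using a dictionary-based grammar representation (both assumptions for illustration; the slides do not fix a data structure).

def cky_recognize(words, lexical, binary, start="S"):
    # lexical: dict word -> set of nonterminals A with A -> word
    # binary:  dict (B, C) -> set of nonterminals A with A -> B C
    # chart[i][j] holds every A that derives words[i:j].
    n = len(words)
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):                  # base case: A -> w_{i+1}
        chart[i][i + 1] = set(lexical.get(w, ()))
    for span in range(2, n + 1):                   # recursive case: A -> B C
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for B in chart[i][k]:
                    for C in chart[k][j]:
                        chart[i][j] |= binary.get((B, C), set())
    return start in chart[0][n]

# Usage with the grammar from the ambiguity slide:
lexical = {"John": {"NP"}, "Mary": {"NP"}, "Denver": {"NP"},
           "called": {"V"}, "from": {"P"}}
binary = {("NP", "VP"): {"S"}, ("V", "NP"): {"VP"}, ("NP", "PP"): {"NP"},
          ("VP", "PP"): {"VP"}, ("P", "NP"): {"PP"}}
print(cky_recognize("John called Mary from Denver".split(), lexical, binary))  # True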

Example: John called Mary from Denver

[The following slides fill in the CKY chart for this sentence, one step at a time.]

Base case (A → w): the word-level cells of the chart are filled with NP(John), V(called), NP(Mary), P(from), NP(Denver).

Recursive cases (A → BC): longer spans are built from pairs of adjacent completed constituents. The chart successively acquires VP(called Mary), S(John called Mary), PP(from Denver), NP(Mary from Denver), then two VP analyses over "called Mary from Denver" (VP_1 from VP PP, VP_2 from V NP), and finally S over the whole sentence.

[Sequence of chart snapshots, one per slide, showing these entries being added; cells marked X have no analysis.]

Probabilistic CKY

Assign probabilities to constituents as they are completed and placed in the table:
–P(A, i, j) = P(A → BC) × P(B, i, k) × P(C, k, j)
Since we are interested in the max P(S, 0, n), keep only the max-probability analysis for each constituent in each cell.
Maintain back-pointers to recover the parse.
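A sketch of the probabilistic (Viterbi) CKY idea just described, using log-probabilities to avoid underflow. The grammar representation mirrors the earlier recognizer and is an assumption for illustration.

import math

def pcky(words, lexical, binary, start="S"):
    # lexical: dict word -> {A: P(A -> word)}
    # binary:  dict (B, C) -> {A: P(A -> B C)}
    # best[(A, i, j)]: max log-probability of A deriving words[i:j];
    # back[(A, i, j)]: back-pointer to recover the best parse.
    n = len(words)
    best, back = {}, {}
    for i, w in enumerate(words):
        for A, p in lexical.get(w, {}).items():
            best[(A, i, i + 1)] = math.log(p)
            back[(A, i, i + 1)] = w
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (B, C), parents in binary.items():
                    lb, lc = best.get((B, i, k)), best.get((C, k, j))
                    if lb is None or lc is None:
                        continue
                    for A, p in parents.items():
                        score = math.log(p) + lb + lc
                        if score > best.get((A, i, j), float("-inf")):
                            best[(A, i, j)] = score      # keep only the max
                            back[(A, i, j)] = (k, B, C)  # back-pointer
    def tree(A, i, j):
        bp = back[(A, i, j)]
        if isinstance(bp, str):          # preterminal cell: stored the word
            return (A, bp)
        k, B, C = bp
        return (A, tree(B, i, k), tree(C, k, j))
    found = (start, 0, n) in back
    return best.get((start, 0, n)), (tree(start, 0, n) if found else None)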

Problems with PCFGs

The probability model we're using is based only on the rules in the derivation.

Lexical insensitivity:
–Doesn't use the words in any real way
–But structural disambiguation is lexically driven: PP attachment often depends on the verb, its object, and the preposition
  I ate pickles with a fork.
  I ate pickles with relish.

Context insensitivity of the derivation:
–Doesn't take into account where in the derivation a rule is used
–Pronouns are more often subjects than objects: She hates Mary. / Mary hates her.

Solution: lexicalization
–Add lexical information to each rule

An example of lexical information: Heads

Make use of the notion of the head of a phrase
–The head of an NP is its noun
–The head of a VP is its main verb
–The head of a PP is its preposition

In the lexicalized grammar, the LHS of each rule carries a lexical item, and each RHS non-terminal carries a lexical item.
–One of the RHS lexical items is shared with the LHS (the head).

Grammar size: if R is the number of binary-branching rules in the CFG and Σ the vocabulary, the lexicalized CFG has O(2·|Σ|·|R|) binary rules; unary rules contribute O(|Σ|·|R|).
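To illustrate how a head word can be found, here is a sketch of head percolation with a deliberately simplified head-rule table. The table below is an illustrative assumption; real head-finding tables (e.g. Collins-style) are much richer.

# Illustrative head rules: for each parent category, which child category
# supplies the head word (simplified for the toy grammar in these slides).
HEAD_RULES = {"NP": ["N", "NP"], "VP": ["V", "VP"], "PP": ["P"], "S": ["VP"]}

def head_word(tree):
    # tree is (category, word) for a preterminal,
    # or (category, child1, child2, ...) for an internal node.
    if len(tree) == 2 and isinstance(tree[1], str):
        return tree[1]                      # preterminal: the word is the head
    cat, children = tree[0], tree[1:]
    for head_cat in HEAD_RULES.get(cat, []):
        for child in children:
            if child[0] == head_cat:
                return head_word(child)     # percolate the head upward
    return head_word(children[0])           # fallback: leftmost child

t = ("VP", ("V", "called"), ("NP", ("N", "Mary")))
print(head_word(t))  # "called"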

Example (correct parse)

[Figure: the preferred lexicalized parse, shown as an attribute grammar with head words annotated on each node.]

Example (less preferred)

[Figure: the dispreferred lexicalized parse of the same sentence.]

Computing Lexicalized Rule Probabilities

We started with rule probabilities:
–VP → V NP PP, with P(rule | VP)
–E.g., the count of this rule divided by the number of VPs in a treebank

Now we want lexicalized probabilities:
–VP(dumped) → V(dumped) NP(sacks) PP(in)
–P(rule | VP ∧ dumped is the verb ∧ sacks is the head of the NP ∧ in is the head of the PP)
–Not likely to have significant counts in any treebank (see the smoothing sketch below)
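One common remedy for these sparse counts is to interpolate the fully lexicalized estimate with less specific ones. The following is a hedged sketch in the spirit of Collins-style smoothing; the backoff levels, interpolation weights, and count structure are all assumptions, not the slides' method.

def lex_rule_prob(rule, head, counts, lam1=0.6, lam2=0.3, lam3=0.1):
    # Interpolated estimate of P(rule | category, head word).
    # rule is (lhs, rhs), e.g. ("VP", ("V", "NP", "PP")); counts is a dict of
    # Counter-like tables gathered from a treebank:
    #   counts["rule,head"][(rule, head)], counts["head"][head],
    #   counts["rule"][rule], counts["cat"][cat]   (cat = rule's LHS)
    cat = rule[0]
    def ratio(num, den):
        return num / den if den else 0.0
    p_lex = ratio(counts["rule,head"].get((rule, head), 0),
                  counts["head"].get(head, 0))       # fully lexicalized, sparse
    p_rule = ratio(counts["rule"].get(rule, 0),
                   counts["cat"].get(cat, 0))        # plain PCFG backoff
    p_floor = 1e-6                                   # floor for unseen events
    return lam1 * p_lex + lam2 * p_rule + lam3 * p_floor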

Another Example

Consider the VPs:
–ate spaghetti with gusto
–ate spaghetti with marinara

The relevant dependency is not between mother and child in the tree.

[Trees: in "ate spaghetti with gusto" the PP(with) attaches to VP(ate); in "ate spaghetti with marinara" it attaches to NP(spaghetti).]

Log-linear models for Parsing

Why restrict the conditioning to the elements of a rule?
–Use even larger context
–Word sequence, word types, sub-tree context, etc.

In general, compute P(y|x) = exp(Σ_i λ_i f_i(x, y)) / Z(x), where each f_i(x, y) tests a property of the context and λ_i is the weight of that feature.

Use these as scores in the CKY algorithm to find the best-scoring parse.
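A minimal sketch of the log-linear scoring just described, computing P(y|x) over an explicit candidate set. The feature functions and weights would come from training, which is not shown; all names here are illustrative.

import math

def loglinear_probs(x, candidates, features, weights):
    # P(y | x) = exp(sum_i w_i * f_i(x, y)) / Z(x), normalized over candidates.
    # features: list of functions f_i(x, y) -> float; weights: one float per f_i.
    scores = [sum(w * f(x, y) for w, f in zip(weights, features))
              for y in candidates]
    z = sum(math.exp(s) for s in scores)   # partition function Z(x)
    return {y: math.exp(s) / z for y, s in zip(candidates, scores)}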

Supertagging: Almost parsing

Example: Poachers now control the underground trade

[Figure: for each word, a set of candidate elementary trees (supertags): noun trees for "poachers" and "trade", clausal S trees with subject and object NP slots for "control", adverbial modifier trees for "now", and adjectival modifier trees for "underground". Choosing the right supertag per word resolves most of the parse.]

Summary

Parsing context-free grammars
–Top-down and bottom-up parsers
–Mixed approaches (CKY, Earley parsers)

Preferences over parses using probabilities
–Parsing with PCFGs and the probabilistic CKY algorithm

Enriching the probability model
–Lexicalization
–Log-linear models for parsing