Parsing with PCFG Ling 571 Fei Xia Week 3: 10/11-10/13/05

Outline
–Misc
–CYK algorithm
–Converting CFG into CNF
–PCFG
–Lexicalized PCFG

Misc
–Quiz 1: 15 pts, due 10/13
–Hw2: 10 pts, due 10/13
–Treehouse weekly meeting:
 –Time: every Wed 2:30-3:30pm; tomorrow is the 1st meeting
 –Location: EE1 025 (Campus map 12-N, South of MGH)
 –Mailing list:
–Others:
 –Pongo policies
 –Machines: LLC, Parrington, Treehouse
 –Linux commands: ssh, sftp, …
 –Catalyst tools: ESubmit, EPost, …

CYK algorithm

Parsing algorithms
–Top-down
–Bottom-up
–Top-down with bottom-up filtering
–Earley algorithm
–CYK algorithm
–…

CYK algorithm
–Cocke-Younger-Kasami algorithm (a.k.a. CKY algorithm)
–Requires the CFG to be in Chomsky Normal Form (CNF)
–Bottom-up chart parsing algorithm using DP
–Fills in a two-dimensional array: C[i][j] contains all the possible syntactic interpretations (non-terminals) of the substring w_i … w_j
–Complexity: O(N^3) in the sentence length N (more precisely, O(N^3 · |G|) for grammar size |G|)

Chomsky normal form (CNF)
Definition of CNF:
–A → B C
–A → a
–S → ε
where A, B, C are non-terminals; a is a terminal; S is the start symbol; and B and C are not the start symbol.
For every CFG, there is a CFG in CNF that is weakly equivalent.

CYK algorithm
For every rule A → w_i, set P[i][i][A] = true
For span=2 to N
  for begin=1 to N-span+1
    end = begin + span – 1;
    for m=begin to end-1;
      for all non-terminals A, B, C:
        if P[begin][m][B] and P[m+1][end][C] and A → B C is a rule in the grammar
        then set P[begin][end][A] = true

CYK algorithm (another way)
For every rule A → w_i, add it to Cell[i][i]
For span=2 to N
  for begin=1 to N-span+1
    end = begin + span – 1;
    for m=begin to end-1;
      for all non-terminals A, B, C:
        if Cell[begin][m] contains B → … and Cell[m+1][end] contains C → … and A → B C is a rule in the grammar
        then add A → B C to Cell[begin][end] and remember m
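
The pseudocode above can be made concrete. Below is a minimal CYK recognizer sketch in Python (my own illustration, not the course code); it assumes a CNF grammar given as lists of binary and lexical rules and fills a chart of non-terminal sets indexed by 0-based [begin, end] positions.

```python
# Minimal CYK recognizer sketch (illustration only, not the course code).
# The grammar is assumed to be in CNF: binary rules A -> B C and lexical rules A -> w.

def cyk_recognize(words, binary_rules, lexical_rules):
    """Return a chart: cell[(i, j)] is the set of non-terminals covering words[i..j] (0-based)."""
    n = len(words)
    cell = {(i, j): set() for i in range(n) for j in range(i, n)}

    # Base case: for every rule A -> w_i, add A to cell[i][i].
    for i, w in enumerate(words):
        for (lhs, word) in lexical_rules:
            if word == w:
                cell[(i, i)].add(lhs)

    # Recursive case: combine two adjacent spans with a binary rule A -> B C.
    for span in range(2, n + 1):
        for begin in range(n - span + 1):
            end = begin + span - 1
            for m in range(begin, end):
                for (lhs, b, c) in binary_rules:
                    if b in cell[(begin, m)] and c in cell[(m + 1, end)]:
                        cell[(begin, end)].add(lhs)
    return cell

# Toy grammar from the next slides:
binary = [("VP", "V", "NP"), ("VP", "VP", "PP"),
          ("NP", "Det", "N"), ("NP", "NP", "PP"), ("PP", "P", "NP")]
lexical = [("V", "book"), ("N", "book"), ("N", "flight"), ("N", "cards"),
           ("Det", "that"), ("Det", "the"), ("P", "with")]

chart = cyk_recognize("book that flight".split(), binary, lexical)
print(chart[(0, 2)])   # {'VP'}: the whole string can be parsed as a VP
```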

An example Rules:
–VP → V NP
–VP → VP PP
–NP → Det N
–NP → NP PP
–PP → P NP
–V → book
–N → book/flight/cards
–Det → that/the
–P → with

Parse "book that flight": C1[begin][end]
–C1[1][1]: N → book, V → book
–C1[1][2]: (empty)
–C1[1][3]: VP → V NP (m=1)
–C1[2][2]: Det → that
–C1[2][3]: NP → Det N (m=2)
–C1[3][3]: N → flight

Parse "book that flight": C2[begin][span]
–C2[1][1]: N → book, V → book
–C2[2][1]: Det → that
–C2[3][1]: N → flight
–C2[1][2]: (empty)
–C2[2][2]: NP → Det N (m=2)
–C2[1][3]: VP → V NP (m=1)

Data structures for the chart

Summary of CYK algorithm
–Bottom-up, using DP
–Requires the CFG to be in CNF
–A very efficient algorithm
–Easy to extend

Converting CFG into CNF

Chomsky normal form (CNF)
Definition of CNF:
–A → B C
–A → a
–S → ε
where A, B, C are non-terminals, a is a terminal, S is the start symbol, and B, C are not the start symbol.
For every CFG, there is a CFG in CNF that is weakly equivalent.

Converting CFG to CNF
(1) Add a new start symbol S0 and a rule S0 → S (so the start symbol will not appear on the rhs of any rule).
(2) Eliminate ε-rules: for each rule A → ε and each rule B → α A β, add the rule B → α β, unless that rule has been previously eliminated.

Conversion (cont)
(3) Remove unit rules: for each unit rule A → B, add the rule A → α whenever B → α is a rule, unless A → α is a unit rule that was previously removed.
(4) Replace each rule A → X1 X2 … Xk with k > 2 by A → X1 A1, A1 → X2 A2, …, A_{k-2} → X_{k-1} X_k; in any rule whose rhs has length ≥ 2, replace each terminal a with a new symbol U_a and add the new rule U_a → a.
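
Step (4) is easy to mechanize. A minimal sketch in Python (illustration only; the helper names and the naming scheme for the new symbols are my own, not from the lecture):

```python
# Sketch of step (4): binarize long rules and replace terminals in long rules.
# Illustration only; the intermediate-symbol naming scheme is made up here.

def binarize(rules):
    """Replace A -> X1 ... Xk (k > 2) with a chain of binary rules."""
    out = []
    for lhs, rhs in rules:
        while len(rhs) > 2:
            new_sym = f"{lhs}_{len(out)}"          # fresh intermediate non-terminal
            out.append((lhs, [rhs[0], new_sym]))
            lhs, rhs = new_sym, rhs[1:]
        out.append((lhs, rhs))
    return out

def replace_terminals(rules, terminals):
    """In rules whose rhs has length >= 2, replace terminal a by U_a and add U_a -> a."""
    out, added = [], set()
    for lhs, rhs in rules:
        if len(rhs) >= 2:
            new_rhs = []
            for sym in rhs:
                if sym in terminals:
                    u = f"U_{sym}"
                    if u not in added:
                        out.append((u, [sym]))
                        added.add(u)
                    new_rhs.append(u)
                else:
                    new_rhs.append(sym)
            out.append((lhs, new_rhs))
        else:
            out.append((lhs, rhs))
    return out

rules = [("VP", ["V", "NP", "PP"]), ("VP", ["to", "VP"])]
print(replace_terminals(binarize(rules), {"to"}))
# [('VP', ['V', 'VP_0']), ('VP_0', ['NP', 'PP']), ('U_to', ['to']), ('VP', ['U_to', 'VP'])]
```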

An example

Adding S0 → S

Removing ε-rules
–Remove B → ε
–Remove A → ε

Removing unit rules

Removing unit rules (cont.)

Converting remaining rules

Summary of CFG parsing
–Simple top-down and bottom-up parsing generate useless trees.
–Top-down with bottom-up filtering has three problems.
–Solution: use DP:
 –Earley algorithm
 –CYK algorithm

Probabilistic CFG (PCFG)

PCFG
–A PCFG is an extension of a CFG.
–A PCFG is a 5-tuple (N, T, P, S, Pr), where Pr is a function assigning a probability to each rule in P: Pr(A → α), also written P(α | A).
–Given a non-terminal A, Σ_α Pr(A → α) = 1.

A PCFG
–S → NP VP 0.8
–S → Aux NP VP 0.15
–S → VP 0.05
–VP → V 0.35
–VP → V NP 0.45
–VP → VP PP 0.20
–NP → N 0.8
–NP → Det N 0.2
–N → Mary 0.01
–N → book 0.02
–V → bought 0.02
–Det → a 0.04
–…
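
The grammar above can be stored directly as a nested dictionary; a minimal sketch (the representation is my own choice, not the course format), including a check of the sum-to-1 constraint from the previous slide:

```python
# One possible in-memory representation of (part of) the PCFG above -- illustration only.
pcfg = {
    "S":  {("NP", "VP"): 0.8, ("Aux", "NP", "VP"): 0.15, ("VP",): 0.05},
    "NP": {("N",): 0.8, ("Det", "N"): 0.2},
    "VP": {("V",): 0.35, ("V", "NP"): 0.45, ("VP", "PP"): 0.20},
}

# Defining constraint: for every non-terminal A, the probabilities of its rules sum to 1.
for lhs, rules in pcfg.items():
    assert abs(sum(rules.values()) - 1.0) < 1e-9, f"rules for {lhs} do not sum to 1"
```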

Using probabilities
–To estimate the prob of a sentence and its parse trees.
–Useful in disambiguation.
–The prob of a tree T: P(T) = Π_{n ∈ T} Pr(r(n)), where n is a node in T and r(n) is the rule used to expand n in T.

Computing P(T)
Grammar:
–S → NP VP 0.8
–S → Aux NP VP 0.15
–S → VP 0.05
–VP → V 0.35
–VP → V NP 0.45
–VP → VP PP 0.20
–NP → N 0.8
–NP → Det N 0.2
–N → Mary 0.01
–N → book 0.02
–V → bought 0.02
–Det → a 0.04
The sentence is "Mary bought a book".
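
For concreteness, assuming the obvious parse under this grammar, [S [NP [N Mary]] [VP [V bought] [NP [Det a] [N book]]]] (the tree itself is not reproduced on the slide), P(T) is the product of the probabilities of the rules used:

P(T) = P(S → NP VP) · P(NP → N) · P(N → Mary) · P(VP → V NP) · P(V → bought) · P(NP → Det N) · P(Det → a) · P(N → book)
     = 0.8 · 0.8 · 0.01 · 0.45 · 0.02 · 0.2 · 0.04 · 0.02
     ≈ 9.2 × 10^-9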

The most likely tree
–P(T, S) = P(T) · P(S|T) = P(T), where T is a parse tree and S is a sentence (P(S|T) = 1, since the yield of T is S).
–The best parse tree for a sentence S: T_best(S) = argmax_T P(T, S) = argmax_T P(T).

Find the most likely tree
–Given a PCFG and a sentence S, how do we find the best parse tree for S?
–One algorithm: CYK

CYK algorithm for CFG
For every rule A → w_i, set P[i][i][A] = true
For span=2 to N
  for begin=1 to N-span+1
    end = begin + span – 1;
    for m=begin to end-1;
      for all non-terminals A, B, C:
        if P[begin][m][B] and P[m+1][end][C] and A → B C is a rule
        then set P[begin][end][A] = true

CYK algorithm for CFG (another implementation)
For every rule A → w_i, add A to Cell[i][i]
For span=2 to N
  for begin=1 to N-span+1
    end = begin + span – 1;
    for m=begin to end-1;
      for all non-terminals A, B, C:
        if B is in Cell[begin][m] and C is in Cell[m+1][end] and A → B C is a rule
        then add A to Cell[begin][end]

Variables for CFG and PCFG
–CFG: P[i][j][A] says whether there is a parse tree whose root is A and which covers w_i … w_j.
–PCFG: P[i][j][A] is the prob of the most likely parse tree whose root is A and which covers w_i … w_j.

CYK algorithm for PCFG
For every rule A → w_i, set P[i][i][A] = Pr(A → w_i)
For span=2 to N
  for begin=1 to N-span+1
    end = begin + span – 1;
    for m=begin to end-1;
      for all non-terminals A, B, C:
        if Pr(A → B C) · P[begin][m][B] · P[m+1][end][C] > P[begin][end][A]
        then set P[begin][end][A] = Pr(A → B C) · P[begin][m][B] · P[m+1][end][C] and remember m
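
A minimal sketch of this probabilistic (Viterbi) CYK in Python (my own code, not the course implementation). It uses the toy grammar from the next slides; the probability 0.001 for V → book is not given on the slide and is inferred from the chart numbers, so treat it as an assumption.

```python
# Probabilistic CYK (Viterbi) sketch -- illustration only.
# binary_rules: list of (A, B, C, prob); lexical_rules: list of (A, word, prob).

def pcky_best(words, binary_rules, lexical_rules):
    n = len(words)
    best = {(i, j): {} for i in range(n) for j in range(i, n)}  # best[(i,j)][A] = max prob
    back = {(i, j): {} for i in range(n) for j in range(i, n)}  # back-pointers (m, B, C)

    for i, w in enumerate(words):
        for (a, word, p) in lexical_rules:
            if word == w and p > best[(i, i)].get(a, 0.0):
                best[(i, i)][a] = p

    for span in range(2, n + 1):
        for begin in range(n - span + 1):
            end = begin + span - 1
            for m in range(begin, end):
                for (a, b, c, p) in binary_rules:
                    pb = best[(begin, m)].get(b, 0.0)
                    pc = best[(m + 1, end)].get(c, 0.0)
                    prob = p * pb * pc
                    if prob > best[(begin, end)].get(a, 0.0):
                        best[(begin, end)][a] = prob
                        back[(begin, end)][a] = (m, b, c)
    return best, back

binary = [("VP", "V", "NP", 0.4), ("VP", "VP", "PP", 0.2),
          ("NP", "Det", "N", 0.3), ("NP", "NP", "PP", 0.2), ("PP", "P", "NP", 1.0)]
lexical = [("V", "book", 0.001),   # 0.001 is inferred, not given on the slide
           ("N", "book", 0.01), ("N", "flight", 0.02),
           ("Det", "that", 0.1), ("P", "with", 0.2)]

best, back = pcky_best("book that flight".split(), binary, lexical)
print(best[(0, 2)]["VP"])   # 2.4e-07, matching the worked chart below
```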

A CFG Rules:
–VP → V NP
–VP → VP PP
–NP → Det N
–NP → NP PP
–PP → P NP
–V → book
–N → book/flight/cards
–Det → that/the
–P → with

Parse "book that flight": Cell[begin][end]
–Cell[1][1]: N → book, V → book
–Cell[1][2]: (empty)
–Cell[1][3]: VP → V NP (m=1)
–Cell[2][2]: Det → that
–Cell[2][3]: NP → Det N (m=2)
–Cell[3][3]: N → flight

A PCFG Rules:
–VP → V NP 0.4
–VP → VP PP 0.2
–NP → Det N 0.3
–NP → NP PP 0.2
–PP → P NP 1.0
–V → book
–N → book 0.01
–N → flight 0.02
–Det → that 0.1
–P → with 0.2

Parse "book that flight": Cell[begin][end]
–Cell[1][1]: N → book 0.01, V → book
–Cell[1][2]: (empty)
–Cell[1][3]: VP → V NP (m=1) 2.4e-7
–Cell[2][2]: Det → that 0.1
–Cell[2][3]: NP → Det N (m=2) 6e-4
–Cell[3][3]: N → flight 0.02
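
To see where these numbers come from (Pr(V → book) is not shown on the slides; the value 0.001 is inferred so that the chart entries work out):

P[2][3][NP] = Pr(NP → Det N) · P[2][2][Det] · P[3][3][N] = 0.3 · 0.1 · 0.02 = 6e-4
P[1][3][VP] = Pr(VP → V NP) · P[1][1][V] · P[2][3][NP] = 0.4 · 0.001 · 6e-4 = 2.4e-7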

N-best parse trees
–Best parse tree: T_best(S) = argmax_T P(T)
–N-best parse trees: the N parse trees with the highest P(T)

CYK algorithm for N-best
For every rule A → w_i, add Pr(A → w_i) to the sorted array P[i][i][A]
For span=2 to N
  for begin=1 to N-span+1
    end = begin + span – 1;
    for m=begin to end-1;
      for all non-terminals A, B, C:
        for each prob p_i in P[begin][m][B] and each prob p_j in P[m+1][end][C]:
          val = Pr(A → B C) · p_i · p_j
          if val > one of the probs in P[begin][end][A]
          then remove the last element in P[begin][end][A] and insert val into the array,
               and remove the last element in B[begin][end][A] and insert (m, B, C, i, j) into B[begin][end][A].
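
A minimal sketch of the N-best extension in Python (my own code; same assumed rule format as the earlier sketches). Each cell keeps at most N (probability, back-pointer) pairs per non-terminal, sorted by decreasing probability:

```python
# N-best probabilistic CYK sketch -- illustration only.

def pcky_nbest(words, binary_rules, lexical_rules, nbest=3):
    n = len(words)
    # chart[(i, j)][A] = list of (prob, back-pointer) pairs, sorted by decreasing prob, length <= nbest
    chart = {(i, j): {} for i in range(n) for j in range(i, n)}

    def insert(cell, a, prob, bp):
        entries = cell.setdefault(a, [])
        entries.append((prob, bp))
        entries.sort(key=lambda e: -e[0])
        del entries[nbest:]                      # keep only the nbest highest-probability entries

    for i, w in enumerate(words):
        for (a, word, p) in lexical_rules:
            if word == w:
                insert(chart[(i, i)], a, p, None)

    for span in range(2, n + 1):
        for begin in range(n - span + 1):
            end = begin + span - 1
            for m in range(begin, end):
                for (a, b, c, p) in binary_rules:
                    for bi, (pb, _) in enumerate(chart[(begin, m)].get(b, [])):
                        for ci, (pc, _) in enumerate(chart[(m + 1, end)].get(c, [])):
                            # back-pointer (m, B, C, i, j), as in the pseudocode above
                            insert(chart[(begin, end)], a, p * pb * pc, (m, b, c, bi, ci))
    return chart
```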

PCFG for Language Modeling (LM)
–N-gram LM: P(S) = Π_i P(w_i | w_{i-N+1} … w_{i-1})
–Syntax-based LM: P(S) = Σ_T P(T, S)

Calculating Pr(S)
–Parsing: the prob of the most likely parse tree, max_T P(T, S)
–LM: the sum over all parse trees, Σ_T P(T, S)

CYK for finding the most likely parse tree
For every rule A → w_i, set P[i][i][A] = Pr(A → w_i)
For span=2 to N
  for begin=1 to N-span+1
    end = begin + span – 1;
    for m=begin to end-1;
      for all non-terminals A, B, C:
        if Pr(A → B C) · P[begin][m][B] · P[m+1][end][C] > P[begin][end][A]
        then set P[begin][end][A] = Pr(A → B C) · P[begin][m][B] · P[m+1][end][C]

CYK for calculating LM
For every rule A → w_i, set P[i][i][A] = Pr(A → w_i)
For span=2 to N
  for begin=1 to N-span+1
    end = begin + span – 1;
    for m=begin to end-1;
      for all non-terminals A, B, C:
        P[begin][end][A] += Pr(A → B C) · P[begin][m][B] · P[m+1][end][C]
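
Compared with the Viterbi sketch earlier, only the combination step changes: the max and the back-pointer are replaced by a running sum. A minimal sketch (same assumed rule format as before):

```python
# Inside-probability (LM) variant: sum instead of max; no back-pointers needed.
def pcky_inside(words, binary_rules, lexical_rules):
    n = len(words)
    inside = {(i, j): {} for i in range(n) for j in range(i, n)}  # inside[(i,j)][A] = sum of tree probs

    for i, w in enumerate(words):
        for (a, word, p) in lexical_rules:
            if word == w:
                inside[(i, i)][a] = inside[(i, i)].get(a, 0.0) + p

    for span in range(2, n + 1):
        for begin in range(n - span + 1):
            end = begin + span - 1
            for m in range(begin, end):
                for (a, b, c, p) in binary_rules:
                    pb = inside[(begin, m)].get(b, 0.0)
                    pc = inside[(m + 1, end)].get(c, 0.0)
                    if pb and pc:
                        inside[(begin, end)][a] = inside[(begin, end)].get(a, 0.0) + p * pb * pc
    return inside

# Pr(S) under the syntax-based LM is the entry for the start symbol over the whole string,
# e.g. pcky_inside(words, binary, lexical)[(0, len(words) - 1)].get("S", 0.0)
```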

CYK algorithm (summary of variants: what each cell stores; what back-pointer is needed)
–One parse tree: boolean; tuple
–All parse trees: boolean; list of tuples
–Most likely parse tree: real number (the max prob); tuple
–N-best parse trees: list of real numbers; list of tuples
–LM for a sentence: real number (the sum of probs); not needed

Learning PCFG Probabilities
–Given a treebank (i.e., a set of trees), use MLE: Pr(A → α) = Count(A → α) / Count(A)
–Without treebanks → inside-outside algorithm
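
A minimal sketch of the MLE step in Python (my own code; the nested (label, children) tree format is an assumption, not the course's treebank format):

```python
# MLE rule probabilities from a toy "treebank" -- illustration only.
from collections import Counter

def count_rules(tree, rule_counts, lhs_counts):
    """tree = (label, [child trees]) for non-terminals, or (label, word) for preterminals."""
    label, children = tree
    if isinstance(children, str):                      # preterminal: A -> w
        rule = (label, (children,))
    else:
        rule = (label, tuple(child[0] for child in children))
        for child in children:
            count_rules(child, rule_counts, lhs_counts)
    rule_counts[rule] += 1
    lhs_counts[label] += 1

def mle_pcfg(treebank):
    rule_counts, lhs_counts = Counter(), Counter()
    for tree in treebank:
        count_rules(tree, rule_counts, lhs_counts)
    return {rule: c / lhs_counts[rule[0]] for rule, c in rule_counts.items()}

tree = ("S", [("NP", [("N", "Mary")]),
              ("VP", [("V", "bought"), ("NP", [("Det", "a"), ("N", "book")])])])
print(mle_pcfg([tree])[("NP", ("N",))])   # 0.5: NP -> N occurs once out of two NP nodes
```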

Q&A PCFG CYK algorithm

Problems of PCFG
–Lack of sensitivity to structural dependency
–Lack of sensitivity to lexical dependency

Structural Dependency
–Each PCFG rule is assumed to be independent of other rules.
–Observation: sometimes the choice of how a node expands depends on the location of the node in the parse tree.
 –e.g., NP → Pron depends on whether the NP is a subject or an object.

Lexical Dependency
–Given P(NP → NP PP) > P(VP → VP PP), should a PP always be attached to an NP?
–The answer depends on particular words: verbs such as "send", preps such as "of", "into".

Solution to the problems
–Structural dependency, lexical dependency → lexicalized PCFG and other more sophisticated models.

Lexicalized PCFG

Head and head child
–Each syntactic constituent is associated with a lexical head.
–Each context-free rule has a head child:
 –VP → V NP
 –NP → Det N
 –VP → VP PP
 –NP → NP PP
 –VP → to VP
 –VP → aux VP

Head propagation Lexical head propagates from head child to its parent. An example: “Mary bought a book in the store.”

Lexicalized PCFG
Lexicalized rules:
–VP(bought) → V(bought) NP 0.01
–VP → V NP | 0.01 | 0 | bought –
–VP(bought) → V(bought) NP(book) 1.5e-7
–VP → V NP | 1.5e-7 | 0 | bought book

Finding the head in a parse tree
–Head propagation table: simple rules to find the head child.
–An example:
 –(VP left V/VP/Aux)
 –(PP left P)
 –(NP right N)
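
A minimal sketch of applying such a head-propagation table in Python (my own code; reading each entry as "scan the children from the given direction for the first child whose label is in the list" is one common interpretation, not necessarily the lecture's exact convention):

```python
# Head-child selection from a simple head-propagation table -- illustration only.
HEAD_TABLE = {
    "VP": ("left", ["V", "VP", "Aux"]),
    "PP": ("left", ["P"]),
    "NP": ("right", ["N"]),
}

def head_child_index(parent, children):
    """Return the index of the head child of `parent`, given its children's labels."""
    direction, wanted = HEAD_TABLE[parent]
    order = range(len(children)) if direction == "left" else range(len(children) - 1, -1, -1)
    for i in order:
        if children[i] in wanted:
            return i
    # Fallback when no listed category is found: leftmost or rightmost child.
    return 0 if direction == "left" else len(children) - 1

print(head_child_index("VP", ["V", "NP"]))      # 0: the V is the head child
print(head_child_index("NP", ["Det", "N"]))     # 1: the N is the head child
```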

Simplified Model using Lexicalized PCFG
–PCFG: P(r(n) | n)
–Lexicalized PCFG: P(r(n) | n, head(n))
 –P(VP → VBD NP PP | VP, dumped)
 –P(VP → VBD NP PP | VP, slept)
–Parsers that use lexicalized rules:
 –Collins' parser