ICS 482 Natural Language Processing: Probabilistic Context Free Grammars (Chapter 14). Dr. Muhammed Al-Mulhem, March 1, 2009.

SCFG

A probabilistic CFG (PCFG), or stochastic CFG (SCFG), is the simplest augmentation of the CFG. Just as a CFG G is defined as (N, Σ, R, S), an SCFG is also defined as (N, Σ, R, S), where:

- N is a set of non-terminal symbols.
- Σ is a set of terminal symbols, with N ∩ Σ = Ø.
- R is a set of production rules, each of the form A → β [p], where A ∈ N, β ∈ (Σ ∪ N)*, and p is a number between 0 and 1 expressing P(A → β).
- S is the start symbol, S ∈ N.

SCFG

P(A → β) expresses the probability that A will be expanded to β. If we consider all the possible expansions of a non-terminal A, the sum of their probabilities must be 1.
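In symbols (a standard statement of this constraint, not copied from the slide), the rule probabilities form a proper distribution over the expansions of each non-terminal:

\sum_{\beta} P(A \to \beta) = 1 \quad \text{for every non-terminal } A \in N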

Example 1

Attach probabilities to grammar rules. The probabilities of all rules expanding a given non-terminal, such as VP, sum to 1:

VP → Verb          .55
VP → Verb NP       .40
VP → Verb NP NP    .05

Example 2

NP → Det N        0.4
NP → NPposs N     0.1
NP → Pronoun      0.2
NP → NP PP        0.1
NP → N            0.2

Subtree: an NP expanded as NP PP, with the inner NP expanded as Det N.
P(subtree) = 0.1 × 0.4 = 0.04
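More generally, a PCFG assigns a parse tree T the product of the probabilities of all the rules used to build it (a standard formulation, stated here rather than on the slide):

P(T) = \prod_{n \in T} P\big(r(n)\big)

where r(n) is the rule used to expand node n of T.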

Example 3

(These two slides show the two alternative parse trees, T1 and T2, for an ambiguous sentence; the tree figures are not reproduced in this transcript.)

These are the rules that generate the above trees (not the full grammar); the table of rules and their probabilities appears on the slide but is not reproduced here.

Example 3

The probabilities of the two parse trees are calculated by multiplying the probabilities of the production rules used to generate each tree:

P(T1) = .15 × .40 × .05 × .05 × .35 × .75 × .40 × .40 × .30 × .40 × .50 = 3.78 × 10^-7
P(T2) = .15 × .40 × .40 × .05 × .05 × .75 × .40 × .40 × .30 × .40 × .50 = 4.32 × 10^-7
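A minimal Python sketch of this multiplication (not from the slides; the numbers are simply the rule probabilities read off the two trees):

from functools import reduce

def tree_probability(rule_probs):
    """The probability of a parse tree is the product of the
    probabilities of all rules used to derive it."""
    return reduce(lambda acc, p: acc * p, rule_probs, 1.0)

# Rule probabilities of the two trees in Example 3:
print(tree_probability([.15, .40, .05, .05, .35, .75, .40, .40, .30, .40, .50]))  # ~3.78e-07
print(tree_probability([.15, .40, .40, .05, .05, .75, .40, .40, .30, .40, .50]))  # ~4.32e-07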

Example 4

S  → NP VP       1.0
PP → P NP        1.0
VP → V NP        0.7
VP → VP PP       0.3
P  → with        1.0
V  → saw         1.0
NP → NP PP       0.4
NP → astronomers 0.1
NP → ears        0.18
NP → saw         0.04
NP → stars       0.18
NP → telescopes  0.1

Example 4: Astronomers saw stars with ears

(The slide shows the two parse trees, t1 and t2, for this structurally ambiguous sentence; the tree figures are not reproduced here.)

The probabilities of the two parse trees:

P(t1) = 1.0 × 0.1 × 0.7 × 1.0 × 0.4 × 0.18 × 1.0 × 1.0 × 0.18 = 0.0009072
P(t2) = 1.0 × 0.1 × 0.3 × 0.7 × 1.0 × 0.18 × 1.0 × 1.0 × 0.18 = 0.0006804

Example 4

S  → NP VP       1.0        NP → NP PP        0.4
PP → P NP        1.0        NP → astronomers  0.1
VP → V NP        0.7        NP → ears         0.18
VP → VP PP       0.3        NP → saw          0.04
P  → with        1.0        NP → stars        0.18
V  → saw         1.0        NP → telescopes   0.1

P(t1) = 1.0 × 0.1 × 0.7 × 1.0 × 0.4 × 0.18 × 1.0 × 1.0 × 0.18 = 0.0009072
P(t2) = 1.0 × 0.1 × 0.3 × 0.7 × 1.0 × 0.18 × 1.0 × 1.0 × 0.18 = 0.0006804

Since P(t1) > P(t2), the PCFG prefers parse t1.

Probabilistic CFGs

- The probabilistic model: assigning probabilities to parse trees.
- Getting the probabilities for the model.
- Parsing with probabilities: a slight modification to the dynamic programming approach; the task is to find the maximum-probability tree for an input.

Getting the Probabilities

- From an annotated database (a treebank).
- Learned from a corpus.

Treebank

- Get a large collection of parsed sentences.
- Collect counts for each non-terminal rule expansion in the collection.
- Normalize.
- Done.
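"Collect counts and normalize" is simply maximum-likelihood estimation of each rule probability from the treebank (a standard formulation, not copied from the slide):

P(A \to \beta) = \frac{\mathrm{Count}(A \to \beta)}{\sum_{\gamma} \mathrm{Count}(A \to \gamma)} = \frac{\mathrm{Count}(A \to \beta)}{\mathrm{Count}(A)}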

Learning

What if you don't have a treebank (and can't get one)?

- Take a large collection of text and parse it.
- In the case of syntactically ambiguous sentences, collect all the possible parses.
- Prorate the rule statistics gathered from the ambiguous cases by the probability of each parse.
- Proceed as you did with a treebank.
- This is the Inside-Outside algorithm.

Assumptions

- We're assuming that there is a grammar to parse with.
- We're assuming the existence of a large, robust dictionary with parts of speech.
- We're assuming the ability to parse (i.e., a parser).
- Given all that, we can parse probabilistically.

Typical Approach

- Bottom-up dynamic programming approach.
- Assign probabilities to constituents as they are completed and placed in the table.
- Use the max probability for each constituent going up.

Max Probability

Say we're talking about a final part of a parse: S0 → NPi VPj, i.e., an S spanning the input is built from an NP covering words 0 to i and a VP covering the rest. The probability of the S is P(S → NP VP) × P(NP) × P(VP). The last two factors (highlighted in yellow on the original slide) are already known, because we're doing bottom-up parsing.

Max

P(NP) is known. What if there are multiple NPs for the span of text in question (0 to i)? Take the max: any larger constituent built from a lower-probability NP over that span can never beat the same constituent built from the highest-probability NP. This does not mean that other kinds of constituents for the same span are ignored (they might still end up in the solution).

Probabilistic Parsing

- Probabilistic CYK (Cocke-Younger-Kasami) algorithm for parsing PCFGs.
- Bottom-up dynamic programming algorithm.
- Assumes the PCFG is in Chomsky Normal Form (every production is either A → B C or A → a).

Chomsky Normal Form (CNF)

All rules have one of two forms: A → B C, where A, B, and C are non-terminals, or A → a, where a is a single terminal.

Examples

(The slide contrasts example rules that are not in Chomsky Normal Form with equivalent rules that are; the example grammars appear as figures and are not reproduced here.)

Observations

- Chomsky normal form grammars are convenient for parsing and for proving theorems.
- It is possible to convert any context-free grammar into an equivalent Chomsky normal form grammar.

Probabilistic CYK Parsing of PCFGs

CYK algorithm: a bottom-up parser.

Input:
- A Chomsky normal form PCFG, G = (N, Σ, P, S, D). Assume that the |N| non-terminals have indices 1, 2, …, |N|, and that the start symbol S has index 1.
- n words w1, …, wn.

Data structure:
- A dynamic programming array π[i, j, a] holds the maximum probability of a constituent with non-terminal index a spanning words i..j.

Output:
- The maximum probability parse, π[1, n, 1].

Base Case

CYK fills out π[i, j, a] by induction. The base case covers input strings of length 1 (individual words wi): in CNF, the probability of a given non-terminal A expanding to a single word wi must come only from the rule A → wi, i.e., it is P(A → wi).
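Putting the base case and the inductive step together, the table entries satisfy the following recurrence (a standard statement, written here with non-terminals rather than their indices; it matches the algorithm on the next slide):

\pi[i, i, A] = P(A \to w_i)
\pi[i, j, A] = \max_{A \to B\,C,\; i \le m < j} P(A \to B\,C)\;\pi[i, m, B]\;\pi[m+1, j, C]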

Probabilistic CYK Algorithm [Corrected]

Function CYK(words, grammar) returns the most probable parse and its probability

  For i ← 1 to num_words
    For a ← 1 to num_nonterminals
      If (A → wi) is in grammar then π[i, i, a] ← P(A → wi)

  For span ← 2 to num_words
    For begin ← 1 to num_words − span + 1
      end ← begin + span − 1
      For m ← begin to end − 1
        For a ← 1 to num_nonterminals
          For b ← 1 to num_nonterminals
            For c ← 1 to num_nonterminals
              prob ← π[begin, m, b] × π[m+1, end, c] × P(A → B C)
              If prob > π[begin, end, a] then
                π[begin, end, a] ← prob
                back[begin, end, a] ← {m, b, c}

  Return build_tree(back[1, num_words, 1]), π[1, num_words, 1]
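For reference, a runnable Python sketch of the same algorithm (the dictionary-based grammar encoding is an assumption made for illustration, not the course's code; tree reconstruction from the back-pointers is omitted for brevity):

from collections import defaultdict

def pcky(words, lexical, binary, start="S"):
    """Probabilistic CKY for a PCFG in Chomsky Normal Form.
    lexical maps (A, word) -> P(A -> word); binary maps (A, B, C) -> P(A -> B C).
    Returns the probability of the best parse and the back-pointer table."""
    n = len(words)
    pi = defaultdict(dict)    # pi[(i, j)][A] = max probability of an A spanning words i..j (1-based)
    back = defaultdict(dict)  # back[(i, j)][A] = (split point m, B, C)
    for i, w in enumerate(words, start=1):        # base case: single words
        for (A, word), p in lexical.items():
            if word == w:
                pi[(i, i)][A] = p
    for span in range(2, n + 1):                  # longer spans, bottom-up
        for begin in range(1, n - span + 2):
            end = begin + span - 1
            for m in range(begin, end):           # split point between the two children
                for (A, B, C), p in binary.items():
                    prob = p * pi[(begin, m)].get(B, 0.0) * pi[(m + 1, end)].get(C, 0.0)
                    if prob > pi[(begin, end)].get(A, 0.0):
                        pi[(begin, end)][A] = prob
                        back[(begin, end)][A] = (m, B, C)
    return pi[(1, n)].get(start, 0.0), back

# The grammar of Example 4 (already in CNF) and its sentence:
binary = {("S", "NP", "VP"): 1.0, ("PP", "P", "NP"): 1.0,
          ("VP", "V", "NP"): 0.7, ("VP", "VP", "PP"): 0.3, ("NP", "NP", "PP"): 0.4}
lexical = {("P", "with"): 1.0, ("V", "saw"): 1.0, ("NP", "astronomers"): 0.1,
           ("NP", "ears"): 0.18, ("NP", "saw"): 0.04, ("NP", "stars"): 0.18,
           ("NP", "telescopes"): 0.1}
prob, back = pcky("astronomers saw stars with ears".split(), lexical, binary)
print(prob)  # ~0.0009072, the probability of the better parse t1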

The CYK Membership Algorithm

Input: a grammar G in Chomsky Normal Form and a string w.
Output: decide whether w is in L(G), i.e., whether the grammar can derive the string.

The Algorithm

(The slide gives a small example grammar in CNF and an input string; the grammar, the string, and the filled-in CYK table are shown as figures and are not reproduced in this transcript.)

The table is filled in bottom-up, one layer per substring length: first all substrings of length 1, then all substrings of length 2, 3, and 4, and finally the single substring of length 5 (the whole input). Each cell records the variables that can derive the corresponding substring.

Therefore: the example concludes by checking whether the start variable appears in the cell covering the whole string; if it does, the string is in the language generated by the grammar.

CYK Algorithm for Deciding Context-Free Languages

IDEA: For each substring of a given input x, find all variables that can derive the substring. Once these have been found, telling which variables generate x becomes a simple matter of looking at the grammar, since it is in Chomsky normal form.
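A minimal Python sketch of that idea, using a set-valued table instead of probabilities (the grammar encoding is an assumption made for illustration, not the slide's notation):

def cyk_member(words, lexical, binary, start="S"):
    """CYK membership test for a grammar in Chomsky Normal Form.
    lexical: set of (A, terminal) pairs; binary: set of (A, B, C) pairs."""
    n = len(words)
    # table[i][j] = set of variables that derive words i..j (1-based, inclusive)
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words, start=1):
        table[i][i] = {A for (A, t) in lexical if t == w}
    for span in range(2, n + 1):
        for i in range(1, n - span + 2):
            j = i + span - 1
            for m in range(i, j):
                for (A, B, C) in binary:
                    if B in table[i][m] and C in table[m + 1][j]:
                        table[i][j].add(A)
    return start in table[1][n]

# Reusing the Example 4 grammar (rules only, probabilities dropped):
binary_rules = {("S", "NP", "VP"), ("PP", "P", "NP"), ("VP", "V", "NP"),
                ("VP", "VP", "PP"), ("NP", "NP", "PP")}
lexical_rules = {("P", "with"), ("V", "saw"), ("NP", "astronomers"), ("NP", "ears"),
                 ("NP", "saw"), ("NP", "stars"), ("NP", "telescopes")}
print(cyk_member("astronomers saw stars with ears".split(), lexical_rules, binary_rules))  # True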