
Slide 1: ICS 482 Natural Language Processing: Probabilistic Context Free Grammars (Chapter 14). Dr. Muhammed Al-Mulhem, March 1, 2009.

Slide 2: SCFG

A Probabilistic CFG (PCFG), or Stochastic CFG (SCFG), is the simplest augmentation of the CFG. A CFG G is defined as (N, Σ, R, S); an SCFG is likewise defined as (N, Σ, R, S), where:

- N is a set of non-terminal symbols.
- Σ is a set of terminal symbols, with N ∩ Σ = Ø.
- R is a set of production rules, each of the form A → β [p], where A ∈ N, β ∈ (Σ ∪ N)*, and p is a number between 0 and 1 expressing P(A → β).
- S is the start symbol, S ∈ N.

Slide 3: SCFG

P(A → β) expresses the probability that A will be expanded to β. If we consider all the possible expansions of a non-terminal A, the sum of their probabilities must be 1.

Slide 4: Example 1

Attach probabilities to grammar rules. The probabilities of all rules expanding a given non-terminal, such as VP, must sum to 1:

VP → Verb        [.55]
VP → Verb NP     [.40]
VP → Verb NP NP  [.05]
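
As an aside, this is essentially the notation NLTK uses for probabilistic grammars. The sketch below shows Example 1 written as an NLTK PCFG; the Verb and NP expansions are assumed toy fillers, since the slide gives only the VP rules. NLTK checks that each non-terminal's rule probabilities sum to 1 and raises a ValueError otherwise.

import nltk

# Example 1's VP rules in NLTK's PCFG notation. The Verb and NP
# expansions are made-up fillers so the grammar is self-contained.
toy = nltk.PCFG.fromstring("""
    VP   -> Verb       [0.55]
    VP   -> Verb NP    [0.40]
    VP   -> Verb NP NP [0.05]
    Verb -> 'saw'      [1.0]
    NP   -> 'stars'    [1.0]
""")
print(toy)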

Slide 5: Example 2

NP → Det N     [0.4]
NP → NPposs N  [0.1]
NP → Pronoun   [0.2]
NP → NP PP     [0.1]
NP → N         [0.2]

Consider the subtree [NP [NP Det N] PP]. Its probability is the product of the probabilities of the rules used: P(subtree) = 0.1 × 0.4 = 0.04.
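
A minimal sketch of that computation, with the slide's rules stored in a Python dict keyed by (LHS, RHS):

rules = {
    ("NP", ("Det", "N")):    0.4,
    ("NP", ("NPposs", "N")): 0.1,
    ("NP", ("Pronoun",)):    0.2,
    ("NP", ("NP", "PP")):    0.1,
    ("NP", ("N",)):          0.2,
}
# The subtree [NP [NP Det N] PP] uses NP -> NP PP and NP -> Det N
p_subtree = rules[("NP", ("NP", "PP"))] * rules[("NP", ("Det", "N"))]
print(p_subtree)  # 0.04 (up to floating-point rounding)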

Slide 6: Example 3. [figure: a sentence with its first parse tree; not recoverable from the transcript]

Slide 7: Example 3. [figure: the same sentence with its second parse tree; not recoverable from the transcript]

Slide 8: These are the rules that generate the above trees (not the grammar). [rule table figure not recoverable from the transcript]

Slide 9: Example 3

The probability of each of the two parse trees is calculated by multiplying the probabilities of the production rules used to generate it:

P(T1) = .15 × .40 × .05 × .05 × .35 × .75 × .40 × .40 × .30 × .40 × .50 = 3.78 × 10⁻⁷
P(T2) = .15 × .40 × .40 × .05 × .05 × .75 × .40 × .40 × .30 × .40 × .50 = 4.32 × 10⁻⁷
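
The arithmetic can be checked in one line per tree; math.prod multiplies a sequence:

from math import prod  # Python 3.8+

# Rule probabilities read off the two Example 3 trees (slide 8)
t1 = [.15, .40, .05, .05, .35, .75, .40, .40, .30, .40, .50]
t2 = [.15, .40, .40, .05, .05, .75, .40, .40, .30, .40, .50]
print(prod(t1))  # ≈ 3.78e-07
print(prod(t2))  # ≈ 4.32e-07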

Slide 10: Example 4

S  → NP VP  [1.0]
PP → P NP   [1.0]
VP → V NP   [0.7]
VP → VP PP  [0.3]
P  → with   [1.0]
V  → saw    [1.0]
NP → NP PP       [0.4]
NP → astronomers [0.1]
NP → ears        [0.18]
NP → saw         [0.04]
NP → stars       [0.18]
NP → telescopes  [0.1]

Slide 11: Example 4: Astronomers saw stars with ears. [figure: the two parse trees t1 and t2 for this sentence; not recoverable from the transcript]

Slide 12: The probabilities of the two parse trees

P(t1) = 1.0 × 0.1 × 0.7 × 1.0 × 0.4 × 0.18 × 1.0 × 1.0 × 0.18 = 0.0009072
P(t2) = 1.0 × 0.1 × 0.3 × 0.7 × 1.0 × 0.18 × 1.0 × 1.0 × 0.18 = 0.0006804

Slide 13: Example 4 (summary). The grammar of slide 10 shown alongside the two tree probabilities of slide 12. Since P(t1) > P(t2), a probabilistic parser prefers t1, the reading in which "with ears" attaches to "stars" rather than to "saw".
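
As a sketch, NLTK's ViterbiParser can reproduce these numbers; it returns the single most probable parse under the grammar:

import nltk

# Example 4's grammar in NLTK notation (terminals quoted)
grammar = nltk.PCFG.fromstring("""
    S  -> NP VP  [1.0]
    PP -> P NP   [1.0]
    VP -> V NP   [0.7]
    VP -> VP PP  [0.3]
    P  -> 'with' [1.0]
    V  -> 'saw'  [1.0]
    NP -> NP PP         [0.4]
    NP -> 'astronomers' [0.1]
    NP -> 'ears'        [0.18]
    NP -> 'saw'         [0.04]
    NP -> 'stars'       [0.18]
    NP -> 'telescopes'  [0.1]
""")
parser = nltk.ViterbiParser(grammar)
for tree in parser.parse("astronomers saw stars with ears".split()):
    print(tree)         # the t1 analysis
    print(tree.prob())  # ≈ 0.0009072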

Slide 14: Probabilistic CFGs

- The probabilistic model: assigning probabilities to parse trees.
- Getting the probabilities for the model.
- Parsing with probabilities: a slight modification to the dynamic programming approach; the task is to find the maximum-probability tree for an input.

Slide 15: Getting the Probabilities

- From an annotated database (a treebank).
- Learned from a corpus.

Slide 16: Treebank

- Get a large collection of parsed sentences.
- Collect counts for each non-terminal rule expansion in the collection.
- Normalize.
- Done.
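
A minimal sketch of the counting and normalization steps, assuming each parse has already been flattened into a list of (LHS, RHS) rules; the two-sentence toy treebank below is made up for illustration:

from collections import Counter, defaultdict

# Assumed toy treebank: each parse is a list of (lhs, rhs) rules
treebank = [
    [("S", ("NP", "VP")), ("NP", ("Det", "N")), ("VP", ("V",))],
    [("S", ("NP", "VP")), ("NP", ("Pronoun",)), ("VP", ("V", "NP")),
     ("NP", ("Det", "N"))],
]

# Collect counts for each rule expansion
rule_counts = Counter(rule for parse in treebank for rule in parse)
lhs_counts = defaultdict(int)
for (lhs, _), count in rule_counts.items():
    lhs_counts[lhs] += count

# Normalize: P(A -> beta) = count(A -> beta) / count(A)
probs = {rule: count / lhs_counts[rule[0]] for rule, count in rule_counts.items()}
for (lhs, rhs), p in sorted(probs.items()):
    print(lhs, "->", " ".join(rhs), round(p, 3))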

Slide 17: Learning

What if you don't have a treebank (and can't get one)?

- Take a large collection of text and parse it.
- For syntactically ambiguous sentences, collect all the possible parses.
- Prorate the rule statistics gathered in the ambiguous cases by the probability of each parse.
- Proceed as you did with a treebank.
- This is the Inside-Outside algorithm.

Slide 18: Assumptions

- We're assuming that there is a grammar to parse with.
- We're assuming the existence of a large, robust dictionary with parts of speech.
- We're assuming the ability to parse (i.e., a parser).
- Given all that, we can parse probabilistically.

Slide 19: Typical Approach

- Bottom-up dynamic programming approach.
- Assign probabilities to constituents as they are completed and placed in the table.
- Use the maximum probability for each constituent going up.

Slide 20: Max probability

Say we're talking about the final step of a parse: S₀ⱼ → NP₀ᵢ VPᵢⱼ, an S spanning positions 0..j built from an NP spanning 0..i and a VP spanning i..j. The probability of the S is

P(S → NP VP) × P(NP) × P(VP)

where the last two factors are already known, since we're parsing bottom-up.
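
In code, this combination step is just a product; the constituent probabilities below are made-up placeholders standing in for values already computed bottom-up:

p_rule = 1.0                  # P(S -> NP VP), from the grammar
p_np, p_vp = 0.0042, 0.019    # assumed best NP over 0..i, best VP over i..j
p_s = p_rule * p_np * p_vp    # candidate probability for S over 0..j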

Slide 21: Max

P(NP) is known, but what if there are multiple NPs for the span of text in question (0 to i)? Take the max. Why? Any larger tree built on top of this NP only multiplies its probability by further factors, so only the best NP over this span can appear in the best overall parse. This does not mean that other kinds of constituents for the same span are ignored; they might still be in the solution.

Slide 22: Probabilistic Parsing

- Probabilistic CYK (Cocke-Younger-Kasami) algorithm for parsing PCFGs.
- Bottom-up dynamic programming algorithm.
- Assume the PCFG is in Chomsky Normal Form (every production is either A → B C or A → a).

Slide 23: Chomsky Normal Form (CNF)

All rules have one of two forms: A → B C (two non-terminals) or A → a (a single terminal).

Slide 24: Examples. [figure: side-by-side example grammars, one not in Chomsky Normal Form and one in Chomsky Normal Form; not recoverable from the transcript]

Slide 25: Observations

- Chomsky normal forms are good for parsing and proving theorems.
- It is possible to find the Chomsky normal form of any context-free grammar.

Slide 26: Probabilistic CYK Parsing of PCFGs

CYK algorithm: a bottom-up parser.

Input:
- A PCFG in Chomsky normal form, G = (N, Σ, P, S, D), where D assigns probabilities to the rules in P. Assume the |N| non-terminals have indices 1, 2, …, |N|, and the start symbol S has index 1.
- n words w1, …, wn.

Data structure: a dynamic programming array π[i, j, a] holds the maximum probability for a constituent with non-terminal index a spanning words i..j.

Output: the maximum-probability parse, π[1, n, 1].

Slide 27: Base Case

CYK fills out π[i, j, a] by induction. The base case is input strings of length 1 (individual words wi). In CNF, the probability of a given non-terminal A expanding to a single word wi can come only from the rule A → wi, so π[i, i, a] = P(A → wi).

Slide 28: Probabilistic CYK Algorithm [corrected]

function CYK(words, grammar) returns the most probable parse and its probability

  # base case: spans of one word
  for i ← 1 to num_words
    for a ← 1 to num_nonterminals
      if (A → wi) is in grammar then
        π[i, i, a] ← P(A → wi)

  # recursive case: spans of length 2 .. num_words
  for span ← 2 to num_words
    for begin ← 1 to num_words − span + 1
      end ← begin + span − 1
      for m ← begin to end − 1                # split point
        for a ← 1 to num_nonterminals         # candidate rules A → B C
          for b ← 1 to num_nonterminals
            for c ← 1 to num_nonterminals
              prob ← π[begin, m, b] × π[m+1, end, c] × P(A → B C)
              if prob > π[begin, end, a] then
                π[begin, end, a] ← prob
                back[begin, end, a] ← {m, b, c}

  return build_tree(back[1, num_words, 1]), π[1, num_words, 1]
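
A runnable Python sketch of the same algorithm. It loops over the grammar's rules rather than over all (a, b, c) index triples, which is equivalent but closer to how one would actually write it; the grammar encoding as two dicts is an assumption of this sketch:

from collections import defaultdict

def prob_cyk(words, lexical, binary, start="S"):
    """Probabilistic CYK for a CNF PCFG.
    lexical: {(A, word): prob} for rules A -> word
    binary:  {(A, B, C): prob} for rules A -> B C"""
    n = len(words)
    pi = defaultdict(float)   # pi[(i, j, A)]: best probability for A over words i..j
    back = {}                 # backpointers: (split, B, C)
    for i, w in enumerate(words):              # base case
        for (A, word), p in lexical.items():
            if word == w:
                pi[(i, i, A)] = p
    for span in range(2, n + 1):               # recursive case
        for begin in range(n - span + 1):
            end = begin + span - 1
            for m in range(begin, end):        # split point
                for (A, B, C), p in binary.items():
                    prob = pi[(begin, m, B)] * pi[(m + 1, end, C)] * p
                    if prob > pi[(begin, end, A)]:
                        pi[(begin, end, A)] = prob
                        back[(begin, end, A)] = (m, B, C)
    return pi[(0, n - 1, start)], back

# Example 4's grammar is already in CNF:
lexical = {("P", "with"): 1.0, ("V", "saw"): 1.0,
           ("NP", "astronomers"): 0.1, ("NP", "ears"): 0.18,
           ("NP", "saw"): 0.04, ("NP", "stars"): 0.18,
           ("NP", "telescopes"): 0.1}
binary = {("S", "NP", "VP"): 1.0, ("PP", "P", "NP"): 1.0,
          ("VP", "V", "NP"): 0.7, ("VP", "VP", "PP"): 0.3,
          ("NP", "NP", "PP"): 0.4}
p, _ = prob_cyk("astronomers saw stars with ears".split(), lexical, binary)
print(p)  # ≈ 0.0009072, matching P(t1) from slide 12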

Slide 29: The CYK Membership Algorithm

Input: a grammar G in Chomsky Normal Form and a string w.
Output: decide whether w ∈ L(G), i.e., whether S ⇒* w.

Slide 30: The Algorithm. [figure: the example grammar and input string; not recoverable from the transcript]

Slide 31: The table is filled over all substrings of the input: first all substrings of length 1, then lengths 2, 3, and 4, and finally the full string of length 5. [table figure not recoverable from the transcript]

Slides 32-33: [worked table-filling steps; figures not recoverable from the transcript]

Slide 34: Therefore: the string is in the language exactly when the start variable appears in the cell for the full string. [figure not recoverable from the transcript]

Slide 35: CYK Algorithm for Deciding Context-Free Languages

Idea: for each substring of a given input x, find all variables which can derive that substring. Once these have been found, telling which variables generate x becomes a simple matter of looking at the grammar, since it's in Chomsky normal form.
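
A runnable sketch of the membership test. Since the slides' example grammar did not survive the transcript, the grammar below (S → AB, A → BB | a, B → AB | b) is an assumed stand-in, a classic textbook CNF example:

def cyk_member(s, unit_rules, pair_rules, start="S"):
    """CYK membership test for a CNF grammar.
    unit_rules: iterable of (A, a) rules; pair_rules: iterable of (A, B, C) rules."""
    n = len(s)
    # V[i][l]: variables deriving the substring of length l+1 starting at i
    V = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(s):                 # substrings of length 1
        V[i][0] = {A for A, a in unit_rules if a == ch}
    for length in range(2, n + 1):             # substrings of length 2..n
        for i in range(n - length + 1):
            for k in range(1, length):         # split into lengths k and length-k
                for A, B, C in pair_rules:
                    if B in V[i][k - 1] and C in V[i + k][length - k - 1]:
                        V[i][length - 1].add(A)
    return start in V[0][n - 1]

# Assumed example grammar: S -> AB, A -> BB | 'a', B -> AB | 'b'
units = [("A", "a"), ("B", "b")]
pairs = [("S", "A", "B"), ("A", "B", "B"), ("B", "A", "B")]
print(cyk_member("aabbb", units, pairs))  # True: S derives aabbb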

