
1 CS626-449: Speech, NLP and the Web / Topics in AI
Pushpak Bhattacharyya, CSE Dept., IIT Bombay
Lecture 16: Probabilistic parsing; computing the probability of a sentence

2 N-gram vs. PCFG

3 Chain Rule
P(w_1, w_2, …, w_n) = P(w_1) * P(w_2 | w_1) * P(w_3 | w_{1,2}) * … * P(w_n | w_{1,n-1})
where P(w_n | w_{1,n-1}) = #(w_1 … w_n) / #(w_1 … w_{n-1}), and # denotes "number of occurrences of".

4 Unigram & Bigram Probability Unigram –P(w 1 ) = # / #words Bigram –P(w 2 /w 1 ) = # / #
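These maximum-likelihood estimates (slides 3 and 4) are straightforward to compute from counts. A minimal Python sketch, using an illustrative toy corpus of my own (not from the lecture):

```python
from collections import Counter

# Illustrative toy corpus (not from the lecture); sentence
# boundaries are ignored for simplicity.
corpus = "the gunman sprayed the building the gunman fled".split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def p_unigram(w):
    # P(w) = #(w) / #words
    return unigrams[w] / len(corpus)

def p_bigram(w2, w1):
    # P(w2 | w1) = #(w1 w2) / #(w1)
    return bigrams[(w1, w2)] / unigrams[w1]

def p_sentence_bigram(words):
    # Chain rule (slide 3) with a bigram approximation:
    # P(w_1, ..., w_n) ~= P(w_1) * prod_i P(w_i | w_{i-1})
    p = p_unigram(words[0])
    for w1, w2 in zip(words, words[1:]):
        p *= p_bigram(w2, w1)
    return p

print(p_unigram("the"))                              # 3/8 = 0.375
print(p_bigram("gunman", "the"))                     # 2/3
print(p_sentence_bigram("the gunman fled".split()))  # 0.375 * 2/3 * 1/2 = 0.125
```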

5 PCFG
A PCFG is a CFG (written by a human) augmented with rule probabilities P (estimated from a corpus).

6 N-gram vs. PCFG
N-gram: P(w_{1,m}) = P(w_1) * P(w_2 | w_1) * P(w_3 | w_{1,2}) * … * P(w_m | w_{1,m-1})
PCFG: P(w_{1,m}) = Σ_t P(w_{1,m}, t) (marginalisation over parse trees t)

7 Compare
P(w_{1,m}) = P(w_1) * ∏_{i=2}^{m} P(w_i | w_{1,i-1}) ← statistics (speech)
P(w_{1,m}) = Σ_{t ∈ all parses} P(t) ← statistics + linguistics
w_{1,m} = yield(t) ← linguistics

8 Example of Sentence Labeling: Parsing
"Come July, and the UJF campus is abuzz with new and returning students."
[S1 [S [S [VP [VB Come] [NP [NNP July]]]] [, ,] [CC and] [S [NP [DT the] [JJ UJF] [NN campus]] [VP [AUX is] [ADJP [JJ abuzz] [PP [IN with] [NP [ADJP [JJ new] [CC and] [VBG returning]] [NNS students]]]]]] [. .]]]
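If NLTK is installed, this bracketing can be loaded and inspected programmatically. A sketch using NLTK's standard Tree API; the s-expression below is my retyping of the bracketing above:

```python
from nltk import Tree

# The slide's bracketing, retyped in NLTK's s-expression format.
t = Tree.fromstring("""
(S1 (S
  (S (VP (VB Come) (NP (NNP July))))
  (, ,) (CC and)
  (S (NP (DT the) (JJ UJF) (NN campus))
     (VP (AUX is)
         (ADJP (JJ abuzz)
               (PP (IN with)
                   (NP (ADJP (JJ new) (CC and) (VBG returning))
                       (NNS students))))))
  (. .)))
""")

print(t.leaves())   # the words of the sentence, in order
t.pretty_print()    # ASCII rendering of the parse tree
```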

9 Rule Probabilities
Rule probabilities are such that, for every non-terminal, the probabilities of all rules expanding it sum to 1: Σ_j P(N → ζ_j) = 1. E.g.,
P(NP → DT NN) = 0.2
P(NP → NN) = 0.5
P(NP → NP PP) = 0.3
P(NP → DT NN) = 0.2 means that 20% of the NP expansions in the training-data parses use the rule NP → DT NN.
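The sum-to-1 constraint is easy to verify mechanically. A small sketch over the three NP rules above (the dictionary encoding is mine):

```python
from collections import defaultdict

# (LHS, RHS) -> probability; the NP rows are from the slide.
rules = {
    ("NP", ("DT", "NN")): 0.2,
    ("NP", ("NN",)):      0.5,
    ("NP", ("NP", "PP")): 0.3,
}

# Accumulate total probability mass per left-hand side.
totals = defaultdict(float)
for (lhs, _), p in rules.items():
    totals[lhs] += p

for lhs, total in totals.items():
    assert abs(total - 1.0) < 1e-9, f"rules for {lhs} sum to {total}, not 1"
print("all LHS totals sum to 1")
```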

10 Probability of a sentence
Notation:
- w_{ab}: the subsequence w_a … w_b
- N^j dominates w_a … w_b, or yield(N^j) = w_a … w_b (e.g., an NP node dominating "the sweet teddy bear")
Probability of a sentence:
P(w_{1m}) = Σ_t P(w_{1m}, t) = Σ_t P(t) * P(w_{1m} | t), where t ranges over parse trees.
If t is a parse tree for the sentence w_{1m}, then P(w_{1m} | t) = 1, so P(w_{1m}) = Σ_{t: yield(t) = w_{1m}} P(t).

11 Assumptions of the PCFG model
(Consider a tree S → NP VP in which the NP "the child" occurs at location 1, as the subject, and again at location 2, inside the VP.)
Place invariance: P(NP → DT NN) is the same at locations 1 and 2.
Context-free: P(NP → DT NN | anything outside "the child") = P(NP → DT NN).
Ancestor-free: at location 2, P(NP → DT NN | its ancestor is VP) = P(NP → DT NN).

12 Probability of a parse tree
Domination: we say N^j dominates from k to l, symbolized as N^j_{k,l}, if w_{k,l} is derived from N^j.
P(tree | sentence) = P(tree | S_{1,l}), where S_{1,l} means that the start symbol S dominates the word sequence w_{1,l}.
P(t | s) approximately equals the joint probability of the constituent non-terminals dominating the sentence fragments (next slide).

13 Probability of a parse tree (cont.)
Consider the parse tree S_{1,l} → NP_{1,2} VP_{3,l}, with NP_{1,2} → DT_{1,1} N_{2,2}, VP_{3,l} → V_{3,3} PP_{4,l}, PP_{4,l} → P_{4,4} NP_{5,l}, and words w_1 … w_l at the leaves.
P(t | s) = P(t | S_{1,l})
= P(NP_{1,2}, DT_{1,1}, w_1, N_{2,2}, w_2, VP_{3,l}, V_{3,3}, w_3, PP_{4,l}, P_{4,4}, w_4, NP_{5,l}, w_{5…l} | S_{1,l})
= P(NP_{1,2}, VP_{3,l} | S_{1,l}) * P(DT_{1,1}, N_{2,2} | NP_{1,2}) * P(w_1 | DT_{1,1}) * P(w_2 | N_{2,2}) * P(V_{3,3}, PP_{4,l} | VP_{3,l}) * P(w_3 | V_{3,3}) * P(P_{4,4}, NP_{5,l} | PP_{4,l}) * P(w_4 | P_{4,4}) * P(w_{5…l} | NP_{5,l})
(using the chain rule, context-freeness, and ancestor-freeness)
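Under these assumptions, the tree probability collapses into one factor per rule application. A minimal sketch of that product, representing trees as nested (label, children…) tuples and rule probabilities as a dictionary; both encodings are mine, not the lecture's:

```python
def tree_prob(node, rule_prob):
    """P(t) = product of rule probabilities over the tree's rule applications.

    node: (label, child, child, ...), where a child is either another
    tuple (a non-terminal) or a string (a terminal word).
    rule_prob: dict mapping (lhs, rhs_tuple) -> probability.
    """
    label, *children = node
    # The right-hand side of the rule applied at this node.
    rhs = tuple(c[0] if isinstance(c, tuple) else c for c in children)
    p = rule_prob[(label, rhs)]
    for child in children:
        if isinstance(child, tuple):   # recurse into non-terminal children
            p *= tree_prob(child, rule_prob)
    return p

# Tiny usage example with hypothetical rules:
rules = {("NP", ("DT", "NN")): 0.5, ("DT", ("the",)): 1.0, ("NN", ("gunman",)): 0.5}
t = ("NP", ("DT", "the"), ("NN", "gunman"))
print(tree_prob(t, rules))   # 0.5 * 1.0 * 0.5 = 0.25
```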

14 Example PCFG Rules & Probabilities
S → NP VP 1.0
NP → DT NN 0.5
NP → NNS 0.3
NP → NP PP 0.2
PP → P NP 1.0
VP → VP PP 0.6
VP → VBD NP 0.4
DT → the 1.0
NN → gunman 0.5
NN → building 0.5
VBD → sprayed 1.0
NNS → bullets 1.0
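This grammar can be typed into NLTK almost verbatim and used to find the most probable parse. A sketch assuming NLTK's PCFG and ViterbiParser; note the table above omits a rule for P, so P → with (probability 1.0, as used in the parse trees on slides 15-16) is added here:

```python
from nltk import PCFG
from nltk.parse import ViterbiParser

grammar = PCFG.fromstring("""
    S   -> NP VP      [1.0]
    NP  -> DT NN      [0.5]
    NP  -> NNS        [0.3]
    NP  -> NP PP      [0.2]
    PP  -> P NP       [1.0]
    VP  -> VP PP      [0.6]
    VP  -> VBD NP     [0.4]
    DT  -> 'the'      [1.0]
    NN  -> 'gunman'   [0.5]
    NN  -> 'building' [0.5]
    VBD -> 'sprayed'  [1.0]
    NNS -> 'bullets'  [1.0]
    P   -> 'with'     [1.0]
""")

parser = ViterbiParser(grammar)
tokens = "the gunman sprayed the building with bullets".split()  # lowercased

# ViterbiParser yields the single most probable parse.
for tree in parser.parse(tokens):
    print(tree.prob())   # expected: 0.0045, the VP-attachment parse t1
    tree.pretty_print()
```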

15 Example Parse t1
The gunman sprayed the building with bullets.
Here the PP "with bullets" attaches to the VP:
[S:1.0 [NP:0.5 [DT:1.0 The] [NN:0.5 gunman]] [VP:0.6 [VP:0.4 [VBD:1.0 sprayed] [NP:0.5 [DT:1.0 the] [NN:0.5 building]]] [PP:1.0 [P:1.0 with] [NP:0.3 [NNS:1.0 bullets]]]]]
P(t1) = 1.0 * 0.5 * 1.0 * 0.5 * 0.6 * 0.4 * 1.0 * 0.5 * 1.0 * 0.5 * 1.0 * 1.0 * 0.3 * 1.0 = 0.0045

16 Another Parse t2
The gunman sprayed the building with bullets.
Here the PP "with bullets" attaches to the NP "the building":
[S:1.0 [NP:0.5 [DT:1.0 The] [NN:0.5 gunman]] [VP:0.4 [VBD:1.0 sprayed] [NP:0.2 [NP:0.5 [DT:1.0 the] [NN:0.5 building]] [PP:1.0 [P:1.0 with] [NP:0.3 [NNS:1.0 bullets]]]]]]
P(t2) = 1.0 * 0.5 * 1.0 * 0.5 * 0.4 * 1.0 * 0.2 * 0.5 * 1.0 * 0.5 * 1.0 * 1.0 * 0.3 * 1.0 = 0.0015
Since P(t1) > P(t2), the PCFG prefers the VP attachment.
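The two products, and the sentence probability obtained by marginalising over both parses (slide 10), can be checked mechanically:

```python
import math

# Rule probabilities read off trees t1 and t2 above.
t1_factors = [1.0, 0.5, 1.0, 0.5, 0.6, 0.4, 1.0, 0.5, 1.0, 0.5, 1.0, 1.0, 0.3, 1.0]
t2_factors = [1.0, 0.5, 1.0, 0.5, 0.4, 1.0, 0.2, 0.5, 1.0, 0.5, 1.0, 1.0, 0.3, 1.0]

p_t1 = math.prod(t1_factors)
p_t2 = math.prod(t2_factors)

# P(sentence) = sum over all its parses (slide 10).
print(p_t1, p_t2, p_t1 + p_t2)   # ~0.0045 ~0.0015 ~0.006
```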

