
1 Probabilistic and Lexicalized Parsing

2 Probabilistic CFGs
Weighted CFGs
–Attach weights to the rules of a CFG
–Compute the weights of derivations
–Use the weights to pick preferred parses
Utility: pruning and ordering the search space, disambiguation, language model for ASR.
Parsing with weighted grammars (like weighted finite automata)
–$T^* = \arg\max_T W(T, S)$
Probabilistic CFGs are one form of weighted CFGs.

3 Probability Model
Rule probability:
–Attach probabilities to grammar rules
–Expansions for a given non-terminal sum to 1
  R1: VP → V (0.55)
  R2: VP → V NP (0.40)
  R3: VP → V NP NP (0.05)
–Estimate the probabilities from annotated corpora: P(R1) = count(R1) / count(VP)
Derivation probability:
–Derivation $T = \{R_1, \ldots, R_n\}$
–Probability of a derivation: $P(T) = \prod_{i=1}^{n} P(R_i)$
–Most probable parse: $T^* = \arg\max_T P(T)$
–Probability of a sentence: $P(S) = \sum_T P(T)$, summing over all possible derivations T of the sentence
Note the independence assumption: a rule's contribution to the parse probability does not change based on where the rule is expanded.
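The estimation and derivation-probability steps above can be sketched in a few lines of Python. This is only an illustration: the treebank counts are hypothetical, chosen so that the estimates reproduce the slide's probabilities for R1 to R3, and the names `estimate_rule_probs` and `derivation_prob` are made up for this sketch.

```python
from collections import Counter
from math import prod

# Hypothetical treebank counts, chosen so the estimates match the slide's
# probabilities for the three VP rules (0.55, 0.40, 0.05).
rule_counts = Counter({
    ("VP", ("V",)): 55,
    ("VP", ("V", "NP")): 40,
    ("VP", ("V", "NP", "NP")): 5,
})

def estimate_rule_probs(counts):
    """Maximum-likelihood estimate: P(A -> beta) = count(A -> beta) / count(A)."""
    lhs_totals = Counter()
    for (lhs, _rhs), c in counts.items():
        lhs_totals[lhs] += c
    return {rule: c / lhs_totals[rule[0]] for rule, c in counts.items()}

def derivation_prob(rules, probs):
    """Probability of a derivation = product of the probabilities of its rules."""
    return prod(probs[r] for r in rules)

rule_probs = estimate_rule_probs(rule_counts)
```

Because each rule contributes the same factor wherever it is used, `derivation_prob` only needs the list of rules in the derivation, which is the independence assumption in code form.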

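4 Structural ambiguity
Grammar:
S → NP VP
VP → V NP
NP → NP PP
VP → VP PP
PP → P NP
NP → John | Mary | Denver
V → called
P → from
Sentence: John called Mary from Denver
Parse 1 (PP attached to the VP): [S [NP John] [VP [VP [V called] [NP Mary]] [PP [P from] [NP Denver]]]]
Parse 2 (PP attached to the NP): [S [NP John] [VP [V called] [NP [NP Mary] [PP [P from] [NP Denver]]]]]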

5 Cocke-Younger-Kasami Parser
Bottom-up parser with top-down filtering
Start state(s): (A, i, i+1) for each rule A → w_{i+1}
End state: (S, 0, n), where n is the input size
Next-state rule:
–(B, i, k) (C, k, j) ⇒ (A, i, j) if A → B C
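A minimal sketch of the start, next-state, and end conditions as a CKY recognizer in Python. It uses the grammar from the structural-ambiguity slide, which is already in Chomsky normal form; the function names `cky_chart` and `recognizes` and the dictionary-based chart are choices made for this sketch, not something the slides prescribe.

```python
from collections import defaultdict

# Grammar from the structural-ambiguity slide, already in Chomsky normal form.
BINARY = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("NP", "NP", "PP"),
    ("VP", "VP", "PP"),
    ("PP", "P", "NP"),
]
LEXICAL = {
    "John": {"NP"}, "Mary": {"NP"}, "Denver": {"NP"},
    "called": {"V"}, "from": {"P"},
}

def cky_chart(words):
    """Return chart[(i, j)] = set of non-terminals spanning words[i:j]."""
    n = len(words)
    chart = defaultdict(set)
    # Start states: (A, i, i+1) for each A -> w_{i+1}
    for i, w in enumerate(words):
        chart[(i, i + 1)] |= LEXICAL.get(w, set())
    # Next-state rule: (B, i, k) and (C, k, j) give (A, i, j) if A -> B C
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for a, b, c in BINARY:
                    if b in chart[(i, k)] and c in chart[(k, j)]:
                        chart[(i, j)].add(a)
    return chart

def recognizes(words, start="S"):
    """End state: the start symbol spans the whole input, i.e. (S, 0, n)."""
    return start in cky_chart(words)[(0, len(words))]

sentence = "John called Mary from Denver".split()
```

Calling `recognizes(sentence)` checks for the end state (S, 0, n). Spans are processed shortest first so that both sub-constituents of a cell are available before the cell itself is filled.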

6 Example: John called Mary from Denver

7 Base Case: A → w
[Chart figure: the diagonal cells hold the lexical categories John: NP, called: V, Mary: NP, from: P, Denver: NP.]

8 Recursive Cases: A → B C
[Chart figure: the cell for "John called" is marked X (no constituent).]

9 to 20 [Chart figures: the remaining cells are filled in one step at a time. "called Mary": VP; "Mary from": X; "from Denver": PP; "John called Mary": S; "called Mary from": X; "Mary from Denver": NP; "John called Mary from": X; "called Mary from Denver": VP 1 and VP 2 (two derivations); "John called Mary from Denver": S.]

21 Probabilistic CKY
Assign probabilities to constituents as they are completed and placed in the table.
Computing the probability:
–Since we are interested in max P(S, 0, n), use the max probability for each constituent
–Maintain back-pointers to recover the parse
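The following Python sketch shows one way to keep the max probability per constituent and follow back-pointers. The rule probabilities are hypothetical (the slides give none for this grammar), and the names `pcky` and `build_tree` are invented for the illustration.

```python
# Hypothetical rule probabilities (not from the slides); the expansions of
# each non-terminal sum to 1.
BINARY = {
    ("S", "NP", "VP"): 1.0,
    ("VP", "V", "NP"): 0.7,
    ("VP", "VP", "PP"): 0.3,
    ("NP", "NP", "PP"): 0.2,
    ("PP", "P", "NP"): 1.0,
}
LEXICAL = {
    ("NP", "John"): 0.3, ("NP", "Mary"): 0.3, ("NP", "Denver"): 0.2,
    ("V", "called"): 1.0, ("P", "from"): 1.0,
}

def pcky(words):
    """Viterbi CKY: best[(i, j)][A] = (max probability, back-pointer)."""
    n = len(words)
    best = {(i, j): {} for i in range(n) for j in range(i + 1, n + 1)}
    for i, w in enumerate(words):
        for (a, word), p in LEXICAL.items():
            if word == w:
                best[(i, i + 1)][a] = (p, w)
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for (a, b, c), p in BINARY.items():
                    if b in best[(i, k)] and c in best[(k, j)]:
                        score = p * best[(i, k)][b][0] * best[(k, j)][c][0]
                        # Keep only the most probable way to build A over (i, j).
                        if score > best[(i, j)].get(a, (0.0, None))[0]:
                            best[(i, j)][a] = (score, (k, b, c))
    return best

def build_tree(best, i, j, label):
    """Follow back-pointers to recover the best tree as nested tuples."""
    _, back = best[(i, j)][label]
    if isinstance(back, str):          # lexical entry: back-pointer is the word
        return (label, back)
    k, b, c = back
    return (label, build_tree(best, i, k, b), build_tree(best, k, j, c))

words = "John called Mary from Denver".split()
best = pcky(words)
prob, _ = best[(0, len(words))]["S"]
tree = build_tree(best, 0, len(words), "S")
```

Under these made-up probabilities the VP-attachment reading wins; different rule probabilities could just as well favour the NP attachment, since the PCFG decides purely on rule scores.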

22 Problems with PCFGs
The probability model we're using is based only on the rules in the derivation.
Lexical insensitivity:
–Doesn't use the words in any real way
–Structural disambiguation is lexically driven: PP attachment often depends on the verb, its object, and the preposition
  I ate pickles with a fork.
  I ate pickles with relish.
Context insensitivity of the derivation:
–Doesn't take into account where in the derivation a rule is used
–Pronouns are more often subjects than objects
  She hates Mary.
  Mary hates her.
Solution: Lexicalization
–Add lexical information to each rule

23 An example of lexical information: Heads
Make use of the notion of the head of a phrase
–Head of an NP is a noun
–Head of a VP is the main verb
–Head of a PP is its preposition
Each LHS of a rule in the PCFG has a lexical item.
Each RHS non-terminal has a lexical item.
–One of the lexical items is shared with the LHS.
If R is the number of binary branching rules in the CFG, then in the lexicalized CFG: O(2*|∑|*|R|)
Unary rules: O(|∑|*|R|)
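As a rough illustration of head sharing between the LHS and one RHS child, the Python sketch below annotates a nested-tuple parse tree with head words. The `HEAD_CHILD` table and the `lexicalize` function are hypothetical; the NP, VP and PP entries follow the slide, while taking the head of S from its VP is an assumption added here.

```python
# Which child supplies the head, per parent category, in priority order.
# The NP, VP and PP entries follow the slide; the S entry is an assumption.
HEAD_CHILD = {"S": ["VP"], "VP": ["V", "VP"], "NP": ["N", "NP"], "PP": ["P"]}

def lexicalize(tree):
    """Return (head_word, tree with every label rewritten as 'X(head)').

    Input nodes are (label, word) for pre-terminals and
    (label, child, child, ...) for internal nodes.
    """
    label, *children = tree
    if isinstance(children[0], str):              # pre-terminal: (label, word)
        word = children[0]
        return word, (f"{label}({word})", word)
    results = [lexicalize(child) for child in children]
    head = None
    for wanted in HEAD_CHILD.get(label, []):
        for child, (child_head, _) in zip(children, results):
            if child[0] == wanted:
                head = child_head                 # LHS shares this child's head
                break
        if head is not None:
            break
    if head is None:                              # fallback: leftmost child
        head = results[0][0]
    return head, (f"{label}({head})",) + tuple(t for _, t in results)

tree = ("S", ("NP", "John"),
        ("VP", ("VP", ("V", "called"), ("NP", "Mary")),
         ("PP", ("P", "from"), ("NP", "Denver"))))
head, lexicalized = lexicalize(tree)
```

Here the root becomes S(called) and the PP becomes PP(from), so every rule in the tree now carries the lexical items that a lexicalized PCFG would condition on.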

24 Example (correct parse) Attribute grammar

25 Example (less preferred)

26 Computing Lexicalized Rule Probabilities
We started with rule probabilities
–VP → V NP PP, with P(rule | VP)
–E.g., the count of this rule divided by the number of VPs in a treebank
Now we want lexicalized probabilities
–VP(dumped) → V(dumped) NP(sacks) PP(in)
–P(rule | VP ∧ dumped is the verb ∧ sacks is the head of the NP ∧ in is the head of the PP)
–Not likely to have significant counts in any treebank

27 Another Example
Consider the VPs
–Ate spaghetti with gusto
–Ate spaghetti with marinara
The dependency is not between mother and child.
[VP(ate) [V ate] [NP spaghetti] [PP(with) with gusto]]
[VP(ate) [V ate] [NP(spag) [NP spaghetti] [PP(with) with marinara]]]

28 Log-linear models for Parsing
Why restrict the conditioning to the elements of a rule?
–Use even larger context
–Word sequence, word types, sub-tree context, etc.
In general, compute $P(y \mid x) = \frac{\exp\left(\sum_i \lambda_i f_i(x, y)\right)}{\sum_{y'} \exp\left(\sum_i \lambda_i f_i(x, y')\right)}$, where the $f_i(x, y)$ test properties of the context and $\lambda_i$ is the weight of feature i.
Use these as scores in the CKY algorithm to find the best-scoring parse.
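A small Python sketch of a log-linear score over a finite set of candidate analyses, using the usual form P(y|x) = exp(sum_i w_i f_i(x, y)) / Z(x). Everything specific here is hypothetical: the feature templates, the weights, and the reduction of parsing to a two-way PP-attachment choice are assumptions made only to keep the example short.

```python
from math import exp

def log_linear_probs(x, candidates, features, weights):
    """P(y | x) = exp(sum_i w_i * f_i(x, y)) / Z(x) over a finite candidate set."""
    scores = {
        y: sum(weights.get(name, 0.0) * value
               for name, value in features(x, y).items())
        for y in candidates
    }
    m = max(scores.values())           # subtract the max for numerical stability
    unnorm = {y: exp(s - m) for y, s in scores.items()}
    z = sum(unnorm.values())           # normalizer Z(x), up to the factor exp(m)
    return {y: u / z for y, u in unnorm.items()}

def features(x, y):
    """Hypothetical indicator features: attachment site, alone and with the PP noun."""
    noun = x["pp_noun"]
    return {f"attach={y}": 1.0, f"attach={y}&noun={noun}": 1.0}

# Hypothetical weights; in practice they would be learned from a treebank.
weights = {
    "attach=VP": 0.2,
    "attach=VP&noun=gusto": 1.5,
    "attach=NP&noun=marinara": 1.5,
}

candidates = ["VP", "NP"]
p_gusto = log_linear_probs({"pp_noun": "gusto"}, candidates, features, weights)
p_marinara = log_linear_probs({"pp_noun": "marinara"}, candidates, features, weights)
```

With these weights the model prefers VP attachment for "with gusto" and NP attachment for "with marinara", which is the kind of lexical preference a plain PCFG cannot express. In a full parser the same weighted feature sums would score chart items rather than whole attachment decisions.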

29 Supertagging: Almost parsing
Poachers now control the underground trade
[Figure: each word of the sentence is shown with its candidate supertags (elementary trees): NP and N trees for "poachers" and "trade", adverb trees for "now", S trees with a V and an NP object for "control", a determiner tree for "the", and adjective trees for "underground".]

30 Summary
Parsing context-free grammars
–Top-down and bottom-up parsers
–Mixed approaches (CKY, Earley parsers)
Preferences over parses using probabilities
–Parsing with PCFG and PCKY algorithms
Enriching the probability model
–Lexicalization
–Log-linear models for parsing

