1
Probabilistic and Lexicalized Parsing
2
Probabilistic CFGs
Weighted CFGs
  Attach weights to the rules of a CFG
  Compute weights of derivations
  Use weights to pick preferred parses
Utility: pruning and ordering the search space, disambiguation, language model for ASR
Parsing with weighted grammars (like weighted FAs): T* = argmax_T W(T, S)
Probabilistic CFGs are one form of weighted CFGs.
3
Probability Model
Rule probability:
  Attach probabilities to grammar rules
  Expansions for a given non-terminal sum to 1
    R1: VP → V        .55
    R2: VP → V NP     .40
    R3: VP → V NP NP  .05
  Estimate the probabilities from annotated corpora: P(R1) = count(R1) / count(VP)
Derivation probability:
  Derivation T = {R1 … Rn}
  Probability of a derivation: P(T) = Π_{i=1..n} P(Ri)
  Most probable parse: T* = argmax_T P(T)
  Probability of a sentence: P(S) = Σ_T P(T), summing over all possible derivations T of the sentence
Note the independence assumption: the probability of a rule does not change based on where in the parse the rule is expanded.
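The estimation and derivation-probability steps above can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's code: the counts are hypothetical values chosen to reproduce the .55/.40/.05 probabilities on the slide, and the names `rule_counts`, `rule_probabilities` and `derivation_probability` are invented for the example.

```python
from collections import Counter
from math import prod

# Hypothetical treebank counts: (LHS, RHS) -> number of occurrences.
rule_counts = Counter({
    ("VP", ("V",)): 55,
    ("VP", ("V", "NP")): 40,
    ("VP", ("V", "NP", "NP")): 5,
})

def rule_probabilities(counts):
    """Relative-frequency estimate: P(rule) = count(rule) / count(LHS)."""
    lhs_totals = Counter()
    for (lhs, _rhs), c in counts.items():
        lhs_totals[lhs] += c
    return {rule: c / lhs_totals[rule[0]] for rule, c in counts.items()}

def derivation_probability(derivation, probs):
    """Product of the rule probabilities (the independence assumption)."""
    return prod(probs[rule] for rule in derivation)

probs = rule_probabilities(rule_counts)
```

By construction, the probabilities of all expansions of one non-terminal sum to 1, and a derivation that uses R1 once and R2 once gets probability .55 × .40.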
4
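Structural ambiguity
Grammar:
  S → NP VP
  VP → V NP
  NP → NP PP
  VP → VP PP
  PP → P NP
  NP → John | Mary | Denver
  V → called
  P → from
Sentence: John called Mary from Denver
Parse 1 (PP attached to the VP): [S [NP John] [VP [VP [V called] [NP Mary]] [PP [P from] [NP Denver]]]]
Parse 2 (PP attached to the NP): [S [NP John] [VP [V called] [NP [NP Mary] [PP [P from] [NP Denver]]]]]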
5
Cocke-Younger-Kasami Parser
Bottom-up parser with top-down filtering
Start state(s): (A, i, i+1) for each rule A → w_{i+1}
End state: (S, 0, n), where n is the input size
Next-state rule: (B, i, k), (C, k, j) ⇒ (A, i, j) if A → B C
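The start, end and next-state rules can be turned into a small recognizer. The sketch below assumes the toy grammar from the structural-ambiguity slide (already binary, so no conversion is needed) and stores the states (A, i, j) as label sets in a dictionary keyed by (i, j). The names `BINARY`, `LEXICAL` and `cky_recognize` are illustrative only.

```python
from collections import defaultdict

# Binary rules A -> B C as (A, B, C); lexical rules as word -> {A}.
BINARY = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("NP", "NP", "PP"),
    ("VP", "VP", "PP"),
    ("PP", "P", "NP"),
]
LEXICAL = {
    "John": {"NP"}, "Mary": {"NP"}, "Denver": {"NP"},
    "called": {"V"}, "from": {"P"},
}

def cky_recognize(words, start="S"):
    n = len(words)
    chart = defaultdict(set)              # (i, j) -> labels spanning words[i:j]
    for i, w in enumerate(words):         # start states: (A, i, i+1)
        chart[i, i + 1] |= LEXICAL.get(w, set())
    for span in range(2, n + 1):          # shorter spans before longer ones
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):     # split point
                for a, b, c in BINARY:    # (B,i,k), (C,k,j) => (A,i,j)
                    if b in chart[i, k] and c in chart[k, j]:
                        chart[i, j].add(a)
    return start in chart[0, n], chart    # end state: (S, 0, n)

accepted, chart = cky_recognize("John called Mary from Denver".split())
```

Because the chart stores only labels, the two VP analyses of "called Mary from Denver" collapse into a single VP entry in cell (1, 5); back-pointers would be needed to recover both trees.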
6
Example John called Mary from Denver
7
Base Case: A → w
[Figure: CKY chart for "John called Mary from Denver" with the lexical cells filled: NP (John), V (called), NP (Mary), P (from), NP (Denver).]
8
Recursive Cases: A → B C
[Figure: the same chart as the first two-word cell is processed; X marks a span with no constituent.]
9-20
[Figure: animation of the remaining chart cells being filled. Entries that appear include VP, PP, NP and S over sub-spans, two VP entries (VP1, VP2) for the two attachments of the PP, and finally S over the whole sentence.]
21
Probabilistic CKY
Assign probabilities to constituents as they are completed and placed in the table
Computing the probability: P(A, i, j) = P(A → B C) × P(B, i, k) × P(C, k, j)
Since we are interested in max P(S, 0, n), keep the maximum probability for each constituent
Maintain back-pointers to recover the parse.
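A minimal sketch of this procedure follows: each cell keeps the maximum probability per label plus a back-pointer, and the best tree is rebuilt from (S, 0, n). The grammar is the toy one from the structural-ambiguity slide; all rule probabilities here are hypothetical values picked for illustration, and `pcky` is an invented name.

```python
# Hypothetical rule probabilities; expansions of each non-terminal sum to 1.
BINARY = {
    ("S", "NP", "VP"): 1.0,
    ("VP", "V", "NP"): 0.7,
    ("VP", "VP", "PP"): 0.3,
    ("NP", "NP", "PP"): 0.2,
    ("PP", "P", "NP"): 1.0,
}
LEXICAL = {
    ("NP", "John"): 0.3, ("NP", "Mary"): 0.3, ("NP", "Denver"): 0.2,
    ("V", "called"): 1.0, ("P", "from"): 1.0,
}

def pcky(words, start="S"):
    n = len(words)
    best = {}   # (i, j, A) -> max probability of A spanning words[i:j]
    back = {}   # (i, j, A) -> word, or (k, B, C) for the best split
    for i, w in enumerate(words):
        for (a, word), p in LEXICAL.items():
            if word == w:
                best[i, i + 1, a] = p
                back[i, i + 1, a] = w
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (a, b, c), p in BINARY.items():
                    if (i, k, b) in best and (k, j, c) in best:
                        score = p * best[i, k, b] * best[k, j, c]
                        if score > best.get((i, j, a), 0.0):
                            best[i, j, a] = score      # keep only the max
                            back[i, j, a] = (k, b, c)

    def build(i, j, a):
        """Follow back-pointers to rebuild the best tree as nested tuples."""
        bp = back[i, j, a]
        if isinstance(bp, str):
            return (a, bp)
        k, b, c = bp
        return (a, build(i, k, b), build(k, j, c))

    if (0, n, start) not in best:
        return 0.0, None
    return best[0, n, start], build(0, n, start)

prob, tree = pcky("John called Mary from Denver".split())
```

With these particular numbers the VP attachment of "from Denver" scores higher than the NP attachment, so the returned tree attaches the PP to the VP; different probabilities would flip the choice.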
22
Problems with PCFGs
The probability model we're using is based only on the rules in the derivation.
Lexical insensitivity: doesn't use the words in any real way
  Structural disambiguation is lexically driven
  PP attachment often depends on the verb, its object, and the preposition
    I ate pickles with a fork.
    I ate pickles with relish.
Context insensitivity of the derivation: doesn't take into account where in the derivation a rule is used
  Pronouns are more often subjects than objects
    She hates Mary.
    Mary hates her.
Solution: lexicalization
  Add lexical information to each rule
23
An example of lexical information: Heads
Make use of the notion of the head of a phrase
  Head of an NP is a noun
  Head of a VP is the main verb
  Head of a PP is its preposition
Each LHS of a rule in the PCFG has a lexical item
Each RHS non-terminal has a lexical item
  One of the lexical items is shared with the LHS
If R is the set of binary-branching rules in the CFG and Σ the set of words, in the lexicalized CFG: O(2·|Σ|·|R|)
Unary rules: O(|Σ|·|R|)
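Head annotation can be sketched as a bottom-up pass that copies one child's head word up to its parent. The head table below is an assumption made for illustration: the NP, VP and PP entries follow the slide, while taking the VP as the head child of S and falling back to the leftmost child are choices made only for this example. Trees are nested tuples, and `HEAD_CHILD` and `lexicalize` are invented names.

```python
# For each parent label, child labels to try in order as the head child.
# NP/VP/PP follow the slide; the S entry is an assumption for this sketch.
HEAD_CHILD = {
    "S": ["VP"],
    "VP": ["V", "VP"],
    "NP": ["N", "NP"],
    "PP": ["P"],
}

def lexicalize(tree):
    """Return (label, head_word, children) with heads percolated upward."""
    label, *rest = tree
    if len(rest) == 1 and isinstance(rest[0], str):
        return (label, rest[0], [])               # preterminal: (label, word)
    children = [lexicalize(child) for child in rest]
    for wanted in HEAD_CHILD.get(label, []):
        for child in children:
            if child[0] == wanted:
                return (label, child[1], children)  # share the child's head
    return (label, children[0][1], children)        # fallback: leftmost child

tree = ("S", ("NP", "John"),
        ("VP", ("V", "called"),
               ("NP", ("NP", "Mary"),
                      ("PP", ("P", "from"), ("NP", "Denver")))))
lex = lexicalize(tree)
```

Each node of the result carries the lexical item it shares with its head child, which is the information a lexicalized rule such as VP(called) → V(called) NP(Mary) conditions on.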
24
Example (correct parse)
Attribute grammar
25
Example (less preferred)
26
Computing Lexicalized Rule Probabilities
We started with rule probabilities
  VP → V NP PP: P(rule | VP)
  E.g., count of this rule divided by the number of VPs in a treebank
Now we want lexicalized probabilities
  VP(dumped) → V(dumped) NP(sacks) PP(in)
  P(rule | VP ∧ dumped is the verb ∧ sacks is the head of the NP ∧ in is the head of the PP)
  Not likely to have significant counts in any treebank
  Back off to less specific contexts until the estimates are reliable
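One simple way to realize the back-off idea is to try contexts from most to least specific and use the first one seen often enough. The slide does not specify a scheme, so everything here is an assumption: the count threshold `MIN_COUNT`, the three-level back-off chain, and the toy observations are all hypothetical. Interpolating the levels instead of picking one is a common alternative.

```python
from collections import Counter

MIN_COUNT = 5                 # hypothetical reliability threshold
context_counts = Counter()    # context -> number of VPs seen in it
rule_counts = Counter()       # (context, rule) -> times rule was used there

def backoff_chain(verb, np_head, pp_head):
    """Contexts from most specific to least specific."""
    return [("VP", verb, np_head, pp_head), ("VP", verb), ("VP",)]

def observe(rule, verb, np_head=None, pp_head=None):
    for context in backoff_chain(verb, np_head, pp_head):
        context_counts[context] += 1
        rule_counts[context, rule] += 1

def estimate(rule, verb, np_head=None, pp_head=None):
    """Relative frequency in the first context with enough observations."""
    for context in backoff_chain(verb, np_head, pp_head):
        if context_counts[context] >= MIN_COUNT:
            return rule_counts[context, rule] / context_counts[context], context
    return 0.0, None

# Hypothetical treebank observations.
for _ in range(2):
    observe("VP -> V NP PP", "dumped", "sacks", "in")
for _ in range(4):
    observe("VP -> V NP PP", "dumped", "trash", "into")
for _ in range(2):
    observe("VP -> V NP", "dumped", "boxes")
for _ in range(2):
    observe("VP -> V", "slept")

p_dumped, used_dumped = estimate("VP -> V NP PP", "dumped", "sacks", "in")
p_ate, used_ate = estimate("VP -> V NP PP", "ate", "pizza", "with")
```

In this toy data the fully lexicalized context for "dumped sacks in" is too rare, so the estimate backs off to the verb alone; an unseen verb backs off all the way to the unlexicalized P(rule | VP).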
27
Another Example Consider the VPs
ate spaghetti with gusto
ate spaghetti with marinara
The dependency is not between mother and child.
[Figure: two lexicalized trees. In "ate spaghetti with gusto", PP(with) attaches to VP(ate); in "ate spaghetti with marinara", PP(with) attaches to NP(spag).]
28
Log-linear models for Parsing
Why restrict the conditioning to the elements of a rule?
  Use even larger context
  Word sequence, word types, sub-tree context, etc.
In general, compute P(y|x) = exp(Σ_i λ_i f_i(x, y)) / Z(x), with Z(x) = Σ_y' exp(Σ_i λ_i f_i(x, y')), where each f_i(x, y) tests a property of the context and λ_i is the weight of that feature.
Use these as scores in the CKY algorithm to find the best-scoring parse.
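As a small illustration of the log-linear form, the sketch below scores two candidate attachment decisions for a PP. The features, their weights and the names `log_linear_probs`, `f_instrument` and `f_ingredient` are all hypothetical; in practice the weights would be learned from a treebank.

```python
from math import exp

def log_linear_probs(x, candidates, features, weights):
    """P(y|x) = exp(sum_i w_i * f_i(x, y)) / Z(x) for each candidate y."""
    scores = [sum(w * f(x, y) for f, w in zip(features, weights))
              for y in candidates]
    top = max(scores)                       # shift scores for numerical stability
    exps = [exp(s - top) for s in scores]
    z = sum(exps)                           # Z(x), up to the factor exp(top)
    return {y: e / z for y, e in zip(candidates, exps)}

# Hypothetical indicator features for a PP-attachment decision.
def f_instrument(x, y):
    return 1.0 if y == "VP-attach" and x["pp_object"] in {"fork", "gusto"} else 0.0

def f_ingredient(x, y):
    return 1.0 if y == "NP-attach" and x["pp_object"] in {"relish", "marinara"} else 0.0

FEATURES = [f_instrument, f_ingredient]
WEIGHTS = [1.5, 1.5]                        # hypothetical weights
CANDIDATES = ["VP-attach", "NP-attach"]

p_fork = log_linear_probs({"verb": "ate", "pp_object": "fork"},
                          CANDIDATES, FEATURES, WEIGHTS)
p_relish = log_linear_probs({"verb": "ate", "pp_object": "relish"},
                            CANDIDATES, FEATURES, WEIGHTS)
```

Because features can inspect any part of the context x, the same machinery covers word sequences, word types or sub-tree context without changing the probability computation.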
29
Parsing as sequential decision making process
Parsing: a series of decisions
  Lexical category label, structural attachment, phrasal category label
Each decision is trained as a classification task using some context
  Classification techniques (SVM, MaxEnt, decision trees) can be used to train these decision classifiers.
  Context could depend on previous decisions (CKY-style decoding)
CFGs can be recognized using pushdown automata (PDA)
  Probabilistic extensions of PDA
30
Supertagging: Almost parsing
Example sentence: Poachers now control the underground trade
[Figure: candidate supertags (elementary trees) for each word of the sentence.]
Selecting the correct supertag for a word is almost parsing
Use classifiers to select the correct supertag
31
Summary
Parsing context-free grammars
  Top-down and bottom-up parsers
  Mixed approaches (CKY, Earley parsers)
Preferences over parses using probabilities
  Parsing with PCFG and PCKY algorithms
Enriching the probability model
  Lexicalization
  Log-linear models for parsing
  Classification techniques for parsing decisions