Parsing with PCFG
Ling 571, Fei Xia
Week 3: 10/11-10/13/05
Outline
–Misc
–CYK algorithm
–Converting CFG into CNF
–PCFG
–Lexicalized PCFG
Misc
–Quiz 1: 15 pts, due 10/13
–Hw2: 10 pts, due 10/13, ling580i_au05@u, ling580e_au05@u
–Treehouse weekly meeting:
  –Time: every Wed 2:30-3:30pm; tomorrow is the 1st meeting
  –Location: EE1 025 (Campus map 12-N, south of MGH)
  –Mailing list: cl-announce@u
–Others:
  –Pongo policies
  –Machines: LLC, Parrington, Treehouse
  –Linux commands: ssh, sftp, …
  –Catalyst tools: ESubmit, EPost, …
CYK algorithm
Parsing algorithms
–Top-down
–Bottom-up
–Top-down with bottom-up filtering
–Earley algorithm
–CYK algorithm
–…
CYK algorithm
–Cocke-Younger-Kasami algorithm (a.k.a. CKY algorithm)
–Requires the CFG to be in Chomsky Normal Form (CNF).
–Bottom-up chart parsing algorithm using DP.
–Fills in a two-dimensional array: C[i][j] contains all the possible syntactic interpretations (non-terminals) of the substring w_i … w_j.
–Complexity: O(n^3 · |G|) for a sentence of n words and a grammar of size |G|.
Chomsky normal form (CNF)
Definition of CNF:
–A → B C
–A → a
–S → ε
A, B, C are non-terminals; a is a terminal. S is the start symbol; B and C are not.
For every CFG, there is a CFG in CNF that is weakly equivalent.
CYK algorithm
For every rule A → w_i, add A to C[i][i]
For span=2 to N
  for begin=1 to N-span+1
    end = begin + span – 1;
    for m=begin to end-1;
      for all non-terminals A, B, C:
        if B ∈ C[begin][m] and C ∈ C[m+1][end] and A → B C is a rule
        then add A to C[begin][end]
CYK algorithm (another way)
For every rule A → w_i, add it to Cell[i][i]
For span=2 to N
  for begin=1 to N-span+1
    end = begin + span – 1;
    for m=begin to end-1;
      for all non-terminals A, B, C:
        if Cell[begin][m] contains B … and Cell[m+1][end] contains C …
           and A → B C is a rule in the grammar
        then add A → B C to Cell[begin][end] and remember m
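The cell-based pseudocode above can be sketched in Python. The grammar encoding (separate lists of unary lexical rules and binary rules) and the 1-indexed span convention are my choices for illustration, not from the slides; the grammar itself is the book/flight example used on the following slides.

```python
from collections import defaultdict

def cyk(words, unary, binary):
    """Boolean CYK: chart[(begin, end)] collects every non-terminal that
    derives words[begin-1 .. end-1] (1-indexed spans, as in the slides)."""
    n = len(words)
    chart = defaultdict(set)
    # Base case: for every rule A -> w_i, add A to Cell[i][i].
    for i, w in enumerate(words, start=1):
        for lhs, word in unary:
            if word == w:
                chart[(i, i)].add(lhs)
    # Recursive case: combine two adjacent smaller spans.
    for span in range(2, n + 1):
        for begin in range(1, n - span + 2):
            end = begin + span - 1
            for m in range(begin, end):
                for lhs, (b, c) in binary:
                    if b in chart[(begin, m)] and c in chart[(m + 1, end)]:
                        chart[(begin, end)].add(lhs)
    return chart

# The example grammar from the next slides.
unary = [('V', 'book'), ('N', 'book'), ('N', 'flight'), ('N', 'cards'),
         ('Det', 'that'), ('Det', 'the'), ('P', 'with')]
binary = [('VP', ('V', 'NP')), ('VP', ('VP', 'PP')),
          ('NP', ('Det', 'N')), ('NP', ('NP', 'PP')), ('PP', ('P', 'NP'))]
chart = cyk(['book', 'that', 'flight'], unary, binary)
# chart[(1, 3)] == {'VP'}: the whole string is parsed as a VP
```

A production implementation would index the binary rules by their right-hand sides instead of scanning the whole rule list in the inner loop; the version above mirrors the pseudocode for readability.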
An example
Rules:
–VP → V NP
–VP → VP PP
–NP → Det N
–NP → NP PP
–PP → P NP
–V → book
–N → book/flight/cards
–Det → that/the
–P → with
Parse “book that flight”: C1[begin][end]

          end=1        end=2         end=3
begin=1   N → book     ----          VP → V NP (m=1)
          V → book
begin=2                Det → that    NP → Det N (m=2)
begin=3                              N → flight
Parse “book that flight”: C2[begin][span]

          span=1        span=2            span=3
begin=1   N → book      ----              VP → V NP (m=1)
          V → book
begin=2   Det → that    NP → Det N (m=2)
begin=3   N → flight
Data structures for the chart (1) (2) (3) (4)
Summary of CYK algorithm
–Bottom-up parsing using DP
–Requires the CFG to be in CNF
–A very efficient algorithm
–Easy to extend
Converting CFG into CNF
Chomsky normal form (CNF)
Definition of CNF:
–A → B C
–A → a
–S → ε
where A, B, C are non-terminals, a is a terminal, S is the start symbol, and B, C are not start symbols.
For every CFG, there is a CFG in CNF that is weakly equivalent.
Converting CFG to CNF
(1) Add a new symbol S0 and a rule S0 → S (so the start symbol will not appear on the rhs of any rule).
(2) Eliminate ε-rules: for each rule A → ε and each rule B → α A β, add B → α β; for each rule B → A, add B → ε unless B → ε has been previously eliminated.
Conversion (cont)
(3) Remove unit rules: for each unit rule A → B, add A → u whenever B → u is a rule, unless A → u is a unit rule that was previously removed.
(4) Replace each rule A → u1 u2 … uk where k > 2 with A → u1 A1, A1 → u2 A2, …, A(k-2) → u(k-1) uk; replace any terminal in a rule of length ≥ 2 with a new symbol U and add a new rule U → u.
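Step (4) is mechanical enough to sketch in code. This is a minimal Python version of step (4) only (binarization plus terminal replacement); the `X1, X2, …` naming for the new symbols and the lowercase-terminal convention are assumptions for the example, not from the slides.

```python
def binarize(rules, is_terminal=lambda s: s.islower()):
    """Step (4): rules are (lhs, rhs) pairs with rhs a tuple of symbols.
    Terminals in rules of length >= 2 get a fresh pre-terminal; rules with
    more than two symbols on the rhs are split into a chain of binary rules."""
    out, counter = [], [0]

    def new_sym():
        counter[0] += 1
        return f"X{counter[0]}"

    for lhs, rhs in rules:
        if len(rhs) >= 2:
            # Replace any terminal u in the rhs with a new symbol U, add U -> u.
            new_rhs = []
            for sym in rhs:
                if is_terminal(sym):
                    u = new_sym()
                    out.append((u, (sym,)))
                    new_rhs.append(u)
                else:
                    new_rhs.append(sym)
            rhs = new_rhs
            # Split A -> u1 u2 ... uk (k > 2) into a chain of binary rules.
            while len(rhs) > 2:
                u = new_sym()
                out.append((lhs, (rhs[0], u)))
                lhs, rhs = u, rhs[1:]
            out.append((lhs, tuple(rhs)))
        else:
            out.append((lhs, tuple(rhs)))
    return out

# Example: S -> A B C becomes S -> A X1, X1 -> B C.
cnf_rules = binarize([('S', ('A', 'B', 'C'))])
```

Steps (1)-(3) would be handled before this pass, in the order given on the slides, since ε-removal and unit-rule removal can create new long rules only in the other direction.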
An example
Adding a new start symbol S0 and the rule S0 → S
Removing ε-rules
–Remove B → ε
–Remove A → ε
Removing unit rules
Removing unit rules (cont)
Converting remaining rules
Summary of CFG parsing
–Simple top-down and bottom-up parsing generate useless trees.
–Top-down with bottom-up filtering has three problems.
–Solution: use DP:
  –Earley algorithm
  –CYK algorithm
Probabilistic CFG (PCFG)
PCFG
PCFG is an extension of CFG. A PCFG is a 5-tuple (N, T, P, S, Pr), where Pr is a function assigning a probability to each rule in P:
  Pr(A → α), i.e., P(A → α | A)
Given a non-terminal A, the probabilities of its rules sum to one:
  Σ_α Pr(A → α) = 1
A PCFG
S → NP VP 0.8        N → Mary 0.01
S → Aux NP VP 0.15   N → book 0.02
S → VP 0.05          V → bought 0.02
VP → V 0.35          Det → a 0.04
VP → V NP 0.45
VP → VP PP 0.20
NP → N 0.8
NP → Det N 0.2
…
Using probabilities
–To estimate the prob of a sentence and its parse trees.
–Useful in disambiguation.
The prob of a tree T:
  P(T) = Π_{n ∈ T} Pr(r(n))
where n is a node in T and r(n) is the rule used to expand n in T.
Computing P(T)
S → NP VP 0.8        N → Mary 0.01
S → Aux NP VP 0.15   N → book 0.02
S → VP 0.05          V → bought 0.02
VP → V 0.35          Det → a 0.04
VP → V NP 0.45
VP → VP PP 0.20
NP → N 0.8
NP → Det N 0.2
The sentence is “Mary bought a book”.
The most likely tree
T is a parse tree, S is a sentence:
  P(T, S) = P(T) · P(S|T) = P(T)
(since the sentence is determined by the tree, P(S|T) = 1). The best parse tree for a sentence S:
  T* = argmax_T P(T | S) = argmax_T P(T, S) = argmax_T P(T)
Find the most likely tree
Given a PCFG and a sentence S, how do we find the best parse tree for S?
One algorithm: CYK
CYK algorithm for CFG
For every rule A → w_i, add A to Cell[i][i]
For span=2 to N
  for begin=1 to N-span+1
    end = begin + span – 1;
    for m=begin to end-1;
      for all non-terminals A, B, C:
        if Cell[begin][m] contains B and Cell[m+1][end] contains C
           and A → B C is a rule in the grammar
        then add A to Cell[begin][end]
CYK algorithm for CFG (another implementation)
For every rule A → w_i, set Chart[i][i][A] = true
For span=2 to N
  for begin=1 to N-span+1
    end = begin + span – 1;
    for m=begin to end-1;
      for all non-terminals A, B, C:
        if Chart[begin][m][B] and Chart[m+1][end][C] and A → B C is a rule
        then Chart[begin][end][A] = true
Variables for CFG and PCFG
–CFG: Chart[begin][end][A]: whether there is a parse tree whose root is A and which covers w_begin … w_end
–PCFG: Chart[begin][end][A]: the prob of the most likely parse tree whose root is A and which covers w_begin … w_end
CYK algorithm for PCFG
For every rule A → w_i, set P[i][i][A] = Pr(A → w_i)
For span=2 to N
  for begin=1 to N-span+1
    end = begin + span – 1;
    for m=begin to end-1;
      for all non-terminals A, B, C:
        if P[begin][m][B] × P[m+1][end][C] × Pr(A → B C) > P[begin][end][A]
        then update P[begin][end][A] and record the backpointer (m, B, C)
A CFG
Rules:
–VP → V NP
–VP → VP PP
–NP → Det N
–NP → NP PP
–PP → P NP
–V → book
–N → book/flight/cards
–Det → that/the
–P → with
Parse “book that flight”

          end=1        end=2         end=3
begin=1   N → book     ----          VP → V NP (m=1)
          V → book
begin=2                Det → that    NP → Det N (m=2)
begin=3                              N → flight
A PCFG
Rules:
–VP → V NP 0.4
–VP → VP PP 0.2
–NP → Det N 0.3
–NP → NP PP 0.2
–PP → P NP 1.0
–V → book 0.001
–N → book 0.01
–N → flight 0.02
–Det → that 0.1
–P → with 0.2
Parse “book that flight”

          end=1              end=2            end=3
begin=1   N → book 0.01      ----             VP → V NP (m=1) 2.4e-7
          V → book 0.001
begin=2                      Det → that 0.1   NP → Det N (m=2) 6e-4
begin=3                                       N → flight 0.02
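The chart above can be reproduced with a small probabilistic CYK in Python. The dict-based grammar encoding is my own choice; backpointers are stored as (m, B, C) as in the pseudocode.

```python
from collections import defaultdict

def pcyk(words, unary, binary):
    """Probabilistic CYK: best[(begin, end)][A] is the probability of the
    most likely parse of the span rooted in A; back[(begin, end)][A] stores
    (m, B, C) so the best tree can be read off afterwards."""
    n = len(words)
    best = defaultdict(dict)
    back = defaultdict(dict)
    for i, w in enumerate(words, start=1):
        for (lhs, word), p in unary.items():
            if word == w and p > best[(i, i)].get(lhs, 0.0):
                best[(i, i)][lhs] = p
    for span in range(2, n + 1):
        for begin in range(1, n - span + 2):
            end = begin + span - 1
            for m in range(begin, end):
                for (lhs, (b, c)), p in binary.items():
                    q = best[(begin, m)].get(b, 0.0) * best[(m + 1, end)].get(c, 0.0) * p
                    if q > best[(begin, end)].get(lhs, 0.0):
                        best[(begin, end)][lhs] = q
                        back[(begin, end)][lhs] = (m, b, c)
    return best, back

# The PCFG from the previous slide.
unary = {('V', 'book'): 0.001, ('N', 'book'): 0.01, ('N', 'flight'): 0.02,
         ('Det', 'that'): 0.1, ('P', 'with'): 0.2}
binary = {('VP', ('V', 'NP')): 0.4, ('VP', ('VP', 'PP')): 0.2,
          ('NP', ('Det', 'N')): 0.3, ('NP', ('NP', 'PP')): 0.2,
          ('PP', ('P', 'NP')): 1.0}
best, back = pcyk(['book', 'that', 'flight'], unary, binary)
# best[(2, 3)]['NP'] = 0.3 * 0.1 * 0.02 = 6e-4
# best[(1, 3)]['VP'] = 0.4 * 0.001 * 6e-4 = 2.4e-7, with backpointer (1, 'V', 'NP')
```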
N-best parse trees
–Best parse tree: T* = argmax_T P(T)
–N-best parse trees: the N trees with the highest probabilities
CYK algorithm for N-best
For every rule A → w_i, insert Pr(A → w_i) into P[i][i][A] (a sorted array of the N best probs)
For span=2 to N
  for begin=1 to N-span+1
    end = begin + span – 1;
    for m=begin to end-1;
      for all non-terminals A, B, C:
        for each p_i in P[begin][m][B] and each p_j in P[m+1][end][C]:
          val = p_i × p_j × Pr(A → B C)
          if val > one of the probs in P[begin][end][A]
          then remove the last element of P[begin][end][A] and insert val into the array;
               remove the last element of B[begin][end][A] and insert (m, B, C, i, j) into B[begin][end][A]
PCFG for Language Modeling (LM)
–N-gram LM: P(S) = Π_i P(w_i | w_{i-n+1} … w_{i-1})
–Syntax-based LM: P(S) = Σ_T P(T, S)
Calculating Pr(S)
–Parsing: the prob of the most likely parse tree: max_T P(T, S)
–LM: the sum over all parse trees: Σ_T P(T, S)
CYK for finding the most likely parse tree
For every rule A → w_i, set P[i][i][A] = Pr(A → w_i)
For span=2 to N
  for begin=1 to N-span+1
    end = begin + span – 1;
    for m=begin to end-1;
      for all non-terminals A, B, C:
        if P[begin][m][B] × P[m+1][end][C] × Pr(A → B C) > P[begin][end][A]
        then update P[begin][end][A] and record the backpointer (m, B, C)
CYK for calculating LM
For every rule A → w_i, set P[i][i][A] = Pr(A → w_i)
For span=2 to N
  for begin=1 to N-span+1
    end = begin + span – 1;
    for m=begin to end-1;
      for all non-terminals A, B, C:
        P[begin][end][A] += P[begin][m][B] × P[m+1][end][C] × Pr(A → B C)
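The LM version differs from the most-likely-tree version only in the update line: a sum accumulated with += instead of a max with a backpointer. A sketch, using the same dict-based grammar encoding as before (that encoding is my choice, not the slides'):

```python
from collections import defaultdict

def inside(words, unary, binary, root):
    """P[(begin, end)][A] accumulates the total probability of ALL trees
    rooted in A over the span; P[(1, n)][root] is the LM probability Pr(S)."""
    n = len(words)
    P = defaultdict(lambda: defaultdict(float))
    for i, w in enumerate(words, start=1):
        for (lhs, word), p in unary.items():
            if word == w:
                P[(i, i)][lhs] += p
    for span in range(2, n + 1):
        for begin in range(1, n - span + 2):
            end = begin + span - 1
            for m in range(begin, end):
                for (lhs, (b, c)), p in binary.items():
                    P[(begin, end)][lhs] += P[(begin, m)][b] * P[(m + 1, end)][c] * p
    return P[(1, n)][root]

# The PCFG from the earlier example.
unary = {('V', 'book'): 0.001, ('N', 'book'): 0.01, ('N', 'flight'): 0.02,
         ('Det', 'that'): 0.1, ('P', 'with'): 0.2}
binary = {('VP', ('V', 'NP')): 0.4, ('VP', ('VP', 'PP')): 0.2,
          ('NP', ('Det', 'N')): 0.3, ('NP', ('NP', 'PP')): 0.2,
          ('PP', ('P', 'NP')): 1.0}
p_s = inside(['book', 'that', 'flight'], unary, binary, root='VP')
# "book that flight" has a single parse under this grammar,
# so the sum equals the max: 2.4e-7
```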
CYK algorithm

Task                      Chart entry                      Backpointer
One parse tree            boolean                          tuple
All parse trees           boolean                          list of tuples
Most likely parse tree    real number (the max prob)       tuple
N-best parse trees        list of real numbers             list of tuples
LM for sentence           real number (the sum of probs)   not needed
Learning PCFG Probabilities
–Given a treebank (i.e., a set of trees), use MLE:
  Pr(A → α) = Count(A → α) / Count(A)
–Without treebanks: the inside-outside algorithm
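The MLE estimate can be computed directly from rule counts. The nested-tuple tree encoding below (a label followed by children, with plain strings as leaves) and the toy treebank are assumptions for illustration, not a treebank format from the slides.

```python
from collections import Counter

def count_rules(tree, rule_counts, lhs_counts):
    """tree = (label, child, child, ...); a leaf child is a plain string."""
    label, children = tree[0], tree[1:]
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    rule_counts[(label, rhs)] += 1
    lhs_counts[label] += 1
    for c in children:
        if not isinstance(c, str):
            count_rules(c, rule_counts, lhs_counts)

def mle_pcfg(treebank):
    """Pr(A -> alpha) = Count(A -> alpha) / Count(A)."""
    rule_counts, lhs_counts = Counter(), Counter()
    for t in treebank:
        count_rules(t, rule_counts, lhs_counts)
    return {r: c / lhs_counts[r[0]] for r, c in rule_counts.items()}

# Toy treebank: S appears twice, once as S -> NP VP and once as S -> VP,
# so each S rule gets probability 0.5.
treebank = [
    ('S', ('NP', ('N', 'Mary')), ('VP', ('V', 'sleeps'))),
    ('S', ('VP', ('V', 'sleep'))),
]
probs = mle_pcfg(treebank)
```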
Q&A
–PCFG
–CYK algorithm
Problems of PCFG
–Lack of sensitivity to structural dependency
–Lack of sensitivity to lexical dependency
Structural Dependency
–Each PCFG rule is assumed to be independent of other rules.
–Observation: sometimes the choice of how a node expands depends on the location of the node in the parse tree.
  –e.g., the probability of NP → Pron depends on whether the NP is a subject or an object.
Lexical Dependency
–Given P(NP → NP PP) > P(VP → VP PP), should a PP always be attached to an NP?
–No: verbs such as “send” favor VP attachment, and the preference also varies by preposition, e.g. “of” vs. “into”.
Solution to the problems
–Structural dependency
–Lexical dependency
–Other more sophisticated models
Lexicalized PCFG
Head and head child
–Each syntactic constituent is associated with a lexical head.
–Each context-free rule has a head child:
  –VP → V NP
  –NP → Det N
  –VP → VP PP
  –NP → NP PP
  –VP → to VP
  –VP → aux VP
Head propagation
–The lexical head propagates from the head child to its parent.
–An example: “Mary bought a book in the store.”
Lexicalized PCFG
Lexicalized rules:
–VP(bought) → V(bought) NP 0.01
–VP → V NP | 0.01 | 0 | bought –
–VP(bought) → V(bought) NP(book) 1.5e-7
–VP → V NP | 1.5e-7 | 0 | bought book
Finding the head in a parse tree
–Head propagation table: simple rules to find the head child.
–An example:
  –(VP left V/VP/Aux)
  –(PP left P)
  –(NP right N)
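A head-propagation table like the one above can be applied with a few lines of code. Falling back to the edge child when no listed category matches is my assumption, not a rule from the slides.

```python
head_table = {
    'VP': ('left', {'V', 'VP', 'Aux'}),   # (VP left V/VP/Aux)
    'PP': ('left', {'P'}),                # (PP left P)
    'NP': ('right', {'N'}),               # (NP right N)
}

def find_head_child(label, children, table):
    """Scan the children from the table's direction and return the first
    one whose label is listed; fall back to the edge child (an assumption)."""
    direction, candidates = table[label]
    kids = children if direction == 'left' else list(reversed(children))
    for k in kids:
        if k in candidates:
            return k
    return kids[0]

# NP -> Det N: scanning from the right, N is the head child.
# VP -> V NP: scanning from the left, V is the head child.
```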
Simplified Model using Lexicalized PCFG
–PCFG: P(r(n) | n)
–Lexicalized PCFG: P(r(n) | n, head(n))
  –P(VP → VBD NP PP | VP, dumped)
  –P(VP → VBD NP PP | VP, slept)
–Parsers that use lexicalized rules:
  –Collins’ parser