Parsing with PCFG
Ling 571, Fei Xia
Week 3: 10/11-10/13/05
Outline
–Misc
–CYK algorithm
–Converting CFG into CNF
–PCFG
–Lexicalized PCFG
Misc
–Quiz 1: 15 pts, due 10/13
–Hw2: 10 pts, due 10/13, ling580i_au05@u, ling580e_au05@u
–Treehouse weekly meeting:
  –Time: every Wed 2:30-3:30pm; tomorrow is the 1st meeting
  –Location: EE1 025 (Campus map 12-N, south of MGH)
  –Mailing list: cl-announce@u
–Others:
  –Pongo policies
  –Machines: LLC, Parrington, Treehouse
  –Linux commands: ssh, sftp, …
  –Catalyst tools: ESubmit, EPost, …
CYK algorithm
Parsing algorithms
–Top-down
–Bottom-up
–Top-down with bottom-up filtering
–Earley algorithm
–CYK algorithm
–…
CYK algorithm
–Cocke-Younger-Kasami algorithm (a.k.a. CKY algorithm)
–Requires the CFG to be in Chomsky Normal Form (CNF).
–Bottom-up chart parsing algorithm using DP.
–Fills in a two-dimensional array: C[i][j] contains all the possible syntactic interpretations (non-terminals) of the substring w_i … w_j.
–Complexity: O(n^3 · |G|) for a sentence of n words and a grammar of size |G|.
Chomsky normal form (CNF)
Definition of CNF:
–A → B C
–A → a
–S → ε
A, B, C are non-terminals; a is a terminal. S is the start symbol; B and C are not.
For every CFG, there is a CFG in CNF that is weakly equivalent.
CYK algorithm
For every rule A → w_i, add A to C[i][i]
For span=2 to N
  for begin=1 to N-span+1
    end = begin + span – 1;
    for m=begin to end-1;
      for all non-terminals A, B, C:
        if B ∈ C[begin][m] and C ∈ C[m+1][end] and A → B C is a rule
        then add A to C[begin][end]
CYK algorithm (another way)
For every rule A → w_i, add it to Cell[i][i]
For span=2 to N
  for begin=1 to N-span+1
    end = begin + span – 1;
    for m=begin to end-1;
      for all non-terminals A, B, C:
        if Cell[begin][m] contains B … and Cell[m+1][end] contains C …
           and A → B C is a rule in the grammar
        then add A → B C to Cell[begin][end] and remember m
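The cell-based pseudocode above can be sketched in Python. The grammar encoding (separate lists of unary lexical rules and binary rules) and the 1-indexed span convention are my choices for illustration, not from the slides; the grammar itself is the book/flight example used on the following slides.

```python
from collections import defaultdict

def cyk(words, unary, binary):
    """Boolean CYK: chart[(begin, end)] collects every non-terminal that
    derives words[begin-1 .. end-1] (1-indexed spans, as in the slides)."""
    n = len(words)
    chart = defaultdict(set)
    # Base case: for every rule A -> w_i, add A to Cell[i][i].
    for i, w in enumerate(words, start=1):
        for lhs, word in unary:
            if word == w:
                chart[(i, i)].add(lhs)
    # Recursive case: combine two adjacent smaller spans.
    for span in range(2, n + 1):
        for begin in range(1, n - span + 2):
            end = begin + span - 1
            for m in range(begin, end):
                for lhs, (b, c) in binary:
                    if b in chart[(begin, m)] and c in chart[(m + 1, end)]:
                        chart[(begin, end)].add(lhs)
    return chart

# The example grammar from the next slides.
unary = [('V', 'book'), ('N', 'book'), ('N', 'flight'), ('N', 'cards'),
         ('Det', 'that'), ('Det', 'the'), ('P', 'with')]
binary = [('VP', ('V', 'NP')), ('VP', ('VP', 'PP')),
          ('NP', ('Det', 'N')), ('NP', ('NP', 'PP')), ('PP', ('P', 'NP'))]
chart = cyk(['book', 'that', 'flight'], unary, binary)
# chart[(1, 3)] == {'VP'}: the whole string is parsed as a VP
```

A production implementation would index the binary rules by their right-hand sides instead of scanning the whole rule list in the inner loop; the version above mirrors the pseudocode for readability.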
An example
Rules:
–VP → V NP
–VP → VP PP
–NP → Det N
–NP → NP PP
–PP → P NP
–V → book
–N → book/flight/cards
–Det → that/the
–P → with
Parse “book that flight”: C1[begin][end]

          end=1        end=2         end=3
begin=1   N → book     ----          VP → V NP (m=1)
          V → book
begin=2                Det → that    NP → Det N (m=2)
begin=3                              N → flight
Parse “book that flight”: C2[begin][span]

          span=1        span=2            span=3
begin=1   N → book      ----              VP → V NP (m=1)
          V → book
begin=2   Det → that    NP → Det N (m=2)
begin=3   N → flight
Data structures for the chart (1) (2) (3) (4)
Summary of CYK algorithm
–Bottom-up parsing using DP
–Requires the CFG to be in CNF
–A very efficient algorithm
–Easy to extend
Converting CFG into CNF
Chomsky normal form (CNF)
Definition of CNF:
–A → B C
–A → a
–S → ε
where A, B, C are non-terminals, a is a terminal, S is the start symbol, and B, C are not start symbols.
For every CFG, there is a CFG in CNF that is weakly equivalent.
Converting CFG to CNF
(1) Add a new symbol S0 and a rule S0 → S (so the start symbol will not appear on the rhs of any rule).
(2) Eliminate ε-rules: for each rule A → ε and each rule B → α A β, add B → α β; for each rule B → A, add B → ε unless B → ε has been previously eliminated.
Conversion (cont)
(3) Remove unit rules: for each unit rule A → B, add A → u whenever B → u is a rule, unless A → u is a unit rule that was previously removed.
(4) Replace each rule A → u1 u2 … uk where k > 2 with A → u1 A1, A1 → u2 A2, …, A(k-2) → u(k-1) uk; replace any terminal in a rule of length ≥ 2 with a new symbol U and add a new rule U → u.
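Step (4) is mechanical enough to sketch in code. This is a minimal Python version of step (4) only (binarization plus terminal replacement); the `X1, X2, …` naming for the new symbols and the lowercase-terminal convention are assumptions for the example, not from the slides.

```python
def binarize(rules, is_terminal=lambda s: s.islower()):
    """Step (4): rules are (lhs, rhs) pairs with rhs a tuple of symbols.
    Terminals in rules of length >= 2 get a fresh pre-terminal; rules with
    more than two symbols on the rhs are split into a chain of binary rules."""
    out, counter = [], [0]

    def new_sym():
        counter[0] += 1
        return f"X{counter[0]}"

    for lhs, rhs in rules:
        if len(rhs) >= 2:
            # Replace any terminal u in the rhs with a new symbol U, add U -> u.
            new_rhs = []
            for sym in rhs:
                if is_terminal(sym):
                    u = new_sym()
                    out.append((u, (sym,)))
                    new_rhs.append(u)
                else:
                    new_rhs.append(sym)
            rhs = new_rhs
            # Split A -> u1 u2 ... uk (k > 2) into a chain of binary rules.
            while len(rhs) > 2:
                u = new_sym()
                out.append((lhs, (rhs[0], u)))
                lhs, rhs = u, rhs[1:]
            out.append((lhs, tuple(rhs)))
        else:
            out.append((lhs, tuple(rhs)))
    return out

# Example: S -> A B C becomes S -> A X1, X1 -> B C.
cnf_rules = binarize([('S', ('A', 'B', 'C'))])
```

Steps (1)-(3) would be handled before this pass, in the order given on the slides, since ε-removal and unit-rule removal can create new long rules only in the other direction.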
An example
Adding a new start symbol S0 and the rule S0 → S
Removing ε-rules
–Remove B → ε
–Remove A → ε
Removing unit rules
Removing unit rules (cont)
Converting remaining rules
Summary of CFG parsing
–Simple top-down and bottom-up parsing generate useless trees.
–Top-down with bottom-up filtering has three problems.
–Solution: use DP:
  –Earley algorithm
  –CYK algorithm
Probabilistic CFG (PCFG)
PCFG
PCFG is an extension of CFG. A PCFG is a 5-tuple (N, T, P, S, Pr), where Pr is a function assigning a probability to each rule in P:
  Pr(A → α), i.e., P(A → α | A)
Given a non-terminal A, the probabilities of its rules sum to one:
  Σ_α Pr(A → α) = 1
A PCFG
S → NP VP 0.8        N → Mary 0.01
S → Aux NP VP 0.15   N → book 0.02
S → VP 0.05          V → bought 0.02
VP → V 0.35          Det → a 0.04
VP → V NP 0.45
VP → VP PP 0.20
NP → N 0.8
NP → Det N 0.2
…
Using probabilities
–To estimate the prob of a sentence and its parse trees.
–Useful in disambiguation.
The prob of a tree T:
  P(T) = Π_{n ∈ T} Pr(r(n))
where n is a node in T and r(n) is the rule used to expand n in T.
Computing P(T)
S → NP VP 0.8        N → Mary 0.01
S → Aux NP VP 0.15   N → book 0.02
S → VP 0.05          V → bought 0.02
VP → V 0.35          Det → a 0.04
VP → V NP 0.45
VP → VP PP 0.20
NP → N 0.8
NP → Det N 0.2
The sentence is “Mary bought a book”.
The most likely tree
T is a parse tree, S is a sentence:
  P(T, S) = P(T) · P(S|T) = P(T)
(since the sentence is determined by the tree, P(S|T) = 1). The best parse tree for a sentence S:
  T* = argmax_T P(T | S) = argmax_T P(T, S) = argmax_T P(T)
Find the most likely tree
Given a PCFG and a sentence S, how do we find the best parse tree for S?
One algorithm: CYK
CYK algorithm for CFG
For every rule A → w_i, add A to Cell[i][i]
For span=2 to N
  for begin=1 to N-span+1
    end = begin + span – 1;
    for m=begin to end-1;
      for all non-terminals A, B, C:
        if Cell[begin][m] contains B and Cell[m+1][end] contains C
           and A → B C is a rule in the grammar
        then add A to Cell[begin][end]
CYK algorithm for CFG (another implementation)
For every rule A → w_i, set Chart[i][i][A] = true
For span=2 to N
  for begin=1 to N-span+1
    end = begin + span – 1;
    for m=begin to end-1;
      for all non-terminals A, B, C:
        if Chart[begin][m][B] and Chart[m+1][end][C] and A → B C is a rule
        then Chart[begin][end][A] = true
Variables for CFG and PCFG
–CFG: Chart[begin][end][A]: whether there is a parse tree whose root is A and which covers w_begin … w_end
–PCFG: Chart[begin][end][A]: the prob of the most likely parse tree whose root is A and which covers w_begin … w_end
CYK algorithm for PCFG
For every rule A → w_i, set P[i][i][A] = Pr(A → w_i)
For span=2 to N
  for begin=1 to N-span+1
    end = begin + span – 1;
    for m=begin to end-1;
      for all non-terminals A, B, C:
        if P[begin][m][B] × P[m+1][end][C] × Pr(A → B C) > P[begin][end][A]
        then update P[begin][end][A] and record the backpointer (m, B, C)
A CFG
Rules:
–VP → V NP
–VP → VP PP
–NP → Det N
–NP → NP PP
–PP → P NP
–V → book
–N → book/flight/cards
–Det → that/the
–P → with
Parse “book that flight”

          end=1        end=2         end=3
begin=1   N → book     ----          VP → V NP (m=1)
          V → book
begin=2                Det → that    NP → Det N (m=2)
begin=3                              N → flight
A PCFG
Rules:
–VP → V NP 0.4
–VP → VP PP 0.2
–NP → Det N 0.3
–NP → NP PP 0.2
–PP → P NP 1.0
–V → book 0.001
–N → book 0.01
–N → flight 0.02
–Det → that 0.1
–P → with 0.2
Parse “book that flight”

          end=1              end=2            end=3
begin=1   N → book 0.01      ----             VP → V NP (m=1) 2.4e-7
          V → book 0.001
begin=2                      Det → that 0.1   NP → Det N (m=2) 6e-4
begin=3                                       N → flight 0.02
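The chart above can be reproduced with a small probabilistic CYK in Python. The dict-based grammar encoding is my own choice; backpointers are stored as (m, B, C) as in the pseudocode.

```python
from collections import defaultdict

def pcyk(words, unary, binary):
    """Probabilistic CYK: best[(begin, end)][A] is the probability of the
    most likely parse of the span rooted in A; back[(begin, end)][A] stores
    (m, B, C) so the best tree can be read off afterwards."""
    n = len(words)
    best = defaultdict(dict)
    back = defaultdict(dict)
    for i, w in enumerate(words, start=1):
        for (lhs, word), p in unary.items():
            if word == w and p > best[(i, i)].get(lhs, 0.0):
                best[(i, i)][lhs] = p
    for span in range(2, n + 1):
        for begin in range(1, n - span + 2):
            end = begin + span - 1
            for m in range(begin, end):
                for (lhs, (b, c)), p in binary.items():
                    q = best[(begin, m)].get(b, 0.0) * best[(m + 1, end)].get(c, 0.0) * p
                    if q > best[(begin, end)].get(lhs, 0.0):
                        best[(begin, end)][lhs] = q
                        back[(begin, end)][lhs] = (m, b, c)
    return best, back

# The PCFG from the previous slide.
unary = {('V', 'book'): 0.001, ('N', 'book'): 0.01, ('N', 'flight'): 0.02,
         ('Det', 'that'): 0.1, ('P', 'with'): 0.2}
binary = {('VP', ('V', 'NP')): 0.4, ('VP', ('VP', 'PP')): 0.2,
          ('NP', ('Det', 'N')): 0.3, ('NP', ('NP', 'PP')): 0.2,
          ('PP', ('P', 'NP')): 1.0}
best, back = pcyk(['book', 'that', 'flight'], unary, binary)
# best[(2, 3)]['NP'] = 0.3 * 0.1 * 0.02 = 6e-4
# best[(1, 3)]['VP'] = 0.4 * 0.001 * 6e-4 = 2.4e-7, with backpointer (1, 'V', 'NP')
```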
N-best parse trees
–Best parse tree: T* = argmax_T P(T)
–N-best parse trees: the N trees with the highest probabilities
CYK algorithm for N-best
For every rule A → w_i, insert Pr(A → w_i) into P[i][i][A] (a sorted array of the N best probs)
For span=2 to N
  for begin=1 to N-span+1
    end = begin + span – 1;
    for m=begin to end-1;
      for all non-terminals A, B, C:
        for each p_i in P[begin][m][B] and each p_j in P[m+1][end][C]:
          val = p_i × p_j × Pr(A → B C)
          if val > one of the probs in P[begin][end][A]
          then remove the last element of P[begin][end][A] and insert val into the array;
               remove the last element of B[begin][end][A] and insert (m, B, C, i, j) into B[begin][end][A]
PCFG for Language Modeling (LM)
–N-gram LM: P(S) = Π_i P(w_i | w_{i-n+1} … w_{i-1})
–Syntax-based LM: P(S) = Σ_T P(T, S)
Calculating Pr(S)
–Parsing: the prob of the most likely parse tree: max_T P(T, S)
–LM: the sum over all parse trees: Σ_T P(T, S)
CYK for finding the most likely parse tree
For every rule A → w_i, set P[i][i][A] = Pr(A → w_i)
For span=2 to N
  for begin=1 to N-span+1
    end = begin + span – 1;
    for m=begin to end-1;
      for all non-terminals A, B, C:
        if P[begin][m][B] × P[m+1][end][C] × Pr(A → B C) > P[begin][end][A]
        then update P[begin][end][A] and record the backpointer (m, B, C)
CYK for calculating LM
For every rule A → w_i, set P[i][i][A] = Pr(A → w_i)
For span=2 to N
  for begin=1 to N-span+1
    end = begin + span – 1;
    for m=begin to end-1;
      for all non-terminals A, B, C:
        P[begin][end][A] += P[begin][m][B] × P[m+1][end][C] × Pr(A → B C)
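The LM version differs from the most-likely-tree version only in the update line: a sum accumulated with += instead of a max with a backpointer. A sketch, using the same dict-based grammar encoding as before (that encoding is my choice, not the slides'):

```python
from collections import defaultdict

def inside(words, unary, binary, root):
    """P[(begin, end)][A] accumulates the total probability of ALL trees
    rooted in A over the span; P[(1, n)][root] is the LM probability Pr(S)."""
    n = len(words)
    P = defaultdict(lambda: defaultdict(float))
    for i, w in enumerate(words, start=1):
        for (lhs, word), p in unary.items():
            if word == w:
                P[(i, i)][lhs] += p
    for span in range(2, n + 1):
        for begin in range(1, n - span + 2):
            end = begin + span - 1
            for m in range(begin, end):
                for (lhs, (b, c)), p in binary.items():
                    P[(begin, end)][lhs] += P[(begin, m)][b] * P[(m + 1, end)][c] * p
    return P[(1, n)][root]

# The PCFG from the earlier example.
unary = {('V', 'book'): 0.001, ('N', 'book'): 0.01, ('N', 'flight'): 0.02,
         ('Det', 'that'): 0.1, ('P', 'with'): 0.2}
binary = {('VP', ('V', 'NP')): 0.4, ('VP', ('VP', 'PP')): 0.2,
          ('NP', ('Det', 'N')): 0.3, ('NP', ('NP', 'PP')): 0.2,
          ('PP', ('P', 'NP')): 1.0}
p_s = inside(['book', 'that', 'flight'], unary, binary, root='VP')
# "book that flight" has a single parse under this grammar,
# so the sum equals the max: 2.4e-7
```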
CYK algorithm

Task                      Chart entry                      Backpointer
One parse tree            boolean                          tuple
All parse trees           boolean                          list of tuples
Most likely parse tree    real number (the max prob)       tuple
N-best parse trees        list of real numbers             list of tuples
LM for sentence           real number (the sum of probs)   not needed
Learning PCFG Probabilities
–Given a treebank (i.e., a set of trees), use MLE:
  Pr(A → α) = Count(A → α) / Count(A)
–Without treebanks: the inside-outside algorithm
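The MLE estimate can be computed directly from rule counts. The nested-tuple tree encoding below (a label followed by children, with plain strings as leaves) and the toy treebank are assumptions for illustration, not a treebank format from the slides.

```python
from collections import Counter

def count_rules(tree, rule_counts, lhs_counts):
    """tree = (label, child, child, ...); a leaf child is a plain string."""
    label, children = tree[0], tree[1:]
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    rule_counts[(label, rhs)] += 1
    lhs_counts[label] += 1
    for c in children:
        if not isinstance(c, str):
            count_rules(c, rule_counts, lhs_counts)

def mle_pcfg(treebank):
    """Pr(A -> alpha) = Count(A -> alpha) / Count(A)."""
    rule_counts, lhs_counts = Counter(), Counter()
    for t in treebank:
        count_rules(t, rule_counts, lhs_counts)
    return {r: c / lhs_counts[r[0]] for r, c in rule_counts.items()}

# Toy treebank: S appears twice, once as S -> NP VP and once as S -> VP,
# so each S rule gets probability 0.5.
treebank = [
    ('S', ('NP', ('N', 'Mary')), ('VP', ('V', 'sleeps'))),
    ('S', ('VP', ('V', 'sleep'))),
]
probs = mle_pcfg(treebank)
```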
Q&A
–PCFG
–CYK algorithm
Problems of PCFG
–Lack of sensitivity to structural dependency
–Lack of sensitivity to lexical dependency
Structural Dependency
–Each PCFG rule is assumed to be independent of other rules.
–Observation: sometimes the choice of how a node expands depends on the location of the node in the parse tree.
  –e.g., the probability of NP → Pron depends on whether the NP is a subject or an object.
Lexical Dependency
–Given P(NP → NP PP) > P(VP → VP PP), should a PP always be attached to an NP?
–No: verbs such as “send” favor VP attachment, and the preference also varies by preposition, e.g. “of” vs. “into”.
Solution to the problems
–Structural dependency
–Lexical dependency
–Other more sophisticated models
Lexicalized PCFG
Head and head child
–Each syntactic constituent is associated with a lexical head.
–Each context-free rule has a head child:
  –VP → V NP
  –NP → Det N
  –VP → VP PP
  –NP → NP PP
  –VP → to VP
  –VP → aux VP
Head propagation
–The lexical head propagates from the head child to its parent.
–An example: “Mary bought a book in the store.”
Lexicalized PCFG
Lexicalized rules:
–VP(bought) → V(bought) NP 0.01
–VP → V NP | 0.01 | 0 | bought –
–VP(bought) → V(bought) NP(book) 1.5e-7
–VP → V NP | 1.5e-7 | 0 | bought book
Finding the head in a parse tree
–Head propagation table: simple rules to find the head child.
–An example:
  –(VP left V/VP/Aux)
  –(PP left P)
  –(NP right N)
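A head-propagation table like the one above can be applied with a few lines of code. Falling back to the edge child when no listed category matches is my assumption, not a rule from the slides.

```python
head_table = {
    'VP': ('left', {'V', 'VP', 'Aux'}),   # (VP left V/VP/Aux)
    'PP': ('left', {'P'}),                # (PP left P)
    'NP': ('right', {'N'}),               # (NP right N)
}

def find_head_child(label, children, table):
    """Scan the children from the table's direction and return the first
    one whose label is listed; fall back to the edge child (an assumption)."""
    direction, candidates = table[label]
    kids = children if direction == 'left' else list(reversed(children))
    for k in kids:
        if k in candidates:
            return k
    return kids[0]

# NP -> Det N: scanning from the right, N is the head child.
# VP -> V NP: scanning from the left, V is the head child.
```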
Simplified Model using Lexicalized PCFG
–PCFG: P(r(n) | n)
–Lexicalized PCFG: P(r(n) | n, head(n))
  –P(VP → VBD NP PP | VP, dumped)
  –P(VP → VBD NP PP | VP, slept)
–Parsers that use lexicalized rules:
  –Collins’ parser