
Slide 1: Parsing (SLP Chapter 13)

Slide 2: Outline (7/2/2015, Speech and Language Processing, Jurafsky and Martin)
- Parsing with CFGs
- Bottom-up, top-down
- CKY parsing
- Mention of Earley and chart parsing

Slide 3: Parsing
- Parsing with CFGs refers to the task of assigning trees to input strings.
- Trees that cover all and only the elements of the input and have an S at the top.
- This chapter: find all possible trees.
- Next chapter (14): choose the most probable one.

Slide 4: Parsing
- Parsing involves a search.

Slide 5: Top-Down Search
- We're trying to find trees rooted with an S; start with the rules that give us an S.
- Then we can work our way down from there to the words.

Slide 6: Top-Down Space

Slide 7: Bottom-Up Parsing
- We also want trees that cover the input words.
- Start with trees that link up with the words.
- Then work your way up from there to larger and larger trees.

Slide 8: Bottom-Up Space

Slide 9: Top-Down and Bottom-Up
- Top-down:
  - Only searches for trees that can be S's.
  - But also suggests trees that are not consistent with any of the words.
- Bottom-up:
  - Only forms trees consistent with the words.
  - But suggests trees that make no sense globally.

Slide 10: Control
- Which node to try to expand next?
- Which grammar rule to use to expand a node?
- One approach: exhaustive search of the space of possibilities.
  - Not feasible: time is exponential in the number of non-terminals.
  - LOTS of repeated work, as the same constituent is created over and over (shared sub-problems).

Slide 11: Dynamic Programming
- DP search methods fill tables with partial results and thereby:
  - Avoid doing avoidable repeated work.
  - Solve exponential problems in polynomial time (well, not really; we'll return to this point).
  - Efficiently store ambiguous structures with shared sub-parts.
- We'll cover two approaches that roughly correspond to bottom-up and top-down parsing:
  - CKY
  - Earley (we will mention this, not cover it in detail)

Slide 12: CKY Parsing
- Consider the rule A → B C.
- If there is an A somewhere in the input, then there must be a B followed by a C in the input.
- If the A spans from i to j in the input, then there must be some k such that i < k < j.
  - I.e., the B splits from the C someplace.

Slide 13: Convert Grammar to CNF
- What if your grammar isn't binary, as in the case of the Treebank grammar?
- Convert it to binary: any arbitrary CFG can be rewritten into Chomsky Normal Form automatically.
- The resulting grammar accepts (and rejects) the same set of strings as the original grammar.
- But the resulting derivations (trees) are different.
- We saw this in the last set of lecture notes.

Slide 14: Convert Grammar to CNF
- More specifically, we want our rules to be of the form A → B C or A → w.
- That is, rules can expand to either two non-terminals or a single terminal.

Slide 15: Binarization Intuition
- Introduce new intermediate non-terminals into the grammar that distribute rules with length > 2 over several rules.
- So S → A B C turns into S → X C and X → A B, where X is a symbol that doesn't occur anywhere else in the grammar.

Slide 16: Converting a Grammar to CNF
1. Copy all conforming rules to the new grammar unchanged.
2. Convert terminals within rules to dummy non-terminals.
3. Convert unit productions.
4. Make all rules with non-terminals on the right binary.
In lecture: what these mean; apply to the example on the next two slides.
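Step 4 above (binarizing long rules) can be sketched in a few lines. This is a minimal illustration, not the book's pseudocode: the rule encoding as `(lhs, rhs_tuple)` pairs and the fresh-symbol scheme `X1, X2, ...` are assumptions, and a real converter would also have to guarantee the fresh names don't collide with existing symbols.

```python
def binarize(rules):
    """Rewrite rules so every right-hand side has length <= 2.

    rules: list of (lhs, rhs) pairs, where rhs is a tuple of symbols.
    Follows the slide's scheme: S -> A B C becomes X -> A B and S -> X C.
    """
    out = []
    counter = 0
    for lhs, rhs in rules:
        # Peel off the first two symbols into a fresh non-terminal
        # until the remaining right-hand side is binary.
        while len(rhs) > 2:
            counter += 1
            new_nt = f"X{counter}"  # assumed not to occur in the grammar
            out.append((new_nt, rhs[:2]))
            rhs = (new_nt,) + rhs[2:]
        out.append((lhs, rhs))
    return out
```

For example, `binarize([("S", ("A", "B", "C"))])` yields the two rules `("X1", ("A", "B"))` and `("S", ("X1", "C"))`, matching Slide 15; already-binary rules pass through unchanged.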

Slide 17: Sample L1 Grammar

Slide 18: CNF Conversion

Slide 19: CKY
- Build a table so that an A spanning from i to j in the input is placed in cell [i, j] of the table.
- E.g., a non-terminal spanning an entire string will sit in cell [0, n].
  - Hopefully an S.

Slide 20: CKY
- If:
  - there is an A spanning i, j in the input, and
  - A → B C is a rule in the grammar,
- then:
  - there must be a B in [i, k] and a C in [k, j] for some i < k < j.

Slide 21: CKY
- The loops fill the table a column at a time, from left to right, bottom to top.
- When we're filling a cell, the parts needed to fill it are already in the table, to the left and below.

Slide 22: CKY Algorithm
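The algorithm on this slide can be sketched as a short recognizer. This is a minimal sketch under stated assumptions: the grammar is already in CNF, and the encoding (one dict mapping a right-hand-side pair to the set of left-hand sides, another mapping words to their pre-terminals) is an illustrative choice, not the textbook's notation.

```python
from collections import defaultdict

def cky_recognize(words, binary_rules, lexical_rules, start="S"):
    """CKY recognizer for a CNF grammar.

    binary_rules: {(B, C): {A, ...}} for rules A -> B C
    lexical_rules: {word: {A, ...}}  for rules A -> word
    Returns True iff the start symbol spans the whole input.
    """
    n = len(words)
    # table[(i, j)] = set of non-terminals spanning words[i:j]
    table = defaultdict(set)
    for i, w in enumerate(words):
        table[(i, i + 1)] = set(lexical_rules.get(w, ()))
    # Fill a column at a time, left to right, bottom to top,
    # so [i, k] and [k, j] are always ready before [i, j].
    for j in range(2, n + 1):
        for i in range(j - 2, -1, -1):
            for k in range(i + 1, j):          # the split point
                for B in table[(i, k)]:
                    for C in table[(k, j)]:
                        table[(i, j)] |= binary_rules.get((B, C), set())
    return start in table[(0, n)]
```

With a toy CNF grammar (S → NP VP, VP → V NP, plus lexical entries), `cky_recognize("she eats fish".split(), binary, lex)` returns True. The three nested position loops give the cubic running time mentioned in the conclusions.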

Slide 23: Example
- Go through the full example in lecture.

Slide 24: CKY Parsing
- Is that really a parser? So far, it is only a recognizer.
- Success: an S in cell [0, N].
- To turn it into a parser, see lecture.
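One standard way to turn the recognizer into a parser is to store backpointers in the table and read a tree off cell [0, N]. The sketch below assumes the same hypothetical grammar encoding as the recognizer above and keeps only one derivation per non-terminal per cell; a full parser would keep all of them to represent ambiguity.

```python
from collections import defaultdict

def cky_parse(words, binary_rules, lexical_rules, start="S"):
    """CKY with backpointers; returns one parse tree as nested tuples, or None."""
    n = len(words)
    # table[(i, j)] maps each non-terminal to its backpointer:
    # a word for lexical entries, or (k, B, C) for A -> B C split at k.
    table = defaultdict(dict)
    for i, w in enumerate(words):
        for A in lexical_rules.get(w, ()):
            table[(i, i + 1)][A] = w
    for j in range(2, n + 1):
        for i in range(j - 2, -1, -1):
            for k in range(i + 1, j):
                for B in table[(i, k)]:
                    for C in table[(k, j)]:
                        for A in binary_rules.get((B, C), ()):
                            # Keep only the first derivation found.
                            table[(i, j)].setdefault(A, (k, B, C))

    def build(A, i, j):
        bp = table[(i, j)][A]
        if isinstance(bp, str):          # lexical backpointer
            return (A, bp)
        k, B, C = bp
        return (A, build(B, i, k), build(C, k, j))

    return build(start, 0, n) if start in table[(0, n)] else None
```

On the toy grammar, `cky_parse("she eats fish".split(), binary, lex)` yields `("S", ("NP", "she"), ("VP", ("V", "eats"), ("NP", "fish")))`.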

Slide 25: CKY Notes
- Since it's bottom-up, CKY populates the table with a lot of worthless constituents.
- To avoid this, we can switch to a top-down control strategy,
- or we can add some kind of filtering that blocks constituents where they cannot appear in a final analysis.

Slide 26: Dynamic Programming Parsing Methods
- The CKY (Cocke-Kasami-Younger) algorithm is based on bottom-up parsing and requires first normalizing the grammar.
- The Earley parser is based on top-down parsing and does not require normalizing the grammar, but it is more complex.
- More generally, chart parsers retain completed phrases in a chart and can combine top-down and bottom-up search.

Slide 27: Conclusions
- Syntactic parse trees specify the structure of a sentence, which helps determine its meaning.
  - John ate the spaghetti with meatballs with chopsticks.
  - How did John eat the spaghetti? What did John eat?
- CFGs can be used to define the grammar of a natural language.
- Dynamic programming algorithms allow computing a single parse tree in cubic time, or all parse trees in exponential time.
