Presentation is loading. Please wait.

Presentation is loading. Please wait.

 Christel Kemke 1 2007/08 COMP 4060 Natural Language Processing PARSING.

Similar presentations


Presentation on theme: " Christel Kemke 1 2007/08 COMP 4060 Natural Language Processing PARSING."— Presentation transcript:

1  Christel Kemke 1 2007/08 COMP 4060 Natural Language Processing PARSING

2 2007/08  Christel Kemke 2 Parsing  Language, Syntax, Parsing  Problems in Parsing  Ambiguity  Attachment / Binding  Bottom vs. Top Down Parsing  Chart-Parsing  Earley-Algorithm

3 2007/08  Christel Kemke 3 Natural Language - Parsing Parsing derive the syntactic structure of a sentence based on a language model (grammar) construct a parse tree, i.e. the derivation of the sentence based on the grammar (rewrite system)

4 2007/08  Christel Kemke 4 Natural Language - Grammar Natural Language Syntax described through a formal language, often a context-free grammar (CFG): G=(NT,T,P,S): the Start-Symbol S  NT ≡ sentence symbol Non-Terminals NT ≡ syntactic constituents Terminals T ≡ lexical entries/ words Production Rules P  NT  (NT  T) + ≡ grammar rules

5 2007/08  Christel Kemke 5 Sample Grammar Grammar (S, NT, T, P) Sentence Symbol S  NT, Part-of-Speech  NT, Constituents  NT, Terminals, Word  T Grammar Rules P  NT  (NT  T)* S  NP VPstatement S  Aux NP VPquestion S  VPcommand NP  Det Nominal NP  Proper-Noun Nominal  Noun | Noun Nominal | Nominal PP VP  Verb | Verb NP | Verb PP | Verb NP PP PP  Prep NP Det  that | this | a Noun  book | flight | meal | money Proper-Noun  Houston | American Airlines | TWA Verb  book | include | prefer Aux  does Prep  from | to | on

6 2007/08  Christel Kemke 6 Parsing Task Parse "Does this flight include a meal?"

7 2007/08  Christel Kemke 7 Parse "Does this flight include a meal?" S Aux NP VP Det Nominal Verb NP Noun Det Nominal does this flight include a meal Sample Parse Tree

8 2007/08  Christel Kemke 8 Problems in Parsing - Ambiguity Ambiguity syntactical/structural ambiguity – several parse trees are possible e.g. above sentence semantic/lexical ambiguity – several word meanings e.g. bank (where you get money) and (river) bank even different word categories possible (interim) e.g. “ He books the flight. ” vs. “ The books are here. “ or “ Fruit flies from the balcony ” vs. “ Fruit flies are on the balcony. ” “Peter saw Mary with the telescope / her friend / his friend.”

9 2007/08  Christel Kemke 9 Problems in Parsing – Attachment 1 Attachment in particular PP (prepositional phrase) binding; often referred to as binding problem. See next slides.

10 2007/08  Christel Kemke 10 Problems in Parsing – Attachment 2 “One morning, I shot an elephant in my pajamas.” Binding 2: VP  Verb NP and NP  Det Nominal and Nominal  Nominal PP and Nominal  Noun (S... (NP (PNoun I )) (VP (Verb shot ) (NP (Det an) (Nominal (Nominal (Noun elephant ) (PP in my pajamas )... ) Binding 1: VP  Verb NP PP (S... (NP (PNoun I )(VP (Verb shot ) (NP (Det an (Nominal (Noun elephant ))) (PP in my pajamas ))...)

11 2007/08  Christel Kemke 11 Problems in Parsing – Attachment 3 “One morning, I shot an elephant in my pajamas.” Binding 2: VP  Verb NP and NP  Det Nominal and Nominal  Nominal PP and Nominal  Noun (S... (NP (PNoun I )) (VP (Verb shot ) (NP (Det an) (Nominal (Nominal (Noun elephant ) (PP in my pajamas )... ) “How he got into them, I don’t know.”

12 2007/08  Christel Kemke 12 Bottom-up – from word-nodes to sentence-symbol Top-down Parsing – from sentence-symbol to words S Aux NP VP Det Nominal Verb NP NounDet Nominal doesthis flight include a meal Bottom-up and Top-down Parsing

13 2007/08  Christel Kemke 13 Problems with Bottom-up and Top-down Parsing Problems with left-recursive rules like NP  NP PP: don ’ t know how many times recursion is needed Pure Bottom-up or Top-down Parsing is inefficient because it generates and explores too many structures which in the end turn out to be invalid (several grammar rules applicable  ‘interim’ ambiguity). Combine top-down and bottom-up approach: Start with sentence; use rules top-down (look-ahead); read input; try to find shortest path from input to highest unparsed constituent (from left to right).  Chart-Parsing / Earley-Parser

14 2007/08  Christel Kemke 14 Chart Parsing / Early Algorithm Earley-Parser based on Chart-Parsing Essence: Integrate top-down and bottom-up parsing. Keep recognized sub-structures (sub-trees) for shared use during parsing. Top-down: Start with S-symbol. Generate all applicable rules for S. Go further down with left-most constituent in rules and add rules for these constituents until you encounter a left-most node on the RHS which is a word category (POS). Bottom-up: Read input word and compare. If word matches, mark as recognized and move parsing on to the next category in the rule(s).

15 2007/08  Christel Kemke 15 Chart A Chart is a graph with n+1 nodes marked 0 to n for a sequence of n input words. Arcs indicate recognized part of RHS of rule. The indicates recognized constituents in rules. Jurafsky & Martin, Figure 10.15, p. 380

16 2007/08  Christel Kemke 16 Chart Parsing / Earley Parser 1 Chart Sequence of n input words; n+1 nodes marked 0 to n. States in chart represent possible rules and recognized constituents. RHS of recognized rule is covered by arc. Interim state S  VP, [0,0]  top-down look at rule S  VP  nothing of RHS of rule yet recognized ( is far left)  arc at beginning, no coverage (covers no input word; beginning of arc at node 0 and end of arc at node 0)

17 2007/08  Christel Kemke 17 Chart Parsing / Earley Parser 2 Interim states NP  Det Nominal, [1,2]  top-down look with rule NP  Det Nominal  Det recognized ( after Det)  arc covers one input word which is between node 1 and node 2  look next for Nominal, top-down NP  Det Nominal, [1,3]  Nominal was recognized, move after Nominal  move end of arc to cover Nominal; change 2 to 3  structure is completely recognized; arc is inactive;  mark NP as recognized in other rules (move ), bottom up

18 2007/08  Christel Kemke 18 Chart - 0 Book this flight S . VP VP . V NP

19 2007/08  Christel Kemke 19 Chart - 1 VP  V. NP V Book this flight S . VP NP . Det Nom

20 2007/08  Christel Kemke 20 Chart - 2 VP  V. NP V Book this flight S . VP NP  Det. Nom Det Nom . Noun

21 2007/08  Christel Kemke 21 Chart - 3a VP  V. NP V Book this flight S . VP NP  Det. Nom Det Nom  Noun. Noun

22 2007/08  Christel Kemke 22 Chart - 3b VP  V. NP V Book this flight S . VP NP  Det Nom. Det Nom  Noun. Noun

23 2007/08  Christel Kemke 23 Chart - 3c VP  V NP. V Book this flight NP  Det Nom. Det Nom  Noun. Noun S . VP

24 2007/08  Christel Kemke 24 Chart - 3d VP  V NP. V Book this flight S  VP. NP  Det Nom. Det Nom  Noun. Noun

25 2007/08  Christel Kemke 25 Chart – Valid and Invalid Rules/Arcs NP  Det Nom. VP  V. NP Nom  Noun. VDetNoun Book this flight S . VP VP . V NP NP . Det Nom NP  Det. Nom VP  V NP. S  VP. Nom . Noun

26 2007/08  Christel Kemke 26 Chart - Final States NP  Det Nom. Nom  Noun. V Det Noun Book this flight VP  V NP. S  VP.

27 2007/08  Christel Kemke 27 Chart 0 with two S- and two VP-Rules Book this flight S . VP VP . V NP additional S-rule S . VP NP additional VP-rule VP . V

28 2007/08  Christel Kemke 28 Chart 1a with two S- and two VP-Rules VP  V. NP V Book this flight S . VP NP . Det Nom S . VP NP VP  V.

29 2007/08  Christel Kemke 29 Chart 1b with two S- and two VP-Rules VP  V. NP V Book this flight S  VP. NP . Det Nom S  VP. NP VP  V.

30 2007/08  Christel Kemke 30 Chart 2 with two S- and two VP-Rules VP  V. NP V Book this flight S  VP. NP  Det. Nom S  VP. NP VP  V. Nom . Noun

31 2007/08  Christel Kemke 31 VP  V NP. V Book this flight S  VP. NP  Det Nom. Det Nom  Noun. S  VP NP. VP  V. Chart 3 with two S- and two VP-Rules Noun

32 2007/08  Christel Kemke 32 NP  Det Nom. Final Chart - with two S-and two VP-Rules VP  V NP. V Book this flight S  VP NP. Det Nom  Noun. Noun S  VP. VP  V.

33  Christel Kemke 33 2007/08 Earley Parser

34 2007/08  Christel Kemke 34 Earley Algorithm - Functions predictor generates new rules for partly recognized RHS with constituent right of (top-down generation) scanner if word category (POS) is found right of the, the Scanner reads the next input word and adds a rule for it to the chart (bottom- up mode) completer if rule is completely recognized (the is far right), the recognition state of earlier rules in the chart advances: the is moved over the recognized constituent (bottom-up recognition).

35 2007/08  Christel Kemke 35 Earley – Chart for “book that flight” including references to completed states/rules

36 2007/08  Christel Kemke 36 Earley – Chart for “book that flight” from 2 nd edition

37 2007/08  Christel Kemke 37 function EARLEY-PARSE(words, grammar) returns chart ENQUEUE((    S, [0,0]), chart[0]) for i_from 0 to LENGTH(words) do for each state in chart[i] do if INCOMPLETE?(state) and NEXT-CAT(state) is not a part of speech then PREDICTOR(state) elseif INCOMPLETE?(state) and NEXT-CAT(state)is a part of speech then SCANNER(state) else COMPLETER(state) end return(chart) - continued - Earley-Algorithm

38 2007/08  Christel Kemke 38 procedure PREDICTOR((A    B , [i,j])) for each (B   ) in GRAMMAR-RULES-FOR(B, grammar) do ENQUEUE((B    [j,j], chart[j]) end procedure SCANNER ((A    B , [i,j])) if B  PARTS-OF-SPEECH(word[j]) then ENQUEUE((B  word[j], [j,j+1]), chart[j+1]) end procedure COMPLETER ((B   , [j,k])) for each (A    B , [i,j]) in chart[j] do ENQUEUE((A   B  , [i,k]), chart[k]) end procedure ENQUEUE(state, chart-entry) if state is not already in chart-entry then PUSH(state, chart-entry) end Earley-Algorithm (continued)

39 2007/08  Christel Kemke 39 Earley-Algorithm (copy from 2 nd edition) Earley – Algorithm main

40 2007/08  Christel Kemke 40 Earley-Algorithm (continued) Earley – Algorithm processes

41 2007/08  Christel Kemke 41 Earley – Algorithm complete

42 2007/08  Christel Kemke 42 Chart-Parser Algorithm (just FYI)

43  Christel Kemke 43 2007/08 Earley Algorithm - Figures Jurafsky & Martin, 2 nd ed., Ch. 13 Figures 13.16, 13.13, 13.14

44 2007/08  Christel Kemke 44 Additional References Jurafsky, D. & J. H. Martin, Speech and Language Processing, Prentice-Hall, 2000. (Chapters 9 and 10) Earley Algorithm Jurafsky & Martin, Figure 10.16, p.384 Earley Algorithm - Examples Jurafsky & Martin, Figures 10.17 and 10.18


Download ppt " Christel Kemke 1 2007/08 COMP 4060 Natural Language Processing PARSING."

Similar presentations


Ads by Google