Presentation is loading. Please wait.

Presentation is loading. Please wait.

LINGUISTICA GENERALE E COMPUTAZIONALE ANALISI SINTATTICA (PARSING)

Similar presentations


Presentation on theme: "LINGUISTICA GENERALE E COMPUTAZIONALE ANALISI SINTATTICA (PARSING)"— Presentation transcript:

1 LINGUISTICA GENERALE E COMPUTAZIONALE ANALISI SINTATTICA (PARSING)

2 RECAP: CFG Bird et al, ch. 8.3

3 PARSING Parsing is the process of recognizing and assigning STRUCTURE Parsing a string with a CFG: – Finding a derivation of the string consistent with the grammar – The derivation gives us a PARSE TREE

4 EXAMPLE (CFR LAST WEEK)

5 PARSING AS SEARCH Just as in the case of non-deterministic regular expressions, the main problem with parsing is the existence of CHOICE POINTS There is a need for a SEARCH STRATEGY determining the order in which alternatives are considered

6 TOP-DOWN AND BOTTOM-UP SEARCH STRATEGIES The search has to be guided by the INPUT and the GRAMMAR TOP-DOWN search: the parse tree has to be rooted in the start symbol S – EXPECTATION-DRIVEN parsing BOTTOM-UP search: the parse tree must be an analysis of the input – DATA-DRIVEN parsing

7 AN EXAMPLE OF TOP-DOWN SEARCH (IN PARALLEL)

8 RECURSIVE DESCENT IN NLTK P. 303: – nltk.RecursiveDescentParser(grammar) – nltk.app.rdparser()

9 NON-PARALLEL SEARCH If it’s not possible to examine all alternatives in parallel, it’s necessary to make further decisions: – Which node in the current search space to expand first (breadth-first or depth-first) – Which of the applicable grammar rules to expand first – Which leaf node in a parse tree to expand next (e.g., leftmost)

10 TOP-DOWN, DEPTH-FIRST, LEFT-TO-RIGHT

11 TOP-DOWN, DEPTH-FIRST, LEFT-TO-RIGHT (II)

12 TOP-DOWN, DEPTH-FIRST, LEFT-TO-RIGHT (III)

13 TOP-DOWN, DEPTH-FIRST, LEFT-TO-RIGHT (IV)

14 A T-D, D-F, L-R PARSER

15 LEFT-RECURSION A LEFT-RECURSIVE grammar may cause a T-D, D-F, L-R parser to never return Examples of left-recursive rules: – NP  NP PP – S  S and S – But also: NP  Det Nom Det  NP’s

16 THE PROBLEM WITH LEFT-RECURSION

17 LEFT-RECURSION: POOR SOLUTIONS Rewrite the grammar to a weakly equivalent one – Problem: may not get correct parse tree Limit the depth during search – Problem: limit is arbitrary

18 AN EXAMPLE OF BOTTOM-UP SEARCH

19 SHIFT-REDUCE PARSING P. 305 – nltk.app.srparser() – ShiftReduceParser(grammar)

20 TOP-DOWN vs BOTTOM-UP TOP-DOWN: – Only search among grammatical answers – BUT: suggests hypotheses that may not be consistent with data – Problem: left-recursion BOTTOM-UP: – Only forms hypotheses consistent with data – BUT: may suggest hypotheses that make no sense globally

21 LEFT-CORNER PARSING A hybrid of top-down and bottom-up parsing Strategy: don’t consider any expansion unless the current input can serve as the LEFT- CORNER of that expansion

22 LC PARSING IN PYTHON

23 FURTHER PROBLEMS IN PARSING Ambiguity – Church and Patel (1982): the number of attachment ambiguities grows like the Catalan numbers C(2) = 2, C(3) = 5, C(4) = 14, C(5) = 132, C(6) = 469, C(7) = 1430, C(8) = 4867 Avoiding reparsing

24 COMMON STRUCTURAL AMBIGUITIES COORDINATION ambiguity – OLD (MEN AND WOMEN) vs (OLD MEN) AND WOMEN ATTACHMENT ambiguity: – Gerundive VP attachment ambiguity I saw the Eiffel Tower flying to Paris – PP attachment ambiguity I shot an elephant in my pajamas

25 PP ATTACHMENT AMBIGUITY

26 AMBIGUITY: SOLUTIONS Use a PROBABILISTIC GRAMMAR (not covered in this module) Use semantics

27 AVOID RECOMPUTING INVARIANTS Consider parsing with a top-down parser the NP: – A flight from Indianapolis to Houston on TWA With the grammar rules: – NP  Det Nominal – NP  NP PP – NP  ProperNoun

28 INVARIANTS AND TOP-DOWN PARSING

29 THE EARLEY ALGORITHM

30 DYNAMIC PROGRAMMING A standard T-D parser would reanalyze A FLIGHT 4 times, always in the same way A DYNAMIC PROGRAMMING algorithm uses a table (the CHART) to avoid repeating work The Earley algorithm also – Does not suffer from the left-recursion problem – Solves an exponential problem in O(n 3 )

31 THE CHART The Earley algorithm uses a table (the CHART) of size N+1, where N is the length of the input – Table entries sit in the `gaps’ between words Each entry in the chart is a list of – Completed constituents – In-progress constituents – Predicted constituents All three types of objects are represented in the same way as STATES

32 THE CHART: GRAPHICAL REPRESENTATION

33 STATES A state encodes two types of information: – How much of a certain rule has been encountered in the input – Which positions are covered – A  , [X,Y] DOTTED RULES – VP  V NP  – NP  Det  Nominal – S   VP

34 EXAMPLES

35 SUCCESS The parser has succeeded if entry N+1 of the chart contains the state – S   , [0,N]

36 THE ALGORITHM The algorithm loops through the input without backtracking, at each step performing three operations: – PREDICTOR: add predictions to the chart – COMPLETER: Move the dot to the right when looked-for constituent is found – SCANNER: read in the next input word

37 THE ALGORITHM: CENTRAL LOOP

38 EARLEY ALGORITHM: THE THREE OPERATORS

39 EXAMPLE, AGAIN

40 EXAMPLE: BOOK THAT FLIGHT

41 EXAMPLE: BOOK THAT FLIGHT (II)

42 EXAMPLE: BOOK THAT FLIGHT (III)

43 EXAMPLE: BOOK THAT FLIGHT (IV)

44 CHART PARSING IN NLTK 8.4, p. 307 ff

45 DEPENDENCY PARSING 8.5

46 READINGS Bird et al, chapter 8


Download ppt "LINGUISTICA GENERALE E COMPUTAZIONALE ANALISI SINTATTICA (PARSING)"

Similar presentations


Ads by Google