1 Parsing Unrestricted Text
Joakim Nivre

2 Two Notions of Parsing
Grammar parsing:
  - Given a grammar G and an input string x ∈ Σ*, derive some or all of the analyses y assigned to x by G.
Text parsing:
  - Given a text T = (x1, …, xn), derive the correct analysis yi for every sentence xi ∈ T.

3 Grammar Parsing
Properties of grammar parsing:
  - Abstract problem: mapping from (G, x) to y.
  - Parsing implies recognition; analyses are defined only if x ∈ L(G).
  - Correctness (consistency and completeness) can be proven without considering any input string x.
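
To make the mapping from (G, x) to a recognition decision concrete, here is a minimal sketch of a CKY recognizer. The grammar, the lexicon, and the function names are hypothetical illustrations, not part of the slides, and the sketch assumes G is in Chomsky normal form.

```python
from itertools import product

# Hypothetical toy grammar in Chomsky normal form (not from the slides).
BINARY = {            # (B, C) -> set of A such that A -> B C
    ("NP", "VP"): {"S"},
    ("Det", "N"): {"NP"},
    ("V", "NP"): {"VP"},
}
LEXICAL = {           # word -> set of A such that A -> word
    "the": {"Det"},
    "dog": {"N"},
    "cat": {"N"},
    "saw": {"V"},
}

def cky_chart(words):
    """Return chart[i][j] = set of nonterminals deriving words[i:j]."""
    n = len(words)
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        chart[i][i + 1] = set(LEXICAL.get(w, ()))
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):          # split point
                for b, c in product(chart[i][k], chart[k][j]):
                    chart[i][j] |= BINARY.get((b, c), set())
    return chart

def recognize(words, start="S"):
    """x is in L(G) iff the start symbol spans the whole input."""
    return bool(words) and start in cky_chart(words)[0][len(words)]
```

Under these assumptions, `recognize("the dog saw the cat".split())` succeeds, while a string outside L(G) simply receives no analysis at all, which is the behaviour the robustness slides return to.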

4 Text Parsing
Properties of text parsing:
  - Not a well-defined abstract problem (the text language is not a formal language).
  - Parsing does not imply recognition (recognition presupposes a formal language).
  - Empirical approximation problem.
  - Correctness can only be established with reference to empirical samples of the text language (statistical inference).
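
One way to read "correctness with reference to empirical samples" is as an accuracy estimate on an annotated sample. The sketch below assumes exact match per sentence as the correctness criterion and a normal-approximation standard error; both choices, and the function name, are illustrative assumptions rather than anything prescribed by the slides.

```python
import math

def exact_match_accuracy(predicted, gold):
    """Fraction of sentences whose predicted analysis equals the gold one,
    together with a normal-approximation standard error of that fraction."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("need two non-empty lists of equal length")
    n = len(gold)
    correct = sum(p == g for p, g in zip(predicted, gold))
    acc = correct / n
    stderr = math.sqrt(acc * (1 - acc) / n)
    return acc, stderr
```

The standard error is a reminder that the figure is an inference from a sample of the text language, not a proof about the parser.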

5 Two Methods for Text Parsing
Grammar-driven text parsing:
  - Text parsing approximated by grammar parsing.
Data-driven text parsing:
  - Text parsing approximated by statistical inference.
Not mutually exclusive methods:
  - Grammars can be combined with statistical inference (e.g. PCFG).

6 Grammar-Driven Text Parsing
Basic assumption:
  - The text language L can be approximated by L(G).
Potential problems (evaluation criteria):
  - Robustness
  - Disambiguation
  - Accuracy
  - Efficiency

7 Robustness
Basic issue:
  - What happens if x ∉ L(G)?
Two cases:
  - x ∉ L(G), x ∈ L (coverage)
  - x ∉ L(G), x ∉ L (robustness)
Techniques:
  - Constraint relaxation
  - Partial parsing
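
The two techniques named above can be sketched together, under stated assumptions: unknown words receive a placeholder category `UNK` (a simple form of constraint relaxation), and when no single constituent spans the input, the parser returns a greedy left-to-right sequence of the longest constituents it found (a simple form of partial parsing). The toy grammar and all names are hypothetical, and real systems use more refined strategies.

```python
# Hypothetical toy CNF grammar (not from the slides).
BINARY = {("NP", "VP"): "S", ("Det", "N"): "NP", ("V", "NP"): "VP"}
LEXICAL = {"the": "Det", "dog": "N", "cat": "N", "saw": "V"}

def spans(words):
    """CKY-style table: (i, j) -> set of categories covering words[i:j]."""
    n = len(words)
    table = {}
    for i, w in enumerate(words):
        # Relaxation: an unknown word gets the placeholder category "UNK".
        table[(i, i + 1)] = {LEXICAL.get(w, "UNK")}
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            cats = set()
            for k in range(i + 1, j):
                for b in table[(i, k)]:
                    for c in table[(k, j)]:
                        if (b, c) in BINARY:
                            cats.add(BINARY[(b, c)])
            table[(i, j)] = cats
    return table

def partial_parse(words):
    """Greedy left-to-right cover using the longest available constituents."""
    table = spans(words)
    n = len(words)
    result, i = [], 0
    while i < n:
        # Longest span starting at i with some category; width 1 always has one.
        j = max(j for j in range(i + 1, n + 1) if table[(i, j)])
        result.append((sorted(table[(i, j)])[0], words[i:j]))
        i = j
    return result
```

For an input inside L(G) the result is a single `S` constituent; for other inputs it degrades to a sequence of fragments instead of failing, so every string receives some analysis.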

8 Disambiguation
Basic issue:
  - What happens when G assigns more than one analysis y to a sentence x?
Two cases:
  - String ambiguity (real) (disambiguation)
  - Grammar ambiguity (spurious) (leakage)
Techniques:
  - Grammar specialization
  - Deterministic parsing
  - Eliminative parsing
  - Data-driven parsing (e.g. PCFG)
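
To give a sense of how quickly grammar ambiguity can grow, the sketch below counts the analyses that a maximally permissive binary grammar, assumed here to be S -> S S | word, assigns to a string of n words. The grammar and function name are hypothetical; the point is only that the count grows exponentially with sentence length.

```python
from functools import lru_cache

def count_analyses(n):
    """Number of distinct parse trees for n words under S -> S S | word."""
    @lru_cache(maxsize=None)
    def count(width):
        if width == 1:
            return 1                      # S -> word
        # S -> S S: choose a split point, multiply the counts of both halves.
        return sum(count(k) * count(width - k) for k in range(1, width))
    return count(n)
```

With this grammar, 5 words already receive 14 analyses and 10 words receive 4862, which is why disambiguation cannot be left to exhaustive enumeration.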

9 Accuracy
Basic issue:
  - How often can the parser deliver a single correct analysis?
Grammar-driven techniques:
  - Linguistically adequate analyses?
  - Adequacy undermined by techniques to handle robustness and disambiguation.

10 Efficiency
Theoretical complexity:
  - Many linguistically motivated formalisms have intractable parsing problems.
  - Even polynomially parsable formalisms often have high complexity.
Practical efficiency is also affected by:
  - Grammar constants
  - Techniques for handling robustness and disambiguation

11 Data-Driven Text Parsing
Basic assumption:
  - The text language L can be approximated by statistical inference from text samples.
Components:
  - A formal model M defining permissible representations for sentences in L
  - A sample of text Tt = (x1, …, xn) from L, with or without the correct analyses At = (y1, …, yn)
  - An inductive inference scheme I defining actual analyses for the sentences of any text T = (x1, …, xn) in L, relative to M and Tt (and possibly At)
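
As one possible instantiation of these components, assume that M is the set of context-free trees, that the sample comes with analyses (a tiny hand-made treebank), and that the inference scheme starts from relative-frequency estimation of rule probabilities. The tree encoding, the treebank, and the function names below are all hypothetical.

```python
from collections import Counter

# Hypothetical annotated sample: trees as nested tuples (label, child, ...).
TREEBANK = [
    ("S", ("NP", ("Det", "the"), ("N", "dog")), ("VP", ("V", "barks"))),
    ("S", ("NP", ("N", "dogs")), ("VP", ("V", "bark"))),
]

def rules(tree):
    """Yield (lhs, rhs) productions used in a tree."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        yield label, (children[0],)            # lexical rule A -> word
        return
    yield label, tuple(child[0] for child in children)
    for child in children:
        yield from rules(child)

def estimate_pcfg(treebank):
    """Relative frequency: P(A -> rhs) = count(A -> rhs) / count(A)."""
    rule_counts = Counter(r for tree in treebank for r in rules(tree))
    lhs_counts = Counter()
    for (lhs, _), c in rule_counts.items():
        lhs_counts[lhs] += c
    return {rule: c / lhs_counts[rule[0]] for rule, c in rule_counts.items()}
```

Calling `estimate_pcfg(TREEBANK)` yields one probability per observed rule, with the probabilities for each left-hand side summing to 1.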

12 Robustness
Basic issue:
  - Is M a grammar or not (cf. PCFG)?
Radical constraint relaxation:
  - Ensure that every string has at least one analysis.
Example (DOP3):
  - M permits any parse tree composed from subtrees in Tt, with free insertion of (even unseen) words from x.
  - Tt is annotated with context-free parse trees.
  - I defines the probability P(x, y) to be the sum of the probabilities of each derivation of y for x (for any x, y).
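
The definition of P(x, y) in the DOP3 example can be written out as follows. The symbol D(x, y) for the set of derivations is introduced here for illustration, and the product form of a derivation's probability is the usual assumption in DOP-style models rather than something stated on the slide.

```latex
% D(x, y): set of derivations of analysis y for sentence x (symbol assumed here)
P(x, y) = \sum_{d \in D(x, y)} P(d)

% Assumption: a derivation d combines subtrees t_1, ..., t_k independently
P(d) = \prod_{i=1}^{k} P(t_i)

% Preferred analysis for x under this scheme
\hat{y} = \arg\max_{y} P(x, y)
```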

13 Disambiguation
Basic issue:
  - How to rank the different analyses yi of x?
Structure of I:
  - A parameterized stochastic model M, assigning a score S(x, yi) to each permissible analysis yi of x, relative to a set of parameters θ.
  - A parsing method, i.e. a method for computing the best yi according to S(x, yi) (given θ).
  - A learning method, i.e. a method for instantiating θ based on inductive inference from Tt.
Example: PCFG
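
For the PCFG example, the parsing method can be sketched as a Viterbi variant of CKY: the score of an analysis is the product of its rule probabilities, and the parser returns the highest-scoring tree. The rule probabilities below are made-up toy values standing in for learned parameters, the grammar is assumed to be in Chomsky normal form, and the names are hypothetical.

```python
import math

# Hypothetical parameters: rule -> probability (toy values, not from the slides).
BINARY = {
    ("S", "NP", "VP"): 1.0,
    ("VP", "V", "NP"): 0.7,
    ("VP", "VP", "PP"): 0.3,
    ("NP", "NP", "PP"): 0.2,
    ("PP", "P", "NP"): 1.0,
}
LEXICAL = {
    ("NP", "I"): 0.3, ("NP", "stars"): 0.3, ("NP", "telescopes"): 0.2,
    ("V", "saw"): 1.0, ("P", "with"): 1.0,
}

def viterbi_parse(words, start="S"):
    """Return (log score, tree) of the best analysis, or None if none exists."""
    n = len(words)
    best = {}   # (i, j, A) -> (log probability, tree) for words[i:j]
    for i, w in enumerate(words):
        for (a, word), p in LEXICAL.items():
            if word == w:
                best[(i, i + 1, a)] = (math.log(p), (a, w))
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for (a, b, c), p in BINARY.items():
                    left, right = best.get((i, k, b)), best.get((k, j, c))
                    if left is None or right is None:
                        continue
                    score = math.log(p) + left[0] + right[0]
                    # Keep only the best-scoring analysis per (span, category).
                    if (i, j, a) not in best or score > best[(i, j, a)][0]:
                        best[(i, j, a)] = (score, (a, left[1], right[1]))
    return best.get((0, n, start))
```

On "I saw stars with telescopes" the two attachments of the prepositional phrase compete, and with these toy values the verb-phrase attachment scores higher. Working in log space avoids numerical underflow on longer sentences.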

14 Accuracy
Basic issue:
  - How often can the parser deliver a single correct analysis?
Data-driven techniques:
  - Empirically adequate ranking of alternatives?
  - Accuracy undermined by combinatorial explosion due to radical constraint relaxation.

15 Efficiency
Theoretical complexity:
  - Many data-driven models have intractable inference problems.
  - Even polynomially parsable models often have high complexity.
Practical efficiency is also affected by:
  - Model constants
  - Techniques for handling robustness and disambiguation

16 Converging Approaches?
Text parsing:
  - Complex optimization problem
Two optimization strategies:
  - Start with good accuracy, improve robustness and disambiguation (while controlling efficiency).
  - Start with good disambiguation (and robustness), improve accuracy (while controlling efficiency).
Strategies converging on the same solution?
  - Constraint relaxation for robustness
  - Data-driven models for disambiguation
  - Heuristic search techniques for efficiency

