Published by Michael Cochran. Modified over 2 years ago.

1
Feature Forest Models for Syntactic Parsing
Yusuke Miyao
University of Tokyo

2
Probabilistic models for NLP
Widely used for disambiguation of linguistic structures
Ex.) POS tagging
A pretty girl is crying
[Figure: lattice of candidate POS tags (NN, DT, VBZ, JJ, VBG) for each word, annotated with the primitive probability P(NN|a/NN, pretty)]

3
Probabilistic models for NLP
Widely used for disambiguation of linguistic structures
Ex.) POS tagging
A pretty girl is crying
[Figure: lattice of candidate POS tags (NN, DT, VBZ, JJ, VBG) for each word]

5
Implicit assumption
Processing state = Primitive probability
–Efficient algorithm for searching
–Avoid exponential explosion of ambiguities
A pretty girl is crying
[Figure: lattice of candidate POS tags (NN, DT, VBZ, JJ, VBG) for each word]
POS tag = processing state = primitive probability
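The "efficient algorithm for searching" that this assumption allows is usually Viterbi-style dynamic programming over the tag lattice. The following is a minimal sketch of that idea, not the talk's model: the tag set and sentence come from the slide, while the bigram-style factorisation, the `PREFERRED` table, and the numbers in `toy_score` are hypothetical values chosen only for illustration.

```python
from itertools import product

TAGS = ["DT", "JJ", "NN", "VBZ", "VBG"]
WORDS = ["A", "pretty", "girl", "is", "crying"]

# Hypothetical preferences, for illustration only (not a trained model).
PREFERRED = {"A": "DT", "pretty": "JJ", "girl": "NN", "is": "VBZ", "crying": "VBG"}


def toy_score(prev_tag, tag, word):
    """Made-up local score standing in for P(tag | prev_tag, word)."""
    score = 0.6 if PREFERRED[word] == tag else 0.1
    # Mildly penalise repeating the same tag twice in a row.
    return score * (0.5 if prev_tag == tag else 1.0)


def viterbi(words, tags, score):
    """Best tag sequence in O(len(words) * len(tags)^2) time."""
    # best[t] = (score of the best path ending in tag t, that path)
    best = {t: (score(None, t, words[0]), [t]) for t in tags}
    for word in words[1:]:
        new_best = {}
        for t in tags:
            new_best[t] = max(
                ((p * score(prev, t, word), path + [t])
                 for prev, (p, path) in best.items()),
                key=lambda item: item[0],
            )
        best = new_best
    return max(best.values(), key=lambda item: item[0])


def brute_force(words, tags, score):
    """Enumerates all len(tags)^len(words) sequences; only for checking."""
    def seq_score(seq):
        total, prev = 1.0, None
        for tag, word in zip(seq, words):
            total *= score(prev, tag, word)
            prev = tag
        return total

    return max(
        ((seq_score(s), list(s)) for s in product(tags, repeat=len(words))),
        key=lambda item: item[0],
    )


prob, path = viterbi(WORDS, TAGS, toy_score)
print(path, prob)
```

The dynamic program only works because each local score looks at one tag and its predecessor. That is exactly the coupling of processing state and primitive probability that the following slides question.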

6
Is the assumption right?
Ex.) Shallow parsing, NE recognition

7
Is the assumption right?
Ex.) Shallow parsing, NE recognition
A pretty girl is crying
[Figure: lattice of candidate chunk tags (NP-B, NP-I, VP-B, VP-I, O) for each word]

8
Is the assumption right?
Ex.) Shallow parsing, NE recognition
–B(Begin), I(Internal), O(Other) tags are introduced to represent multi-word tags
A pretty girl is crying
[Figure: lattice of candidate chunk tags (NP-B, NP-I, VP-B, VP-I, O) for each word]
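The B/I/O encoding can be illustrated with a small round-trip between multi-word chunks and per-word tags. This is a sketch under assumptions: the `LABEL-B` / `LABEL-I` spelling follows the slide, while the function names and the `(start, end, label)` chunk format are hypothetical choices made here.

```python
def chunks_to_bio(n_words, chunks):
    """Encode chunks as per-word tags.

    chunks: list of (start, end, label), end exclusive, non-overlapping.
    """
    tags = ["O"] * n_words
    for start, end, label in chunks:
        tags[start] = f"{label}-B"
        for i in range(start + 1, end):
            tags[i] = f"{label}-I"
    return tags


def bio_to_chunks(tags):
    """Decode per-word tags back into (start, end, label) chunks.

    An I tag that does not continue an open chunk of the same label is ignored.
    """
    chunks, start, label = [], None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a final chunk
        continues = tag.endswith("-I") and label == tag[:-2]
        if start is not None and not continues:
            chunks.append((start, i, label))
            start, label = None, None
        if tag.endswith("-B"):
            start, label = i, tag[:-2]
    return chunks


words = ["A", "pretty", "girl", "is", "crying"]
chunks = [(0, 3, "NP"), (3, 5, "VP")]
tags = chunks_to_bio(len(words), chunks)
print(list(zip(words, tags)))
```

The encoding exists only so that a per-word tagger can be reused: the unit of interest is the chunk, but the unit the model scores is the single B/I/O tag.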

9
Is the assumption right?
Ex.) Syntactic parsing

10
Is the assumption right?
Ex.) Syntactic parsing
What do you want to give?
[Figure: parse tree with nodes VP, S, S, S, annotated with the rule probability P(VP | VP → to give)]

11
Is the assumption right?
Ex.) Syntactic parsing
–Non-local dependencies are not represented
What do you want to give?
[Figure: parse tree with nodes VP, S, S, S, annotated with the rule probability P(VP | VP → to give)]

12
Problem of existing models
Processing state Primitive probability

13
Problem of existing models
Processing state Primitive probability
How to model the probability of ambiguous structures with more flexibility?

14
Possible solution
A complete structure is a primitive event
–Ex.) Shallow parsing
A pretty girl is crying
[Figure: lattice of candidate chunk tags (NP-B, NP-I, VP-B, VP-I, O) for each word]

15
Possible solution
A complete structure is a primitive event
–Ex.) Shallow parsing
A pretty girl is crying
[Figure: candidate sequences of multi-word NP and VP chunks over the sentence]
All possible sequences

16
Possible solution
A complete structure is a primitive event
–Ex.) Shallow parsing
Probability of the sequence of multi-word tags
A pretty girl is crying
[Figure: candidate sequences of multi-word NP and VP chunks over the sentence]
All possible sequences

18
Possible solution
A complete structure is a primitive event
–Ex.) Syntactic parsing
What do you want to give?
[Figure: parse tree with nodes VP, S, S, S]

19
Possible solution
A complete structure is a primitive event
–Ex.) Syntactic parsing
what do you want to give
[Figure: dependency arcs labelled ARG1, ARG2, MODIFY, ARG2 over the words of the sentence]

20
Possible solution
A complete structure is a primitive event
–Ex.) Syntactic parsing
Probability of argument structures
what do you want to give
[Figure: dependency arcs labelled ARG1, ARG2, MODIFY, ARG2 over the words of the sentence]

21
Problem
Complete structures have exponentially many ambiguities
A pretty girl is crying
[Figure: candidate sequences of multi-word NP and VP chunks over the sentence]
Exponentially many sequences
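To get a feel for the growth, the number of candidate chunk sequences can be counted directly. The sketch below assumes a simplified setting that the slides do not spell out: every word belongs to exactly one contiguous chunk, and each chunk takes one of `n_labels` labels (2 for NP and VP). The function name is hypothetical.

```python
def count_chunkings(n_words, n_labels):
    """Number of ways to split n_words into contiguous labelled chunks."""
    # counts[i] = number of labelled chunkings of the first i words
    counts = [1] + [0] * n_words
    for i in range(1, n_words + 1):
        # The last chunk covers words j..i-1 and takes any of n_labels labels.
        counts[i] = n_labels * sum(counts[j] for j in range(i))
    return counts[n_words]


for n in (5, 10, 20, 40):
    print(n, count_chunkings(n, n_labels=2))
```

Under these assumptions the count has the closed form `n_labels * (n_labels + 1) ** (n_words - 1)`, so the five-word example already has 162 candidate sequences, and the number triples with every additional word.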

22
Proposal
Feature forest model [Miyao and Tsujii, 2002]

23
Proposal
Feature forest model [Miyao and Tsujii, 2002]
[Figure: a feature forest, with labels "Conjunctive node", "Disjunctive node", and "Features"]
Exponentially many trees are packed
Features are assigned to each conjunctive node
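A packed forest of this kind can be sketched as two node types, with the sum over all packed trees computed by dynamic programming rather than by enumeration. This is an illustrative sketch, not the authors' implementation: the class names `Conj` and `Disj`, the feature names `f` and `g`, and the weights are all hypothetical, and the toy forest contains no reentrant structures.

```python
import math
from dataclasses import dataclass, field
from itertools import product


@dataclass
class Conj:
    """Conjunctive node: carries features; all daughters are expanded."""
    features: dict                                  # feature name -> value
    daughters: list = field(default_factory=list)   # list of Disj


@dataclass
class Disj:
    """Disjunctive node: exactly one alternative is chosen per tree."""
    alternatives: list                              # list of Conj


def local_score(c, weights):
    return math.exp(sum(weights.get(f, 0.0) * v for f, v in c.features.items()))


def inside(d, weights, memo=None):
    """Sum of scores of all trees below d, computed without unpacking."""
    memo = {} if memo is None else memo
    if id(d) not in memo:
        memo[id(d)] = sum(
            local_score(c, weights)
            * math.prod(inside(x, weights, memo) for x in c.daughters)
            for c in d.alternatives
        )
    return memo[id(d)]


def unpack(d):
    """Enumerate every tree below d as a list of Conj nodes (exponential)."""
    for c in d.alternatives:
        for parts in product(*(list(unpack(x)) for x in c.daughters)):
            yield [c] + [node for part in parts for node in part]


# Toy forest: 6 conjunctive nodes pack 2 * 2 * 2 = 8 trees.
leaf = Disj([Conj({"f": 1}), Conj({"g": 1})])
mid = Disj([Conj({"f": 1, "g": 1}, [leaf]), Conj({}, [leaf])])
root = Disj([Conj({"g": 2}, [mid]), Conj({"f": 1}, [mid])])
weights = {"f": 0.5, "g": -1.0}

trees = list(unpack(root))
z_packed = inside(root, weights)
z_unpacked = sum(math.prod(local_score(c, weights) for c in t) for t in trees)
print(len(trees), z_packed, z_unpacked)
```

Because the score of a tree is a product of the scores of its conjunctive nodes, the memoised `inside` pass visits each node once yet returns the same normaliser as summing over every unpacked tree.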

24
Feature forest model
Feature forest models can be efficiently estimated without exponential explosion [Miyao and Tsujii, 2002]

25
Feature forest model
Feature forest models can be efficiently estimated without exponential explosion [Miyao and Tsujii, 2002]
When unpacking the forest, the model is equivalent to maximum entropy models [Berger et al., 1996]
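The equivalence can be written out as follows. The notation is assumed here rather than taken from the slides: a tree y unpacked from the forest Φ(x) of sentence x is treated as a set of conjunctive nodes c, with features f_i and weights λ_i.

```latex
% Assumed notation: y is a set of conjunctive nodes c unpacked from the forest \Phi(x).
p(y \mid x) = \frac{1}{Z(x)} \exp\Big( \sum_{c \in y} \sum_i \lambda_i f_i(c) \Big),
\qquad
Z(x) = \sum_{y' \in \Phi(x)} \exp\Big( \sum_{c \in y'} \sum_i \lambda_i f_i(c) \Big)
```

Setting f_i(y) = Σ_{c ∈ y} f_i(c) turns this into the usual maximum entropy form over complete structures. The sum in Z(x) ranges over exponentially many trees, but since each tree's score factorises over its conjunctive nodes, it can be computed on the packed forest.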

26
Application to parsing
Applying a feature forest model to disambiguation of argument structures

27
Application to parsing
Applying a feature forest model to disambiguation of argument structures
How to represent exponential ambiguities of argument structures with a feature forest?

28
Application to parsing
Applying a feature forest model to disambiguation of argument structures
How to represent exponential ambiguities of argument structures with a feature forest?
–Argument structures are not trees, but DAGs (including reentrant structures)

29
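Packing argument structures
An example including reentrant structures
She neglected the fact that I wanted to argue.
[Figure: two candidate argument structures. One contains fact (ARG1), want (ARG1, ARG2), I, and argue1 (ARG1); the other contains want (ARG1, ARG2), I, argue2 (ARG1, ARG2), and fact. The index 1 marks a reentrant (shared) argument.]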

30
Packing argument structures
She neglected the fact that I wanted to argue.
[Figure: forest node I]

31
Packing argument structures
Inactive parts: Argument structures whose arguments are all instantiated
She neglected the fact that I wanted to argue.
[Figure: argument structure with want (ARG1, ARG2), I, argue1 (ARG1), and reentrancy index 1; forest node I]

32
Packing argument structures
Inactive parts: Argument structures whose arguments are all instantiated
Inactive parts are packed into conjunctive nodes
She neglected the fact that I wanted to argue.
[Figure: argument structure with want (ARG1, ARG2), I, argue1 (ARG1), and reentrancy index 1; forest node I]

33
Packing argument structures
Inactive parts: Argument structures whose arguments are all instantiated
Inactive parts are packed into conjunctive nodes
She neglected the fact that I wanted to argue.
[Figure: forest nodes I, argue1 (A1) over I, and want (A1, A2) over I and argue1]

34
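Packing argument structures
Inactive parts: Argument structures whose arguments are all instantiated
Inactive parts are packed into conjunctive nodes
She neglected the fact that I wanted to argue.
[Figure: argument structure with want (ARG1, ARG2), I, argue2 (ARG1, ARG2), and reentrancy index 1, where ARG2 of argue2 is marked "?"; forest nodes I, argue1 (A1) over I, and want (A1, A2) over I and argue1]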

35
Packing argument structures
Inactive parts: Argument structures whose arguments are all instantiated
Inactive parts are packed into conjunctive nodes
She neglected the fact that I wanted to argue.
[Figure: forest nodes I, argue1 (A1) over I, want (A1, A2) over I and argue1, and want (A1, A2) over I and argue2]

36
Packing argument structures
Inactive parts: Argument structures whose arguments are all instantiated
Inactive parts are packed into conjunctive nodes
She neglected the fact that I wanted to argue.
[Figure: forest nodes I, argue1 (A1) over I, want (A1, A2) over I and argue1, and want (A1, A2) over I and argue2; argument structure with want (ARG1, ARG2), I, argue2 (ARG1, ARG2), fact, and reentrancy index 1]

37
Packing argument structures
Inactive parts: Argument structures whose arguments are all instantiated
Inactive parts are packed into conjunctive nodes
She neglected the fact that I wanted to argue.
[Figure: forest nodes I, argue1 (A1) over I, want (A1, A2) over I and argue1, want (A1, A2) over I and argue2, fact, and argue2 (A1, A2) over I and fact]

38
Packing argument structures
Inactive parts: Argument structures whose arguments are all instantiated
Inactive parts are packed into conjunctive nodes
She neglected the fact that I wanted to argue.
[Figure: forest nodes I, argue1 (A1) over I, want (A1, A2) over I and argue1, want (A1, A2) over I and argue2, fact, argue2 (A1, A2) over I and fact, and fact (A1) over want]

39
Feature forest representation of argument structures
She neglected the fact that I wanted to argue.
[Figure: complete feature forest with nodes she, I, fact, argue1 (A1) over I, argue2 (A1, A2) over I and fact, want (A1, A2) over I and argue1, want (A1, A2) over I and argue2, fact (A1) over want, and neglect (A1, A2) over she and fact]
Conjunctive nodes correspond to argument structures whose arguments are all instantiated

40
Experiments
Grammar: a treebank grammar of HPSG [Miyao and Tsujii, 2003]
–Extracted from the Penn Treebank [Marcus et al., 1994] Section
Training: Section of the Penn Treebank
Test: sentences from Section 22 covered by the grammar
Measure: Accuracy of dependencies in argument structures

41
Experiments
Features: the combinations of
–Surface strings/POS
–Labels of dependencies (ARG1, ARG2, …)
–Labels of lexical entries (head noun, transitive, …)
–Distance
Estimation algorithm: Limited-memory BFGS algorithm [Nocedal, 1980] with MAP estimation [Chen & Rosenfeld, 1999]
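For orientation, the objective that a quasi-Newton optimizer such as limited-memory BFGS would maximise under MAP estimation can be written as below. This assumes a zero-mean Gaussian prior with a single variance σ², the form usually associated with [Chen & Rosenfeld, 1999]; the slides do not state the prior or its parameters, and the notation (f_i(y) as the sum of f_i over the conjunctive nodes of y) is an assumption as well.

```latex
% Assumed: zero-mean Gaussian prior with variance \sigma^2 on every weight \lambda_i.
\mathcal{L}(\lambda) = \sum_{(x, y)} \log p_\lambda(y \mid x) - \sum_i \frac{\lambda_i^2}{2\sigma^2},
\qquad
\frac{\partial \mathcal{L}}{\partial \lambda_i}
  = \sum_{(x, y)} \Big( f_i(y) - \mathbb{E}_{p_\lambda(y' \mid x)}\big[ f_i(y') \big] \Big) - \frac{\lambda_i}{\sigma^2}
```

The optimizer needs only this value and gradient at each step. The expectation term is the expensive part, since it ranges over every structure packed in the forest, and it is the quantity that the feature forest computation makes tractable.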

42
Preliminary results
Estimation time: 143 min.
Accuracy (precision/recall):
                 exact     partial
Baseline         48.1 /    / 56.2
Unigram          77.3 /    / 81.3
Feature forest   85.5 /    / 88.2
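Precision and recall over dependencies can be computed as set overlap between gold and predicted dependency triples. The sketch below is an assumption about the general shape of such a measure, not the talk's evaluation script: the `(head, label, dependent)` triple format and the toy data are hypothetical, and the "exact" versus "partial" distinction is not modelled because the slides do not define it.

```python
def precision_recall(gold, predicted):
    """gold, predicted: sets of (head, label, dependent) triples."""
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Toy data, for illustration only.
gold = {
    ("neglect", "ARG1", "she"),
    ("neglect", "ARG2", "fact"),
    ("want", "ARG1", "I"),
    ("want", "ARG2", "argue"),
    ("argue", "ARG1", "I"),
}
predicted = {
    ("want", "ARG1", "I"),
    ("want", "ARG2", "argue"),
    ("argue", "ARG1", "I"),
    ("argue", "ARG2", "fact"),
}

p, r = precision_recall(gold, predicted)
print(f"precision={p:.2f} recall={r:.2f}")
```

Here three of the four predicted triples are correct (precision 0.75) and three of the five gold triples are recovered (recall 0.60).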

43
Conclusion
Feature forest models allow the probabilistic modeling of complete structures without exponential explosion
The application to syntactic parsing resulted in high accuracy

44
Ongoing work
Refinement of the grammar and tuning of estimation parameters
Development of efficient algorithms for best-first/beam search
