

1 PROBABILISTIC GRAPH-BASED DEPENDENCY PARSING WITH CONVOLUTIONAL NEURAL NETWORK Zhisong Zhang, Hai Zhao and Lianhui Qin Shanghai Jiao Tong University zzs2011@sjtu.edu.cn, zhaohai@cs.sjtu.edu.cn, qinlianhui@sjtu.edu.cn

2 Outline  Background: Dependency Parsing  Training Criteria: Probabilistic Criterion  Neural Parsing: Basic, Convolution, Ensemble  Experiments and Results

3 Background  Dependency parsing aims to predict a dependency tree, in which all the edges connect head-modifier pairs.  In graph-based methods, a dependency tree is factored into sub-trees; the score of a dependency tree T is defined as the sum of the scores of all its factors p.
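The factorization above can be sketched for the simplest (first-order, edge-factored) case, where each factor is a single head-modifier edge. The edge scores and the toy tree below are made-up illustration data, not the paper's learned scores.

```python
# Edge-factored tree scoring: score(T) = sum of score(head, modifier)
# over all edges in T.  Position 0 is the artificial ROOT node.

def tree_score(tree, edge_scores):
    """tree: dict mapping each modifier to its head."""
    return sum(edge_scores[(h, m)] for m, h in tree.items())

# Toy 3-word sentence with scores for a few candidate edges.
edge_scores = {(0, 2): 5.0, (2, 1): 3.0, (2, 3): 2.0, (0, 1): 1.0}
tree = {1: 2, 2: 0, 3: 2}  # word 2 attaches to ROOT and heads words 1 and 3
print(tree_score(tree, edge_scores))  # 10.0
```

Higher-order models (sibling, grand-sibling) simply use larger sub-trees as factors, with the same sum-of-factors definition.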

4 Decoding Algorithms  Chart-based dynamic programming algorithms (for projective parsing).  Explored extensively in previous work (Eisner, 1996; McDonald et al., 2005; McDonald and Pereira, 2006; Koo and Collins, 2010; Ma and Zhao, 2012).
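As a concrete sketch of the chart-based decoding cited above, here is a minimal first-order Eisner decoder (Eisner, 1996) that returns only the best projective tree score, with no backpointers and no single-root constraint. This is far simpler than the third-order grand-sibling model on the next slide, but the span-based dynamic program is the same family.

```python
def eisner(n, score):
    """n words (1..n) plus ROOT=0; score[(h, m)] is the arc score for h -> m.
    Returns the best projective tree score (first-order Eisner algorithm)."""
    NEG = float("-inf")
    # E[s][t][d][c]: d=0 left arc (head t), d=1 right arc (head s);
    # c=0 incomplete span, c=1 complete span.
    E = [[[[NEG] * 2 for _ in range(2)] for _ in range(n + 1)]
         for _ in range(n + 1)]
    for s in range(n + 1):
        for d in range(2):
            for c in range(2):
                E[s][s][d][c] = 0.0
    for k in range(1, n + 1):          # span length
        for s in range(0, n - k + 1):
            t = s + k
            # incomplete spans: add the arc between s and t
            best = max(E[s][r][1][1] + E[r + 1][t][0][1] for r in range(s, t))
            E[s][t][0][0] = best + score.get((t, s), NEG)
            E[s][t][1][0] = best + score.get((s, t), NEG)
            # complete spans: extend an incomplete span with a complete one
            E[s][t][0][1] = max(E[s][r][0][1] + E[r][t][0][0]
                                for r in range(s, t))
            E[s][t][1][1] = max(E[s][r][1][0] + E[r][t][1][1]
                                for r in range(s + 1, t + 1))
    return E[0][n][1][1]

# Toy 2-word example: the best tree is 0 -> 1 -> 2 with score 1 + 4 = 5.
print(eisner(2, {(0, 1): 1, (0, 2): 2, (1, 2): 4, (2, 1): 2}))  # 5.0
```

Higher-order extensions enlarge the chart items to carry sibling and grandparent context, raising the runtime accordingly.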

5 Decoding Algorithm  Third-order Grand-Sibling Model (the most complex model we explore).

6 Highlights  Probabilistic criteria for neural network training.  Sentence-level representation learned from a convolutional layer.  Ensemble models with a stacked linear output layer.

7 Probabilistic Model  As in log-linear models like the Conditional Random Field (CRF) (Lafferty et al., 2001), we can treat parsing in a probabilistic way.  This is not new, and has been explored in much previous work.

8 Training Criteria

9 Again DP  How to calculate the marginal probability?  This is also solved by dynamic programming, a variant of the well-known inside-outside algorithm; Paskin (2001) and Ma and Zhao (2015) provide the corresponding algorithms for dependency parsing.
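To make the quantities concrete, here is a brute-force illustration (not the paper's DP): on a toy 2-word sentence there are only three projective trees, so the partition function Z, the tree probabilities, and an arc's marginal can be enumerated directly. The inside-outside variant cited above computes the same quantities in polynomial time; the edge scores below are made up.

```python
import math

# Toy 2-word sentence: the three projective trees over words {1, 2} (ROOT=0).
edge_scores = {(0, 1): 1.0, (0, 2): 2.0, (1, 2): 4.0, (2, 1): 2.0}
trees = [
    {(0, 1), (0, 2)},  # both words attach to ROOT
    {(0, 1), (1, 2)},  # chain 0 -> 1 -> 2
    {(0, 2), (2, 1)},  # chain 0 -> 2 -> 1
]

def s(T):
    return sum(edge_scores[a] for a in T)

# p(T) = exp(score(T)) / Z, with Z summed over all candidate trees.
Z = sum(math.exp(s(T)) for T in trees)
p = {frozenset(T): math.exp(s(T)) / Z for T in trees}

# Marginal probability that the arc (1, 2) appears in the tree:
# the total probability mass of the trees containing it.
marg_12 = sum(math.exp(s(T)) for T in trees if (1, 2) in T) / Z
```

These marginals are exactly what the gradient of the probabilistic training criterion needs.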

10 MLE and Max-Margin  The probabilistic criterion can be viewed as a soft version of the max-margin criterion.  Gimpel and Smith (2010) provide a good review of several training criteria.
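The "soft version" relation can be sketched on a toy candidate set (the scores, costs, and gold index below are hypothetical). The probabilistic (CRF) loss is -log p(gold) = logsumexp(scores) - score(gold); the cost-augmented max-margin loss replaces the logsumexp with a hard max, so as the gold score dominates, the soft loss approaches the hard one.

```python
import math

def crf_loss(scores, gold):
    """Negative log-likelihood of the gold candidate under a softmax."""
    m = max(scores)  # stabilized log-sum-exp
    log_z = m + math.log(sum(math.exp(x - m) for x in scores))
    return log_z - scores[gold]

def max_margin_loss(scores, costs, gold):
    """Structured hinge: costs[i] is the cost of candidate i (0 for gold)."""
    return max(x + c for x, c in zip(scores, costs)) - scores[gold]

scores, costs, gold = [3.0, 5.0, 4.0], [1.0, 0.0, 2.0], 1
print(crf_loss(scores, gold))              # ~0.408
print(max_margin_loss(scores, costs, gold))  # 1.0
```

With zero costs and increasingly peaked scores, the CRF loss tends to the hinge loss, which is the sense in which the probabilistic criterion is the soft variant.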

11 Neural Parsing: Recent Work  (Durrett and Klein, 2015): Neural-CRF parsing (phrase-based parsing).  (Pei et al., 2015): Graph Parsing with feed-forward NN.  (Weiss et al., 2015): Transition Parsing with structured training.  (Dyer et al., 2015): LSTM Transition Parsing.  And many others …

12 Neural Model: Basic  A simple feed-forward neural network with a window-based approach.
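The window-based scorer can be sketched as follows: concatenate the embeddings in a small window around the head and modifier positions, apply one hidden layer with tanh, then a linear output giving the arc score. All dimensions, vocabulary, and random parameters here are illustrative assumptions, not the paper's configuration.

```python
import math
import random

random.seed(0)
DIM, WIN, HID = 4, 3, 8  # embedding size, window size, hidden units

def rand_vec(n):
    return [random.uniform(-0.1, 0.1) for _ in range(n)]

emb = {w: rand_vec(DIM) for w in ["<PAD>", "the", "dog", "barks"]}

def window(sent, i):
    """The WIN words centred on position i, padded at the edges."""
    pad = WIN // 2
    padded = ["<PAD>"] * pad + sent + ["<PAD>"] * pad
    return padded[i:i + WIN]

IN = 2 * WIN * DIM              # head window + modifier window, concatenated
W1 = [rand_vec(IN) for _ in range(HID)]
w2 = rand_vec(HID)

def arc_score(sent, head, mod):
    x = [v for w in window(sent, head) + window(sent, mod) for v in emb[w]]
    h = [math.tanh(sum(wi * xi for wi, xi in zip(row, x))) for row in W1]
    return sum(a * b for a, b in zip(w2, h))

score = arc_score(["the", "dog", "barks"], head=2, mod=1)
```

In practice the embeddings and weight matrices are trained jointly under the probabilistic criterion above.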

13 Neural Model: Convolutional Model  To encode sentence-level information and obtain sentence embeddings, a convolutional layer over the whole sentence followed by a max-pooling layer is adopted.  The scheme is to use a single distance embedding for the whole convolution window as the position feature.
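The idea can be sketched as: slide a convolution window over the word vectors, append one distance embedding per window (keyed here on the clipped distance from the window centre to the head word, an illustrative choice), then max-pool each output dimension over all window positions. Every dimension and random parameter below is made up for the sketch.

```python
import random

random.seed(1)
DIM, WIN, POS, OUT = 4, 3, 2, 5  # word dim, window, distance-emb dim, output dim

def rand_vec(n):
    return [random.uniform(-0.1, 0.1) for _ in range(n)]

dist_emb = {d: rand_vec(POS) for d in range(-POS, POS + 1)}
W = [rand_vec(WIN * DIM + POS) for _ in range(OUT)]  # convolution filters

def sent_embedding(word_vecs, head_idx):
    feats = []
    for i in range(len(word_vecs) - WIN + 1):   # each convolution window
        x = [v for j in range(i, i + WIN) for v in word_vecs[j]]
        # one distance embedding for the whole window (clipped to [-POS, POS])
        d = max(-POS, min(POS, (i + WIN // 2) - head_idx))
        x += dist_emb[d]
        feats.append([sum(wi * xi for wi, xi in zip(row, x)) for row in W])
    # max-pool each output dimension over all window positions
    return [max(f[k] for f in feats) for k in range(OUT)]

words = [rand_vec(DIM) for _ in range(6)]  # a random 6-word "sentence"
v = sent_embedding(words, head_idx=2)      # fixed-size sentence embedding
```

The pooled vector has a fixed size regardless of sentence length, so it can be concatenated with the window features of the basic model.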

14 Neural Model: Convolutional Model

15 Neural Model: Ensemble Models  The ensemble method of different order models for scoring.  Scheme 1: Simple adding

16 Neural Model: Ensemble Models  Scheme 2: Stacking Another Layer
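The two schemes reduce to how the per-order scores are combined; the scores and weights below are hypothetical (in the paper the stacked layer's weights are trained).

```python
# Hypothetical scores from first-, second-, and third-order models
# for the same candidate factor.
scores = [2.0, 1.5, 3.0]

# Scheme 1: simple adding -- the ensemble score is the plain sum.
ensemble_add = sum(scores)                       # 6.5

# Scheme 2: stacking another linear layer -- a learned weight per model.
weights = [0.5, 0.3, 0.2]
ensemble_stacked = sum(w * s for w, s in zip(weights, scores))
```

Scheme 2 strictly generalizes Scheme 1 (uniform weights of 1 recover the sum) at the cost of a few extra trained parameters.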

17 Experiments  English Penn Treebank (PTB) (Three converters) 1. Penn2Malt and the head rules of Yamada and Matsumoto (2003), noted as PTB-Y&M 2. Stanford parser v3.3.0 with Stanford Basic Dependencies (De Marneffe et al., 2006), noted as PTB-SD 3. LTH Constituent-to-Dependency Conversion Tool (Johansson and Nugues, 2007), noted as PTB-LTH  Chinese Penn Treebank (CTB) using the Penn2Malt converter.

18 Model Analysis  To verify the effectiveness of the proposed methods, only the PTB-SD development set is used in these experiments.

19 Model Analysis: on Dependency Length

20 Main Results

21 References  Marie-Catherine De Marneffe, Bill MacCartney, Christopher D. Manning, et al. 2006. Generating typed dependency parses from phrase structure parses. In Proceedings of LREC, volume 6, pages 449–454.  Greg Durrett and Dan Klein. 2015. Neural CRF parsing. In Proceedings of ACL, pages 302–312, Beijing, China, July.  Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A. Smith. 2015. Transition-based dependency parsing with stack long short-term memory. In Proceedings of ACL, pages 334–343, Beijing, China, July.  Jason M. Eisner. 1996. Three new probabilistic models for dependency parsing: An exploration. In Proceedings of the 16th International Conference on Computational Linguistics, pages 340–345, Copenhagen, August.  Kevin Gimpel and Noah A. Smith. 2010. Softmax-margin CRFs: Training log-linear models with cost functions. In Proceedings of NAACL, pages 733–736, Los Angeles, California, June.  Richard Johansson and Pierre Nugues. 2007. Extended constituent-to-dependency conversion for English. In 16th Nordic Conference of Computational Linguistics, pages 105–112. University of Tartu. Implementation for reference: https://github.com/zzsfornlp/nnpgdparser

22 References  Terry Koo and Michael Collins. 2010. Efficient third-order dependency parsers. In Proceedings of ACL, pages 1–11, Uppsala, Sweden, July.  John Lafferty, Andrew McCallum, and Fernando C. N. Pereira. 2001. Conditional random fields: Probabilistic models for segmenting and labeling sequence data.  Xuezhe Ma and Hai Zhao. 2012. Fourth-order dependency parsing. In Proceedings of COLING, pages 785–796, Mumbai, India, December.  Xuezhe Ma and Hai Zhao. 2015. Probabilistic models for high-order projective dependency parsing. arXiv preprint arXiv:1502.04174.  Ryan McDonald, Koby Crammer, and Fernando Pereira. 2005. Online large-margin training of dependency parsers. In Proceedings of ACL, pages 91–98, Ann Arbor, Michigan, June.  Mark A. Paskin. 2001. Cubic-time parsing and learning algorithms for grammatical bigram models. Technical report.  Wenzhe Pei, Tao Ge, and Baobao Chang. 2015. An effective neural network model for graph-based dependency parsing. In Proceedings of ACL, pages 313–322, Beijing, China, July.  David Weiss, Chris Alberti, Michael Collins, and Slav Petrov. 2015. Structured training for neural network transition-based parsing. In Proceedings of ACL, pages 323–333, Beijing, China, July.

23 Thanks … Q & A

