
1 Fast Full Parsing by Linear-Chain Conditional Random Fields
Yoshimasa Tsuruoka, Jun’ichi Tsujii, and Sophia Ananiadou
The University of Manchester

2 Outline
Motivation
Parsing algorithm
– Chunking with conditional random fields
– Searching for the best parse
Experiments
– Penn Treebank
Conclusions

3 Motivation
Parsers are useful in many NLP applications
– Information extraction, summarization, MT, etc.
But parsing is often the most computationally expensive component in the NLP pipeline.
Fast parsing is useful when
– The document collection is large (e.g. the MEDLINE corpus: 70 million sentences)
– Real-time processing is required (e.g. web applications)

4 Parsing algorithms
History-based approaches
– Bottom-up, left-to-right (Ratnaparkhi, 1997)
– Shift-reduce (Sagae & Lavie, 2006)
Global modeling
– Tree CRFs (Finkel et al., 2008; Petrov & Klein, 2008)
– Reranking (Collins, 2000; Charniak & Johnson, 2005)
– Forest reranking (Huang, 2008)

5 Chunk parsing
Parsing algorithm:
1. Identify phrases in the sequence.
2. Convert the recognized phrases into new non-terminal symbols.
3. Go back to 1.
Previous work
– Memory-based learning (Tjong Kim Sang, 2001): F-score 80.49
– Maximum entropy (Tsuruoka and Tsujii, 2005): F-score 85.9
A code sketch of this loop follows.
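To make the loop concrete, here is a minimal Python sketch of the cascaded chunking procedure. It is not the authors' code: the chunker interface is a stand-in for the CRF chunkers of the paper, and the choice of representative word is simplified.

def chunk_parse(tokens, tags, chunker):
    """Repeatedly chunk (tokens, tags) until one symbol spans the input."""
    while len(tags) > 1:
        spans = chunker(tokens, tags)      # non-overlapping (start, end, symbol)
        if not spans:
            break                          # no new phrase was found: stop
        # Rewrite right-to-left so earlier span offsets stay valid.
        for start, end, symbol in sorted(spans, reverse=True):
            tokens[start:end] = [tokens[start]]  # keep a representative word
            tags[start:end] = [symbol]           # phrase -> new non-terminal
    return tokens, tags

In the paper each phrase is represented by its head word in later iterations; the sketch above just keeps the first word for simplicity.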

6 Parsing a sentence
Running example: “Estimated volume was a light 2.4 million ounces .”
POS tags: VBN NN VBD DT JJ CD CD NNS .
The complete tree, built bottom-up over the next slides, contains the phrases QP, NP, VP, NP, and S.

7 1st iteration
Input: Estimated/VBN volume/NN was/VBD a/DT light/JJ 2.4/CD million/CD ounces/NNS ./.
Recognized: NP (Estimated volume) and QP (2.4 million).

8 2nd iteration
Input: volume/NP was/VBD a/DT light/JJ million/QP ounces/NNS ./.
(each recognized phrase is now represented by a head word and its symbol)
Recognized: NP (a light million ounces).

9 3rd iteration
Input: volume/NP was/VBD ounces/NP ./.
Recognized: VP (was ounces).

10 4th iteration
Input: volume/NP was/VP ./.
Recognized: S (volume was .).

11 5th iteration
Input: was/S. A single symbol now spans the whole sentence, so parsing terminates.

12 Complete parse tree
(S (NP Estimated/VBN volume/NN)
   (VP was/VBD
       (NP a/DT light/JJ (QP 2.4/CD million/CD) ounces/NNS))
   ./.)

13 Chunking with CRFs
Each chunking step is performed with a linear-chain conditional random field (CRF), whose features are defined on states and state transitions:

p(y|x) = (1/Z(x)) exp( Σ_t Σ_k λ_k f_k(y_{t-1}, y_t, x, t) )

f_k : feature function
λ_k : feature weight
Z(x) : normalization constant
(The slide illustrates this on the running example, tagging it with the chunks NP and QP.)
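As a toy illustration of the formula (the feature names are made up, not the paper's templates), the unnormalized log-score of a tag sequence is just the sum of the weights of the state and transition features that fire:

def sequence_score(tokens, tags, weights):
    """Unnormalized log-score: sum of weights of the features that fire."""
    score = 0.0
    prev = "<S>"                                 # dummy start state
    for token, tag in zip(tokens, tags):
        score += weights.get(f"word={token}|tag={tag}", 0.0)  # state feature
        score += weights.get(f"prev={prev}|tag={tag}", 0.0)   # transition feature
        prev = tag
    return score

Normalizing exp(score) by Z(x), the sum of exp-scores over all tag sequences, gives p(y|x).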

14 Chunking with “IOB” tagging
Tokens:   Estimated volume was a light 2.4  million ounces .
POS tags: VBN  NN   VBD  DT JJ   CD   CD   NNS   .
IOB tags: B-NP I-NP O    O  O    B-QP I-QP O     O
B: beginning of a chunk
I: inside (continuation) of a chunk
O: outside any chunk
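A small helper showing this encoding; the span indices below correspond to the running example, and the printed output matches the tag row above:

def to_iob(n_tokens, chunks):
    """chunks: non-overlapping (start, end, label) spans, end exclusive."""
    tags = ["O"] * n_tokens                # default: outside any chunk
    for start, end, label in chunks:
        tags[start] = f"B-{label}"         # chunk-initial token
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"         # chunk continuation
    return tags

print(to_iob(9, [(0, 2, "NP"), (5, 7, "QP")]))
# ['B-NP', 'I-NP', 'O', 'O', 'O', 'B-QP', 'I-QP', 'O', 'O']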

15 Features for base chunking (figure)
The figure steps through the sentence (Estimated/VBN volume/NN was/VBD a/DT light/JJ 2.4/CD million/CD ounces/NNS ./.); at the position being tagged (“?”), features are drawn from the neighbouring words and POS tags.

16 Features for non-base chunking (figure)
In later iterations the input may contain non-terminal symbols (volume/NP was/VBD a/DT light/JJ million/QP ounces/NNS ./.). Besides the surrounding symbols and head words, features can look inside an already-built chunk, e.g. at the words and tags Estimated/VBN volume/NN covered by the NP. An illustrative feature extractor follows.
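The window size and templates in this sketch are assumptions, not the exact feature set of the paper; it only shows that base chunking sees the surrounding words and tags, while non-base chunking can additionally look inside already-built chunks:

def features(i, symbols, words, inside):
    """inside[j]: list of (word, tag) pairs covered by chunk j, else None."""
    feats = []
    for d in (-2, -1, 0, 1, 2):                # a +/-2 context window
        j = i + d
        if 0 <= j < len(symbols):
            feats.append(f"sym[{d}]={symbols[j]}")   # POS tag or non-terminal
            feats.append(f"word[{d}]={words[j]}")    # (head) word
            if inside[j]:                            # non-base chunking only
                tags_inside = "_".join(t for _, t in inside[j])
                feats.append(f"inside[{d}]={tags_inside}")
    return feats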

17 Finding the best parse
The entire parse tree is scored as the product of the probabilities of the tagging and chunking decisions that produced it. The best derivation can be found by depth-first search.
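A sketch of the depth-first, branch-and-bound search implied by the slide (not the authors' code): chunk_hypotheses(tokens, tags, threshold) is a hypothetical CRF interface that returns alternative chunkings with probability at least threshold, each strictly shorter than its input.

def dfs(tokens, tags, prob, best):
    """best = [best_prob, best_tags], updated in place as parses complete."""
    if len(tags) == 1:                     # a single symbol spans the sentence
        if prob > best[0]:
            best[0], best[1] = prob, list(tags)
        return
    # Prune: a hypothesis with probability p can only win if prob * p > best[0].
    for new_tokens, new_tags, p in chunk_hypotheses(tokens, tags,
                                                    threshold=best[0] / prob):
        dfs(new_tokens, new_tags, prob * p, best)

# Usage: best = [0.0, None]; dfs(tokens, tags, 1.0, best)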

18 Depth-first search (figure)
The search descends depth-first through the pipeline: POS tagging hypotheses first, then base chunking hypotheses, then the chunking hypotheses of each subsequent iteration.

19 Finding the best parse (figure)

20 Extracting multiple hypotheses from the CRF
A* search
– Uses a priority queue
– Suitable when the top n hypotheses are needed
Branch-and-bound
– Depth-first
– Suitable when a probability threshold is given
Example CRF output: B I O O O B (0.3), B I I O O B (0.2), B I O O O O (0.18)
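A toy best-first n-best extractor in the spirit of the A* option. It is a simplification: it assumes independent per-position tag log-probabilities and ignores the CRF's transition scores. With the admissible heuristic below, complete sequences pop off the queue in score order.

import heapq

def nbest(logp, n):
    """logp[i][tag]: log-probability of `tag` at position i (toy model)."""
    T = len(logp)
    # h[j]: best achievable log-score over positions j..T-1 (admissible)
    h = [sum(max(col.values()) for col in logp[j:]) for j in range(T + 1)]
    heap = [(-h[0], ())]                    # (negated g + h, tag prefix)
    results = []
    while heap and len(results) < n:
        neg, prefix = heapq.heappop(heap)
        i = len(prefix)
        if i == T:
            results.append((-neg, prefix))  # complete hypothesis, in order
            continue
        g = -neg - h[i]                     # log-score of the prefix itself
        for tag, lp in logp[i].items():
            heapq.heappush(heap, (-(g + lp + h[i + 1]), prefix + (tag,)))
    return results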

21 Experiments
Penn Treebank corpus
– Training: sections 2-21
– Development: section 22
– Evaluation: section 23
Training
– Three CRF models: a part-of-speech tagger, a base chunker, and a non-base chunker
– Training took 2 days on an AMD Opteron 2.2 GHz

22 Training the CRF chunkers
Maximum likelihood estimation with L1 regularization
– L1 regularization helps avoid overfitting and produces compact models
– Optimized with the OWL-QN algorithm (Andrew and Gao, 2007)
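Written out, the training objective is the L1-penalized log-likelihood (C is a regularization hyperparameter whose value is not given on the slide):

maximize over λ:  Σ_i log p(y_i | x_i; λ) − C Σ_k |λ_k|

The L1 penalty drives many feature weights to exactly zero, which is what makes the resulting models compact; OWL-QN is a quasi-Newton method adapted to this non-differentiable objective.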

23 Chunking performance (section 22, all sentences)

Symbol   # Samples   Recall   Precision   F-score
NP        317,597    94.79    94.16       94.47
VP         76,281    91.46    91.98       91.72
PP         66,979    92.84    92.61       92.72
S          33,739    91.48    90.64       91.06
ADVP       21,686    84.25    85.86       85.05
ADJP       14,422    77.27    78.46       77.86
...
All       579,253    92.63    92.62       92.63

24 Beam width and parsing performance (section 22, all 1,700 sentences)

Beam   Recall   Precision   F-score   Time (sec)
1      86.72    87.83       87.27      16
2      88.50    88.85       88.67      41
3      88.69    89.08       88.88      61
4      88.72    89.13       88.92      92
5      88.73    89.14       88.93     119
10     88.68    89.19       88.93     179

25 Comparison with other parsers (section 23, all 2,416 sentences)

Parser                       Recall   Prec.   F-score   Time (min)
This work (deterministic)    86.3     87.5    86.9       0.5
This work (beam = 4)         88.2     88.7    88.4       1.7
Huang (2008)                 -        -       91.7       Unk
Finkel et al. (2008)         87.8     88.2    88.0       >250
Petrov & Klein (2008)        -        -       88.3       3
Sagae & Lavie (2006)         87.8     88.1    87.9       17
Charniak & Johnson (2005)    90.6     91.3    91.0       Unk
Charniak (2000)              89.6     89.5    89.5       23
Collins (1999)               88.1     88.3    88.2       39

26 Discussions
Improving chunking accuracy
– Semi-Markov CRFs (Sarawagi and Cohen, 2004)
– Higher-order CRFs
Increasing the size of the training data
– Create a treebank by parsing a large number of sentences with an accurate parser
– Train the fast parser on that treebank

27 Conclusion
Full parsing by cascaded chunking
– Chunking with CRFs
– Depth-first search for the best derivation
Performance
– F-score 86.9 at 12 msec/sentence (deterministic)
– F-score 88.4 at 42 msec/sentence (beam search)
The parser will be available soon.

