Fast Full Parsing by Linear-Chain Conditional Random Fields Yoshimasa Tsuruoka, Jun’ichi Tsujii, and Sophia Ananiadou The University of Manchester.

Outline
– Motivation
– Parsing algorithm
  – Chunking with conditional random fields
  – Searching for the best parse
– Experiments
  – Penn Treebank
– Conclusions

Motivation
– Parsers are useful in many NLP applications
  – Information extraction, summarization, MT, etc.
– But parsing is often the most computationally expensive component in the NLP pipeline
– Fast parsing is useful when
  – The document collection is large
    – e.g. MEDLINE corpus: 70 million sentences
  – Real-time processing is required
    – e.g. web applications

Parsing algorithms
History-based approaches
– Bottom-up & left-to-right (Ratnaparkhi, 1997)
– Shift-reduce (Sagae & Lavie, 2006)
Global modeling
– Tree CRFs (Finkel et al., 2008; Petrov & Klein, 2008)
– Reranking (Collins, 2000; Charniak & Johnson, 2005)
– Forest (Huang, 2008)

Chunk parsing
Parsing algorithm
1. Identify phrases in the sequence.
2. Convert the recognized phrases into new non-terminal symbols.
3. Go back to 1.
Previous work
– Memory-based learning (Tjong Kim Sang, 2001): F-score 80.49
– Maximum entropy (Tsuruoka and Tsujii, 2005): F-score 85.9
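The three-step loop above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the real parser uses CRF chunkers, whereas `toy_chunker` and its `RULES` table are hypothetical stand-ins, hand-written so that the loop reproduces the example sentence used in these slides.

```python
# Hypothetical pattern rules standing in for a trained chunker.
RULES = [
    (("VBN", "NN"), "NP"),
    (("CD", "CD"), "QP"),
    (("DT", "JJ", "QP", "NNS"), "NP"),
    (("VBD", "NP"), "VP"),
    (("NP", "VP", "."), "S"),
]

def toy_chunker(symbols):
    """Return non-overlapping (start, end, label) spans, scanning left to right."""
    spans, i = [], 0
    while i < len(symbols):
        for pattern, label in RULES:
            if tuple(symbols[i:i + len(pattern)]) == pattern:
                spans.append((i, i + len(pattern), label))
                i += len(pattern)
                break
        else:
            i += 1
    return spans

def label_of(node):
    """A node is either a POS-tag string or a (label, children) tuple."""
    return node if isinstance(node, str) else node[0]

def chunk_parse(tags, chunker, max_iters=50):
    nodes = list(tags)
    for _ in range(max_iters):
        # Step 1: identify phrases in the current symbol sequence.
        spans = chunker([label_of(n) for n in nodes])
        if not spans:
            break
        # Step 2: replace each recognized phrase with a new non-terminal node.
        new_nodes, pos = [], 0
        for start, end, label in sorted(spans):
            new_nodes.extend(nodes[pos:start])
            new_nodes.append((label, nodes[start:end]))
            pos = end
        new_nodes.extend(nodes[pos:])
        nodes = new_nodes  # Step 3: go back to step 1.
    return nodes

def bracket(node):
    """Render a node as a bracketed string."""
    if isinstance(node, str):
        return node
    return "(" + node[0] + " " + " ".join(bracket(c) for c in node[1]) + ")"

tree = chunk_parse(["VBN", "NN", "VBD", "DT", "JJ", "CD", "CD", "NNS", "."], toy_chunker)
print(bracket(tree[0]))  # (S (NP VBN NN) (VP VBD (NP DT JJ (QP CD CD) NNS)) .)
```

The loop stops when the chunker finds nothing more to group, which for this sentence happens once only the S node remains.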

Parsing a sentence
Words: Estimated volume was a light 2.4 million ounces .
POS tags: VBN NN VBD DT JJ CD CD NNS .
Phrase labels in the tree: QP, NP, VP, NP, S

1st iteration
Words: Estimated volume was a light 2.4 million ounces .
Symbols: VBN NN VBD DT JJ CD CD NNS .
Chunks: QP, NP

2nd iteration
Words: volume was a light million ounces .
Symbols: NP VBD DT JJ QP NNS .
Chunks: NP

3rd iteration
Words: volume was ounces .
Symbols: NP VBD NP .
Chunks: VP

4th iteration
Words: volume was .
Symbols: NP VP .
Chunks: S

5th iteration
Words: was
Symbols: S

Complete parse tree
Words: Estimated volume was a light 2.4 million ounces .
POS tags: VBN NN VBD DT JJ CD CD NNS .
Phrase labels in the tree: QP, NP, VP, NP, S

Chunking with CRFs
Conditional random fields (CRFs)
– Features are defined on states and state transitions
– [Equation not preserved in the transcript; its annotations were "Feature function" and "Feature weight"]
Words: Estimated volume was a light 2.4 million ounces .
POS tags: VBN NN VBD DT JJ CD CD NNS .
Chunks: QP, NP
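For reference, the standard textbook form of a linear-chain CRF is given below. The slide's own equation is not preserved in this transcript, so the notation is an assumption: f_k are the feature functions (defined on states and state transitions), λ_k the feature weights, x the input sequence, y the tag sequence, and Z(x) the normalizer.

```latex
p(y \mid x) = \frac{1}{Z(x)}
  \exp\Bigl( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y_{t-1}, y_t, x, t) \Bigr),
\qquad
Z(x) = \sum_{y'} \exp\Bigl( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y'_{t-1}, y'_t, x, t) \Bigr)
```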

Chunking with “IOB” tagging
Words: Estimated volume was a light 2.4 million ounces .
POS tags: VBN NN VBD DT JJ CD CD NNS .
IOB tags: B-NP I-NP O O O B-QP I-QP O O
Chunks: NP, QP
B : Beginning of a chunk
I : Inside (continuation) of the chunk
O : Outside of chunks
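As a small illustration of how IOB tags encode chunks, the following Python sketch decodes a tag sequence back into labeled spans. The function name `iob_to_chunks` is hypothetical, and the lenient handling of a stray I- tag (treating it as the start of a new chunk) is an assumption rather than something the slides specify.

```python
def iob_to_chunks(tags):
    """Convert IOB tags into (start, end, label) spans, end exclusive."""
    chunks, start, label = [], None, None
    # A trailing "O" sentinel closes any chunk still open at the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        continues = tag.startswith("I-") and tag[2:] == label
        if not continues:
            if label is not None:
                chunks.append((start, i, label))
                start, label = None, None
            if tag != "O":
                # "B-X" opens a chunk; a stray "I-X" is also treated as an opening.
                start, label = i, tag[2:]
    return chunks

tags = ["B-NP", "I-NP", "O", "O", "O", "B-QP", "I-QP", "O", "O"]
print(iob_to_chunks(tags))  # [(0, 2, 'NP'), (5, 7, 'QP')]
```

For the example sentence, the two spans cover "Estimated volume" (NP) and "2.4 million" (QP).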

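Features for base chunking
Words: Estimated volume was a light 2.4 million ounces .
POS tags: VBN NN VBD DT JJ CD CD NNS .
? : the tag to be predicted

Features for non-base chunking
Words: volume was a light million ounces .
Symbols: NP VBD DT JJ QP NNS .
Children of the NP: VBN NN (Estimated volume)
Chunks: NP
? : the tag to be predicted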

Finding the best parse
– Scoring the entire parse tree
– The best derivation can be found by depth-first search.
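A depth-first search for the best derivation can be sketched as below. The scoring formula is not preserved in this transcript, so the sketch assumes the score of a derivation is the product of the probabilities of its steps; under that assumption a partial score can never increase, which makes it safe to prune any branch that is already no better than the best complete derivation found so far. The names `best_derivation`, `expand`, `is_complete`, and the `TOY` search space are hypothetical.

```python
def best_derivation(state, expand, is_complete, score=1.0, best=None):
    """Depth-first search with pruning on the running product of probabilities."""
    if best is None:
        best = {"score": 0.0, "state": None}
    if score <= best["score"]:
        return best  # prune: this branch cannot beat the current best
    if is_complete(state):
        best["score"], best["state"] = score, state
        return best
    # Try the most probable hypothesis first so that a good bound is found early.
    for prob, nxt in sorted(expand(state), key=lambda h: h[0], reverse=True):
        best_derivation(nxt, expand, is_complete, score * prob, best)
    return best

# Toy search space: each state maps to (probability, next state) hypotheses.
TOY = {
    "start": [(0.6, "a"), (0.4, "b"), (0.2, "c")],
    "a": [(0.5, "a-done")],
    "b": [(0.9, "b-done")],
    "c": [(1.0, "c-done")],
}

result = best_derivation(
    "start",
    expand=lambda s: TOY.get(s, []),
    is_complete=lambda s: s.endswith("-done"),
)
print(result["state"])  # b-done
```

In the toy space the locally best first step ("a", 0.6) leads to a total of about 0.30, while "b" leads to about 0.36, so the search returns "b-done"; the "c" branch is pruned without being expanded.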

Depth-first search
[Figure: search tree with nodes labeled POS tagging, Chunking (base), Chunking]

Finding the best parse

Extracting multiple hypotheses from CRF
A* search
– Uses a priority queue
– Suitable when top n hypotheses are needed
Branch-and-bound
– Depth-first
– Suitable when a probability threshold is given
Example CRF output (tag sequence, probability):
BIOOOB  0.3
BIIOOB  0.2
BIOOOO  0.18
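The A* option can be illustrated with a small Python sketch that extracts the top n tag sequences from a toy chain model using a priority queue. This is not the authors' code: the function `n_best`, the potentials in `emit` and `trans`, and the use of unnormalized scores (dividing by the normalizer would give probabilities) are all assumptions made for illustration. The heuristic is the exact best score of any completion, computed by a backward Viterbi pass, so complete sequences leave the queue in descending score order.

```python
import heapq

def n_best(emit, trans, n):
    """Return the n highest-scoring tag sequences as (sequence, score) pairs."""
    T, tags = len(emit), list(emit[0])
    # h[t][y]: best score obtainable for positions t+1..T-1 given tag y at t.
    h = [dict.fromkeys(tags, 1.0) for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for y in tags:
            h[t][y] = max(trans[y][z] * emit[t + 1][z] * h[t + 1][z] for z in tags)
    # Heap entries: (negated upper bound, partial sequence, partial score).
    heap = [(-emit[0][y] * h[0][y], (y,), emit[0][y]) for y in tags]
    heapq.heapify(heap)
    results = []
    while heap and len(results) < n:
        _, seq, g = heapq.heappop(heap)
        t = len(seq) - 1
        if t == T - 1:
            results.append((seq, g))  # complete sequence
            continue
        for z in tags:
            g2 = g * trans[seq[-1]][z] * emit[t + 1][z]
            heapq.heappush(heap, (-g2 * h[t + 1][z], seq + (z,), g2))
    return results

# Toy potentials (hypothetical numbers) for a three-token input.
emit = [
    {"B": 0.7, "I": 0.1, "O": 0.2},
    {"B": 0.1, "I": 0.6, "O": 0.3},
    {"B": 0.2, "I": 0.3, "O": 0.5},
]
trans = {
    "B": {"B": 0.2, "I": 0.6, "O": 0.2},
    "I": {"B": 0.1, "I": 0.5, "O": 0.4},
    "O": {"B": 0.3, "I": 0.05, "O": 0.65},
}

top3 = n_best(emit, trans, 3)
print([("".join(seq)) for seq, _ in top3])  # ['BIO', 'BII', 'BOO']
```

A branch-and-bound variant would use the same bound but explore depth-first, discarding any prefix whose bound falls below the given threshold.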

Experiments
Penn Treebank Corpus
– Training: sections 2-21
– Development: section 22
– Evaluation: section 23
Training
– Three CRF models
  – Part-of-speech tagger
  – Base chunker
  – Non-base chunker
– Took 2 days on AMD Opteron 2.2GHz

Training the CRF chunkers
Maximum likelihood + L1 regularization
– L1 regularization helps avoid overfitting and produces compact models
– OWL-QN algorithm (Andrew and Gao, 2007)
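Written out, an L1-regularized maximum-likelihood objective has the general form below. The symbols are assumptions, since the slide gives no formula: λ is the weight vector, (x^(i), y^(i)) are the N training examples, and C is a hypothetical name for the regularization strength.

```latex
\min_{\lambda} \;
  - \sum_{i=1}^{N} \log p\bigl(y^{(i)} \mid x^{(i)}; \lambda\bigr)
  \; + \; C \sum_{k} \lvert \lambda_k \rvert
```

The absolute-value penalty drives many weights to exactly zero, which is why the resulting models are compact; it is also not differentiable at zero, which is the difficulty OWL-QN is designed to handle.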

Chunking performance (Section 22, all sentences)
Symbol   # Samples   Recall   Precision   F-score
NP         317,597    94.79       94.16     94.47
VP          76,281    91.46       91.98     91.72
PP          66,979    92.84       92.61     92.72
S           33,739    91.48       90.64     91.06
ADVP        21,686    84.25       85.86     85.05
ADJP        14,422    77.27       78.46     77.86
:                :        :           :         :
All        579,253    92.63       92.62     92.63
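The F-score column is consistent with the usual balanced F-measure, the harmonic mean of recall and precision. A minimal Python check, with `f_score` as a hypothetical helper name:

```python
def f_score(recall, precision):
    """Harmonic mean of recall and precision (balanced F-measure)."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)

# NP row of the chunking table: recall 94.79, precision 94.16.
print(round(f_score(94.79, 94.16), 2))  # 94.47
```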

Beam width and parsing performance (Section 22, all sentences; 1,700 sentences)
Beam   Recall   Precision   F-score   Time (sec)
1       86.72       87.83     87.27           16
2       88.50       88.85     88.67           41
3       88.69       89.08     88.88           61
4       88.72       89.13     88.92           92
5       88.73       89.14     88.93          119
10      88.68       89.19     88.93          179

Comparison with other parsers (Section 23, all sentences; 2,416 sentences)
Parser                       Recall   Prec.   F-score   Time (min)
This work (deterministic)      86.3    87.5      86.9          0.5
This work (beam = 4)           88.2    88.7      88.4          1.7
Huang (2008)                      -       -      91.7          Unk
Finkel et al. (2008)           87.8    88.2      88.0         >250
Petrov & Klein (2008)             -       -      88.3            3
Sagae & Lavie (2006)           87.8    88.1      87.9           17
Charniak & Johnson (2005)      90.6    91.3      91.0          Unk
Charniak (2000)                89.6    89.5         -           23
Collins (1999)                 88.1    88.3      88.2           39
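The total times for this work can be converted to per-sentence latency by dividing by the 2,416 sentences of Section 23. A short Python sketch of that arithmetic, with `msec_per_sentence` as a hypothetical helper name:

```python
def msec_per_sentence(total_minutes, num_sentences):
    """Convert a total run time in minutes to milliseconds per sentence."""
    return total_minutes * 60 * 1000 / num_sentences

print(round(msec_per_sentence(0.5, 2416)))  # 12  (deterministic)
print(round(msec_per_sentence(1.7, 2416)))  # 42  (beam = 4)
```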

Discussions
Improving chunking accuracy
– Semi-Markov CRFs (Sarawagi and Cohen, 2004)
– Higher-order CRFs
Increasing the size of training data
– Create a treebank by parsing a large number of sentences with an accurate parser
– Train the fast parser using the treebank

Conclusion
Full parsing by cascaded chunking
– Chunking with CRFs
– Depth-first search
Performance
– F-score = 86.9 (12 msec/sentence)
– F-score = 88.4 (42 msec/sentence)
Available soon
