End-to-End Discourse Parser Evaluation


1 End-to-End Discourse Parser Evaluation
Sucheta Ghosh, Sara Tonelli, Giuseppe Riccardi, Richard Johansson Department of Information Engineering and Computer Science University of Trento, Italy

2 Content
Introduction: discourse parser (what + why + how); discourse parser & the Penn Discourse TreeBank (PDTB); our contribution
Architecture
Features
Results
Conclusion
End2End Disc Pars Eval

3 Introduction
What: we refer to a coherent, structured group of sentences or expressions as a discourse
Why: discourse structure represents the meaning of the document
How: process flow: data (discourse) segmentation -> discourse parsing -> discourse structure
Discourse structure includes relations (a connective and its arguments) lexically anchored in the document text
Common data sources: Rhetorical Structure Theory (RST) treebank & Penn Discourse TreeBank (PDTB) <- we used the PDTB

4 Examples from PDTB (1)
Arg1 -> I never gamble too far.
Explicit connective -> In particular
Arg2 -> I quit after one try, whether I win or lose. [EXPANSION]
Each annotated relation includes a connective, two arguments and a sense label for the connective
A connective can occur between the two arguments, at the beginning of a sentence, or inside an argument
The top-level senses of the three-layered hierarchy: TEMPORAL, CONTINGENCY, COMPARISON, EXPANSION
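The structure of an annotated relation can be sketched as a small record. This is a minimal illustration in Python; the class and field names are illustrative assumptions, not the PDTB file format.

```python
# Illustrative record for a PDTB-style explicit relation: a connective,
# its two argument spans, and a top-level sense label.
from dataclasses import dataclass

@dataclass
class ExplicitRelation:
    connective: str   # e.g. "in particular"
    sense: str        # TEMPORAL, CONTINGENCY, COMPARISON, or EXPANSION
    arg1: str         # text span of Arg1
    arg2: str         # text span of Arg2

# The example from this slide as a relation record:
rel = ExplicitRelation(
    connective="In particular",
    sense="EXPANSION",
    arg1="I never gamble too far.",
    arg2="I quit after one try, whether I win or lose.",
)
```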

5 Examples from PDTB (2)
When Mr. Green won a $240,000 verdict in a land condemnation case against the State in June 1983, he says, Judge O'Kicki unexpectedly awarded him an additional $100,000. [TEMPORAL]
As an indicator of the tight grain supply situation in the U.S., market analysts said that late Tuesday the Chinese government, which often buys U.S. grains in quantity, turned instead to Britain to buy 500,000 metric tons of wheat. [COMPARISON]
Since McDonald's menu prices rose this year, the actual deadline may have been more. [CONTINGENCY]
(Arg1 italicized, connectives underlined, Arg2 boldfaced)

6 PDTB Corpus Statistics
Arg2 is always in the same sentence as the connective
60.9% of annotated Arg1s are in the same sentence as the connective; 39.1% are in a previous sentence (30.1% adjacent, 9.0% non-adjacent)
We used these statistics to establish the baseline
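A sketch of the position-based baseline that these statistics suggest (the heuristic itself is spelled out on the Evaluation & Baseline slide). Representing `sentences` as a list of token lists is an assumption of this sketch.

```python
# Position-based baseline: Arg2 = the rest of the connective's sentence;
# Arg1 = the material before the connective in the same sentence, or the
# whole previous sentence when the connective is sentence-initial.
def baseline_args(sentences, sent_idx, conn_idx):
    sent = sentences[sent_idx]
    arg2 = sent[conn_idx + 1:]              # connective to end of sentence
    if conn_idx > 0:                        # connective occurs mid-sentence
        arg1 = sent[:conn_idx]
    elif sent_idx > 0:                      # sentence-initial connective
        arg1 = sentences[sent_idx - 1]      # take the previous sentence
    else:
        arg1 = []                           # no previous material available
    return arg1, arg2

sentences = [["He", "left", "."], ["But", "she", "stayed", "."]]
arg1, arg2 = baseline_args(sentences, 1, 0)  # "But" is sentence-initial
```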

7 Our Contribution
Developed an end-to-end discourse parser that retrieves discourse structure (explicit connective + two argument spans) starting from raw text paragraphs
Evaluation:
Established a system on gold-standard data (PTB + PDTB)
Evaluated against a baseline
Implemented the same method in a fully automated system
Improved the applicability of the automated system
Overlapping discourse segmentation technique (+2/-2 window) applied to the complete text
Followed a chunking strategy for classification
The discourse model is a cascade of CRFs

8 End-to-End Architecture
Pipeline components: Doc -> Parser -> parse tree -> Chunklink (Sabine Buchholz, CoNLL'00 task) + AddDiscourse connective sense detection (Pitler & Nenkova '09) -> RootExtract + Morpha (Johansson; Minnen et al.) -> morphological & all features -> Pruner -> Arg2 -> Arg1

9 Features
Features used for Arg1 and Arg2 segmentation and labeling:
F1. Token (T)
F2. Sense of connective (CONN)
F3. IOB chain (IOB)
F4. PoS tag
F5. Lemma (L)
F6. Inflection (INFL)
F7. Main verb of main clause (MV)
F8. Boolean feature for MV (BMV)
Additional feature used only for Arg1:
F9. Arg2 labels
For more details: Ghosh et al., IJCNLP 2011
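One way features F1-F8 could be serialized into CoNLL-style columns for a CRF chunker, one line per token. This is a hedged sketch: the column order and helper function are illustrative, not the authors' exact format, and real values would come from the parser, Chunklink, AddDiscourse and Morpha stages of the pipeline.

```python
# Build one tab-separated CRF training row from per-token feature values.
def to_crf_row(token, conn_sense, iob_chain, pos, lemma, infl, main_verb, label):
    is_mv = "1" if token == main_verb else "0"   # F8: boolean main-verb flag
    return "\t".join([token, conn_sense, iob_chain, pos, lemma, infl,
                      main_verb, is_mv, label])

# Example row for the token "flashed" inside an Arg2 span:
row = to_crf_row("flashed", "EXPANSION", "I-S/E-VP", "VBD",
                 "flash", "ed", "flashed", "B-ARG2")
```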

10 Features: Arg1 (same feature list as the previous slide, including F9)

11 Features: Arg2 (same feature list, F1-F8 only)

12 Evaluation & Baseline
Metrics: precision, recall and F1 measure
Scoring scheme, Exact Match: a classified span is correct only if it exactly coincides with the gold-standard span
Baseline (based on the statistics given in the annotation manual):
Arg2: label all tokens of the text span between the connective and the beginning of the next sentence
Arg1: label all tokens in the text span from the end of the previous sentence to the connective position; if the connective occurs at the beginning of a sentence, label the previous sentence
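The exact-match scheme can be sketched as follows, assuming each argument is represented as a (start, end) token-offset pair; that representation is an assumption of this sketch.

```python
# Exact-match P/R/F1: a predicted span is correct only if it coincides
# exactly with a gold span.
def exact_prf(gold_spans, pred_spans):
    correct = len(set(gold_spans) & set(pred_spans))
    p = correct / len(pred_spans) if pred_spans else 0.0
    r = correct / len(gold_spans) if gold_spans else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# One of two predicted spans matches exactly; the other is off by one token.
p, r, f1 = exact_prf([(0, 4), (10, 15)], [(0, 4), (11, 15)])
```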

13 Exact Arg2 Results: Comparison Viewgraph
System             P     R     F1
Baseline           0.53  0.46  0.49
Gold-Standard      0.84  0.74  0.79
Automatic          0.80  0.74  0.77
AutoConn+GoldSPT   0.82  0.70  0.76
GoldConn+AutoSPT   0.76  0.61  0.68
Lightweight(Auto)  0.72  0.56  0.63

14 Exact Arg1 Results: Comparison Viewgraph
System             P     R     F1
Baseline           0.19  0.19  0.19
Gold-Standard      0.68  0.39  0.49
Automatic          0.63  0.28  0.39
AutoConn+GoldSPT   0.67  0.31  0.43
GoldConn+AutoSPT   0.62  0.31  0.41
Lightweight(Auto)  0.60  0.27  0.37

15 Features
The IOB (Inside-Outside-Begin) chain lists the syntactic categories of all constituents on the path between the root node and the current leaf node of the tree. For example, the IOB chain feature for "flashed" is I-S/E-VP/E-SBAR/E-S/C-VP, where B-, I-, E- and C- indicate whether the given token is respectively at the beginning, inside, at the end of the constituent, or a single-token chunk.
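A toy sketch of deriving the IOB chain from a constituency tree, under the assumption that trees are encoded as (label, children) tuples with string leaves. This is a simplified re-implementation for illustration, not Chunklink itself.

```python
# Count the leaves (tokens) dominated by a node.
def leaf_count(node):
    if isinstance(node, str):
        return 1
    return sum(leaf_count(c) for c in node[1])

def iob_chain(node, target, offset=0):
    """Prefixed labels on the root-to-leaf path for leaf index `target`."""
    if isinstance(node, str):
        return []
    label, children = node
    n = leaf_count(node)
    if n == 1:
        prefix = "C-"                       # single-token constituent
    elif target == offset:
        prefix = "B-"                       # leaf begins this constituent
    elif target == offset + n - 1:
        prefix = "E-"                       # leaf ends this constituent
    else:
        prefix = "I-"                       # leaf is strictly inside
    for child in children:
        cn = leaf_count(child)
        if offset <= target < offset + cn:  # descend into the covering child
            return [prefix + label] + iob_chain(child, target, offset)
        offset += cn
    return [prefix + label]

# "flashed" ends the S and is the sole leaf of its VP:
tree = ("S", [("NP", ["lights"]), ("VP", ["flashed"])])
chain = "/".join(iob_chain(tree, 1))
```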

16 Conclusion
The automatic end2end system's results are nearly the same as the gold-standard system's
We are moving towards a "lightweight" version of the pipeline: shallow, with less dependence on SPTs
We wish to explore more features
We improved our Arg1 classification result by 5 points using a previous-sentence feature (Ghosh et al., IJCNLP 2011)

17 Thank you
Sucheta Ghosh, Sara Tonelli, Giuseppe Riccardi, Richard Johansson
Department of Information Engineering and Computer Science, University of Trento, Italy
{ghosh,

18 Previous Work
Task limited to retrieving the argument heads (Wellner et al. 2007; Elwell et al. 2008)
Dinesh et al. (2005) extracted complete arguments with boundaries, but only for a restricted class of connectives
The identification of Arg1 has been only partially addressed in previous work (Prasad 2010)
Automatic surface-sense classification (at class level) already reaches the upper bound of inter-annotator agreement (Pitler and Nenkova, 2009)

19 Data & Tools
Corpus used: Penn Discourse TreeBank (PDTB)
For the gold-standard system: the Penn TreeBank (PTB) corpus is used
Third-party software/scripts used:
Stanford syntactic tree parser (Klein & Manning 2003)
AddDiscourse, explicit connective classification (Pitler and Nenkova 2008)
Chunklink.pl to extract IOB chains (Sabine Buchholz, CoNLL Shared Task 2000)
RootExtractor: syntactic parse tree (SPT) processors (Richard Johansson)
Morpha (Minnen et al. 2001)
Conditional random field: CRF++ by Taku Kudo

20 Overall Architecture
A syntactic tree parser is used for the automatic systems
A connective detection and classification tool is used for the automatic systems
PDTB & PTB are not used during the end-to-end automatic testing phase

21 End2End Testing Phase

22 Conditional Random Field
We use the CRF++ tool (http://crfpp.sourceforge.net/) for sequence-labeling classification (Lafferty et al., 2001), with a second-order Markov dependency between tags. Besides the individual specification of a feature in the feature description template, features in various combinations are also represented. We used this tool because the output format of CRF++ is compatible with the CoNLL 2000 chunking shared task, and we view our task as a discourse chunking task. Moreover, linear-chain CRFs for sequence labeling offer advantages over both generative models like HMMs and classifiers applied at each sequence position. Sha and Pereira (2003) also claim that, as a single model, CRFs outperform other models for shallow parsing.
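For illustration, a small CRF++ feature description template of the kind referred to above. The unigram (`U`) and bigram (`B`) macros are standard CRF++ syntax; the specific feature choices shown here are assumptions, not the authors' actual template.

```
# Unigram features: %x[r,c] reads column c of the token at row offset r.
# Column 0 = token, column 1 = connective sense (matching the feature table).
U00:%x[-2,0]
U01:%x[-1,0]
U02:%x[0,0]
U03:%x[1,0]
U04:%x[2,0]
# Current token's connective-sense feature, alone and combined with the token.
U05:%x[0,1]
U06:%x[0,0]/%x[0,1]
# B enables bigram (tag-transition) features.
B
```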

23 Hill Climbing Algorithm
function HILL-CLIMBING(problem) returns a state that is a local maximum
  current <- MAKE-NODE(problem.INITIAL-STATE)
  loop do
    neighbor <- highest-valued successor of current
    if neighbor.VALUE <= current.VALUE then return current.STATE
    current <- neighbor
[Artificial Intelligence: Stuart J. Russell]
Hill climbing is the most basic local search technique. At each step the current node is replaced by the best neighbor, here the neighbor with the highest VALUE; if a heuristic cost estimate h were used, we would instead pick the neighbor with the lowest h. Hill climbing is a greedy, fast local search.
We optimized our selected feature set with a feature ablation technique, leaving out one feature each time
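The feature-ablation procedure can be sketched as greedy backward elimination, a form of hill climbing over feature sets: repeatedly drop the single feature whose removal most improves the score, stopping at a local maximum. `score`, standing in for any validation-F1 function, is an assumption of this sketch.

```python
# Greedy feature ablation: start from the full set, remove one feature per
# step while removal improves the score, return the local-maximum set.
def ablate(features, score):
    current = set(features)
    best = score(current)
    while len(current) > 1:
        candidates = [(score(current - {f}), current - {f}) for f in current]
        top_score, top_set = max(candidates, key=lambda c: c[0])
        if top_score <= best:          # no single removal helps: local maximum
            return current, best
        current, best = top_set, top_score
    return current, best

# Toy scorer: "a" and "b" help, "bad" hurts; ablation should drop "bad".
toy_score = lambda s: len(s & {"a", "b"}) - len(s & {"bad"})
selected, value = ablate(["a", "b", "bad"], toy_score)
```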

24 Features
The IOB (Inside-Outside-Begin) chain corresponds to the syntactic categories of all the constituents on the path between the root node and the current leaf node of the tree. The corresponding feature would be I-S/E-VP/E-SBAR/E-S/C-VP, where B-, I-, E- and C- indicate whether the given token is respectively at the beginning, inside, at the end of the constituent, or a single-token chunk. In this case, "flashed" is at the end of every constituent in the chain, except for the last VP, which dominates one single leaf.

25 Result: Gold-lbl & Auto
Gold-labeled sys output    P     R     F1
Arg2  Exact                0.84  -     -
      Partial              -     -     -
      Overlap              -     -     -
Arg1  Exact                0.64  0.31  0.42
      Partial              0.76  0.39  0.52
      Overlap              0.84  0.40  0.54

Automatic sys output       P     R     F1
Arg2  Exact                0.80  0.74  0.77
      Partial              0.91  0.85  0.88
      Overlap              0.97  0.92  -
Arg1  Exact                0.63  0.28  -
      Partial              -     0.36  0.48
      Overlap              0.83  0.37  0.51
(Baseline result in blue color on the original slide)

26 Combo Result
Auto Conn + Gold SPT       P     R     F1
Arg2  Exact                0.82  0.70  0.76
      Partial              0.93  0.79  0.85
      Overlap              0.96  0.83  0.89
Arg1  Exact                0.67  0.31  0.43
      Partial              0.81  0.44  0.57
      Overlap              0.94  0.60  -

Gold Conn + Auto SPT       P     R     F1
Arg2  Exact                0.76  0.61  0.68
      Partial              0.91  0.73  0.81
      Overlap              0.96  0.77  0.85
Arg1  Exact                0.62  0.31  0.41
      Partial              -     0.42  0.54
      Overlap              0.87  0.43  0.58

27 Result: replacing IOB chain
                 P     R     F1
Arg2  Exact      0.80  0.74  0.77
      Partial    0.91  0.85  0.88
      Overlap    0.97  0.92  -
Arg1  Exact      0.65  0.29  0.40
      Partial    -     0.43  0.56
      Overlap    -     -     0.60

