Automatically Evaluating Text Coherence Using Discourse Relations
Ziheng Lin, Hwee Tou Ng and Min-Yen Kan
Department of Computer Science, National University of Singapore

Introduction
Textual coherence is closely tied to discourse structure.
Canonical orderings of relations:
–Satellite before nucleus (e.g., a Conditional relation)
–Nucleus before satellite (e.g., an Evidence relation)
This preferential ordering generalizes to other discourse frameworks.
[Figure: two satellite/nucleus diagrams, a Conditional relation with the satellite first and an Evidence relation with the nucleus first]

Two examples

Text (1):
[ Everyone agrees that most of the nation's old bridges need to be repaired or replaced. ]S1 [ But there's disagreement over how to do it. ]S2
–Swapping S1 and S2 without rewording disturbs the intra-relation ordering of the Contrast relation, yielding an incoherent text.

Text (2):
[ The Constitution does not expressly give the president such power. ]S1 [ However, the president does have a duty not to violate the Constitution. ]S2 [ The question is whether his only means of defense is the veto. ]S3
–Text (2) exhibits a Contrast → Cause transition, which is common in text; shuffling these sentences disturbs the inter-relation ordering.

Assess coherence with discourse relations
There are measurable preferences for intra- and inter-relation ordering.
Key idea: use a statistical model of this phenomenon to assess text coherence.
We propose a model to capture text coherence that:
–is based on the statistical distribution of discourse relations
–focuses on relation transitions

Outline
Introduction
Related work
Using discourse relations
A refined approach
Experiments
Analysis and discussion
Conclusion

Coherence models
Barzilay & Lee ('04):
–Domain-dependent HMM model to capture topic shift
–Global coherence = overall probability of topic shifts across the text
Barzilay & Lapata ('05, '08):
–Entity-based model to assess local text coherence
–Motivated by Centering Theory
–Assumption: coherence is captured by sentence-level local entity transitions, modeled with an entity grid
Soricut & Marcu ('06), Elsner et al. ('07):
–Combined entity-based and HMM-based models: complementary
Karamanis ('07):
–Tried to integrate discourse relations into a Centering-based metric
–Was not able to obtain an improvement

Discourse parsing
Penn Discourse Treebank (PDTB) (Prasad et al. '08):
–Provides discourse-level annotation on top of the PTB
–Annotates arguments, relation types, connectives, attributions
Recent work in the PDTB has focused on explicit/implicit relation identification:
–Wellner & Pustejovsky ('07)
–Elwell & Baldridge ('08)
–Lin et al. ('09)
–Pitler et al. ('09)
–Pitler & Nenkova ('09)
–Lin et al. ('10)
–Wang et al. ('10)
–...

Outline
Introduction
Related work
Using discourse relations
A refined approach
Experiments
Analysis and discussion
Conclusion

Parsing text
First apply discourse parsing to the input text:
–Use our automatic PDTB parser (Lin et al. '10)
–It identifies the relation types and arguments (Arg1 and Arg2)
Utilize the 4 PDTB level-1 types (Temporal, Contingency, Comparison, Expansion), as well as EntRel and NoRel.
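A minimal sketch of how the parsed output might be represented downstream, assuming relation arguments have already been mapped to sentence indices. The class and field names are illustrative assumptions, not the actual API of the parser of Lin et al. ('10):

```python
from dataclasses import dataclass

@dataclass
class DiscourseRelation:
    # One of the 4 PDTB level-1 types (Temporal, Contingency,
    # Comparison, Expansion), or EntRel or NoRel.
    rel_type: str
    # 0-based indices of the sentences spanned by each argument.
    arg1_sents: frozenset
    arg2_sents: frozenset

# Text (2) as parsed: an explicit Comparison holding between S1 and S2,
# followed by a Contingency between S2 and S3.
text2_relations = [
    DiscourseRelation("Comparison", frozenset({0}), frozenset({1})),
    DiscourseRelation("Contingency", frozenset({1}), frozenset({2})),
]
```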

First attempt
A simple approach: represent a text as its sequence of relation transitions.
Text (2):
[ The Constitution does not expressly give the president such power. ]S1 [ However, the president does have a duty not to violate the Constitution. ]S2 [ The question is whether his only means of defense is the veto. ]S3
can be represented by the transition sequence Comp → Cont (Comp between S1 and S2, Cont between S2 and S3).
Compile a distribution of the n-gram sub-sequences:
–E.g., a bigram for Text (2): Comp → Cont
–A longer transition: Comp → Exp → Cont → nil → Temp
–Its n-grams: Comp → Exp, Exp → Cont → nil, ...
Build a classifier to distinguish coherent texts from incoherent ones, based on transition n-grams.
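As a concrete illustration, a short sketch of compiling transition n-grams from a relation sequence (the helper below is hypothetical, not from the paper's code):

```python
def transition_ngrams(rel_sequence, n):
    # All contiguous n-grams over a sequence of relation transitions.
    return [tuple(rel_sequence[i:i + n])
            for i in range(len(rel_sequence) - n + 1)]

# Text (2) gives the transition sequence Comp -> Cont:
print(transition_ngrams(["Comp", "Cont"], 2))
# [('Comp', 'Cont')]

# The longer transition from the slide:
seq = ["Comp", "Exp", "Cont", "nil", "Temp"]
print(transition_ngrams(seq, 2))
# [('Comp', 'Exp'), ('Exp', 'Cont'), ('Cont', 'nil'), ('nil', 'Temp')]
print(transition_ngrams(seq, 3))
# [('Comp', 'Exp', 'Cont'), ('Exp', 'Cont', 'nil'), ('Cont', 'nil', 'Temp')]
```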

Shortcomings
Results of our pilot work were poor: below 70% accuracy on text ordering ranking.
Shortcomings of this model:
–A short text has a short transition sequence, giving sparse features:
 Text (1): Comp
 Text (2): Comp → Cont
–It models the inter-relation preference, but not the intra-relation preference:
 Text (1): S1<S2 vs. S2<S1

Outline
Introduction
Related work
Using discourse relations
A refined approach
Experiments
Analysis and discussion
Conclusion

An example: an excerpt from wsj_0437
Definition: a term's discourse role is a 2-tuple of the relation type and the argument in which the term appears, represented as RelType.ArgTag.
–E.g., the discourse role of 'cananea' in the first relation is Comp.Arg1.

Text (3):
[ Japan normally depends heavily on the Highland Valley and Cananea mines as well as the Bougainville mine in Papua New Guinea. ]S1 [ Recently, Japan has been buying copper elsewhere. ]S2 [ [ But as Highland Valley and Cananea begin operating, ]C3.1 [ they are expected to resume their roles as Japan's suppliers. ]C3.2 ]S3 [ [ According to Fred Demler, metals economist for Drexel Burnham Lambert, New York, ]C4.1 [ "Highland Valley has already started operating ]C4.2 [ and Cananea is expected to do so soon." ]C4.3 ]S4

The relations in Text (3):
–Implicit Comp between S1 and S2
–Explicit Comp between S2 and S3 (signaled by "But")
–Explicit Temp between C3.1 and C3.2 (signaled by "as")
–Implicit Exp between S3 and S4
–Explicit Exp between C4.2 and C4.3 (signaled by "and")

Discourse role matrix
The discourse role matrix represents the different discourse roles of the terms across continuous text units:
–Text units: sentences
–Terms: stemmed forms of open-class words
It provides an expanded set of relation transition patterns.
Hypothesis: the sequence of discourse role transitions provides clues for coherence.
The discourse role matrix is the foundation for computing such role transitions.

Discourse role matrix
A fragment of the matrix representation of Text (3) (one column per term, so each column tracks a term through the sentences).
A cell C_{Ti,Sj} contains the discourse roles of term Ti in sentence Sj, e.g.:
–C_{cananea,S3} = {Comp.Arg2, Temp.Arg1, Exp.Arg1}
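A minimal sketch of building the matrix, assuming each relation is given as a (rel_type, arg1 sentence indices, arg2 sentence indices) triple; the restriction to stemmed open-class words and the choice of stemmer are left out, and all names here are illustrative:

```python
from collections import defaultdict

def build_role_matrix(sentences, relations, stem):
    # sentences: list of token lists; relations: list of triples
    # (rel_type, arg1_sent_ids, arg2_sent_ids).
    # Cell (term, j) collects the RelType.ArgTag roles of the term
    # in sentence j.
    matrix = defaultdict(set)
    for rel_type, arg1, arg2 in relations:
        for arg_tag, sent_ids in (("Arg1", arg1), ("Arg2", arg2)):
            for j in sent_ids:
                for token in sentences[j]:
                    matrix[(stem(token.lower()), j)].add(f"{rel_type}.{arg_tag}")
    return matrix

# For Text (3), the cell for 'cananea' in S3 (index 2) would then be
# {"Comp.Arg2", "Temp.Arg1", "Exp.Arg1"}, matching C_{cananea,S3} above.
```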

Sub-sequences as features
Compile the sub-sequences of discourse role transitions for every term:
–They capture how the discourse role of a term varies through the text.
With 6 relation types (Temp, Cont, Comp, Exp, EntRel, NoRel) and 2 argument tags (Arg1, Arg2), there are:
–6 x 2 = 12 discourse roles, plus a nil value
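A sketch of enumerating one term's role transition sub-sequences across consecutive sentences (names are illustrative; a cell with several roles contributes every combination, and a sentence where the term is absent contributes nil):

```python
from itertools import product

def term_subsequences(matrix, term, num_sents, max_n):
    # Discourse roles of the term in each sentence; 'nil' where absent.
    per_sent = [sorted(matrix.get((term, j), {"nil"}))
                for j in range(num_sents)]
    subseqs = []
    for n in range(2, max_n + 1):                 # lengths 2..max_n
        for start in range(num_sents - n + 1):    # sliding window
            window = per_sent[start:start + n]
            subseqs.extend(product(*window))      # all role combinations
    return subseqs
```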

Sub-sequence probabilities
Compute the probabilities of all sub-sequences:
–E.g., P(Comp.Arg2 → Exp.Arg2) = 2/25 = 0.08
Transitions are captured locally per term, while probabilities are aggregated globally:
–This captures the distributional differences of sub-sequences in coherent and incoherent texts.
Following Barzilay & Lapata ('05), we distinguish salient and non-salient matrices, with salience based on term frequency.
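The probabilities are then just relative frequencies aggregated over all terms; a sketch under the same illustrative representation, reusing term_subsequences from the previous block:

```python
from collections import Counter

def subsequence_probs(matrix, terms, num_sents, max_n):
    # Count every term's role-transition sub-sequences, then normalize.
    counts = Counter()
    for term in terms:
        counts.update(term_subsequences(matrix, term, num_sents, max_n))
    total = sum(counts.values())
    return {subseq: c / total for subseq, c in counts.items()}

# On the Text (3) matrix fragment, this would yield e.g.
# probs[("Comp.Arg2", "Exp.Arg2")] == 2 / 25 == 0.08
```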

Preference ranking
The notion of coherence is relative:
–It is better represented as a ranking problem than as a classification problem.
Pairwise ranking ranks a pair of texts, e.g.:
–Differentiating a text from its permutation
–Identifying the more well-written essay in a pair
It can easily be generalized to listwise ranking.
Tool: SVMlight
–Features: all sub-sequences of length <= n
–Values: sub-sequence probabilities
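Since SVMlight numbers its features, each sub-sequence must first be mapped to an integer id. A sketch of writing pairwise training data in SVMlight's ranking format (the file layout is SVMlight's; the helper itself is hypothetical):

```python
def write_svmlight_ranking(pairs, feat_index, path):
    # pairs: list of (source_feats, permutation_feats), each a dict
    # mapping a sub-sequence to its probability. Within one qid the
    # source text gets the higher target value, i.e. the higher rank.
    with open(path, "w") as out:
        for qid, (src_feats, perm_feats) in enumerate(pairs, 1):
            for target, feats in ((2, src_feats), (1, perm_feats)):
                items = sorted((feat_index[f], v) for f, v in feats.items())
                body = " ".join(f"{fid}:{val:.6f}" for fid, val in items)
                out.write(f"{target} qid:{qid} {body}\n")
```

The resulting file can then be trained in SVMlight's preference-ranking mode (svm_learn -z p).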

Outline
Introduction
Related work
Using discourse relations
A refined approach
Experiments
Analysis and discussion
Conclusion

Task and data
Text ordering ranking (Barzilay & Lapata '05, Elsner et al. '07):
–Input: a pair consisting of a text and one of its permutations
–Output: a decision on which of the two is more coherent
Assumption: the source text is always more coherent than its permutation.

Accuracy = (# times the system correctly chooses the source text) / (total # of test pairs)
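A sketch of building the synthetic test pairs and scoring a system on them (assuming each text has at least 2 sentences; the helper names are illustrative):

```python
import random

def permutation_pairs(sentences, k, seed=0):
    # Pair a source text (a list of sentences) with k random sentence
    # permutations of itself; the source is taken to be the more
    # coherent member of each pair. Duplicates are not filtered here.
    rng = random.Random(seed)
    pairs = []
    while len(pairs) < k:
        perm = sentences[:]
        rng.shuffle(perm)
        if perm != sentences:
            pairs.append((sentences, perm))
    return pairs

def accuracy(num_correct, num_pairs):
    # # times the system correctly chooses the source / total # pairs
    return num_correct / num_pairs
```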

Human evaluation
2 key questions about text ordering ranking:
1. To what extent is the assumption correct that the source text is more coherent than its permutation? → Validates the correctness of this synthetic task.
2. How well do humans perform on this task? → Gives an upper bound for evaluation.
Randomly select 50 pairs from each of the 3 data sets.
For each set, assign 2 human subjects to perform the ranking:
–The subjects are told to identify the source text.

Results for human evaluation
1. The subjects' annotations correlate highly with the gold standard → the assumption is supported.
2. Human performance is not perfect → it gives a fair upper bound.

Evaluation and results
Baseline: the entity-based model (Barzilay & Lapata '05)
4 questions to answer:
–Q1: Does our model outperform the baseline?
–Q2: How do the different features derived from relation types, argument tags and salience information affect performance?
–Q3: Can the combination of the baseline and our model outperform the single models?
–Q4: How does the system performance of these models compare with human performance on the task?

Q1: Does our model outperform the baseline?
Type+Arg+Sal is the full model: it makes use of relation types, argument tags and salience information.
It significantly outperforms the baseline on WSJ and Earthquakes (**: p < 0.01); on Accidents, the difference is not significant.

                            WSJ      Earthquakes  Accidents
Baseline                    85.71    83.59        89.93
Type+Arg+Sal (full model)   88.06**  86.50**      89.38

Q2: How do the different features derived from relation types, argument tags and salience information affect performance?
–Deleting the Type info (e.g., Comp.Arg2 becomes Arg2): performance drops on Earthquakes and Accidents.
–Deleting the Arg info (e.g., Comp.Arg2 becomes Comp): a large performance drop across all 3 data sets.
–Removing the Salience info also markedly reduces performance.
⇒ This supports the use of all 3 feature classes.

                            WSJ      Earthquakes  Accidents
Baseline                    85.71    83.59        89.93
Type+Arg+Sal (full model)   88.06**  86.50**      89.38
Arg+Sal (Type removed)      88.28**  85.89*       87.06
Type+Sal (Arg removed)      87.06**  –            –
Type+Arg (Salience removed) –        –            –

Q3: Can the combination of the baseline and our model outperform the single models?
The two models capture different aspects: local entity transitions vs. discourse relation transitions.
The combined model gives the highest performance.
⇒ The 2 models are synergistic and complementary.
⇒ The combined model is linguistically richer.

                            WSJ      Earthquakes  Accidents
Baseline                    85.71    83.59        89.93
Type+Arg+Sal (full model)   88.06**  86.50**      89.38
Baseline & Type+Arg+Sal     89.25**  89.72**      91.64**

Q4: How does the system performance of these models compare with human performance on the task?
–The gap between the baseline and humans is relatively large.
–The gap between the full model and humans is more acceptable on WSJ and Earthquakes.
–The combined model reduces the error rate significantly.

Accuracy, with the gap to human performance in parentheses:
                            WSJ            Earthquakes    Accidents
Baseline                    85.71 (-4.29)  83.59 (-6.41)  89.93 (-4.07)
Type+Arg+Sal (full model)   88.06 (-1.94)  86.50 (-3.50)  89.38 (-4.62)
Baseline & Type+Arg+Sal     89.25 (-0.75)  89.72 (-0.28)  91.64 (-2.36)
Human                       90.00          90.00          94.00

Outline
Introduction
Related work
Using discourse relations
A refined approach
Experiments
Analysis and discussion
Conclusion

Performance on data sets
There are performance gaps between the data sets. To examine them, compute the relation/length ratio of each source article:

Ratio = (# relations in the article) / (# sentences in the article)

The ratio indicates how often a sentence participates in discourse relations, and the ratios correlate with the accuracies:

                      Accidents     WSJ        Earthquakes
Type+Arg+Sal Acc.     89.38      >  88.06   >  86.50
Ratio                 1.22       >  1.2     >  1.08
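Both this ratio and the level-1-only variant on the next slide reduce to one small computation (a sketch; the relation representation is the illustrative one used earlier):

```python
LEVEL1 = {"Temporal", "Contingency", "Comparison", "Expansion"}

def relation_length_ratio(rel_types, num_sentences, keep=None):
    # rel_types: list of relation type strings found in one article.
    # With keep=LEVEL1 this gives the ratio over the 4 level-1 types
    # only, as used on the next slide.
    if keep is not None:
        rel_types = [r for r in rel_types if r in keep]
    return len(rel_types) / num_sentences
```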

Correctly vs. incorrectly ranked permutations
Expectation: a text that contains more of the 4 level-1 discourse types (Temp, Cont, Comp, Exp) and fewer EntRel and NoRel relations is easier to assess for coherence:
–These 4 relation types can combine to produce meaningful transitions, e.g., Comp → Cont in Text (2).
Compute the relation/length ratio over the 4 level-1 types for the permuted texts:

Ratio = (# of the 4 level-1 discourse relations in the article) / (# sentences in the article)

The ratio is 0.58 for the permutations that are correctly ranked and 0.48 for those that are incorrectly ranked.
–Hypothesis supported

Revisit Text (2)
3 sentences → 5 (source, permutation) pairs.
Applying the full model to these 5 pairs:
–It correctly ranks 4 of them.
–The failed permutation is S3 < S1 < S2.
A very good clue to coherence is the explicit Comp relation between S1 and S2 (signaled by "however"):
–It is not retained in the other 4 permutations.
–It is retained in S3 < S1 < S2, which is therefore hard to distinguish from the source.

Text (2):
[ The Constitution does not expressly give the president such power. ]S1 [ However, the president does have a duty not to violate the Constitution. ]S2 [ The question is whether his only means of defense is the veto. ]S3
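The combinatorics and the failure case can be checked directly (the S1-before-S2 adjacency test below stands in for "the explicit 'however' Comparison is retained"):

```python
from itertools import permutations

source = ("S1", "S2", "S3")
perms = [p for p in permutations(source) if p != source]
print(len(perms))  # 5 -> five (source, permutation) pairs

# Permutations that keep S1 immediately before S2, preserving the
# explicit Comp relation signaled by 'however':
print([p for p in perms if p.index("S2") == p.index("S1") + 1])
# [('S3', 'S1', 'S2')] -- exactly the one pair the model gets wrong
```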

Conclusion
Coherent texts preferentially follow certain discourse structures:
–These are captured in patterns of relation transitions.
We first demonstrated that simply using the transition sequence does not work well, and then refined it into the discourse role matrix:
–It outperforms the entity-based model on the task of text ordering ranking.
The combined model outperforms the single models:
–The two models are complementary to each other.

Backup

Discourse role matrix
Each column of the matrix in fact corresponds to a lexical chain.
The differences:
–In a lexical chain, nodes are connected by WordNet relations.
–In the matrix, nodes are connected by sharing the same stemmed form, and are further typed with discourse relations.

Learning curves
On WSJ:
–Accuracy increases rapidly from 0 to 2,000 training pairs and more slowly from 2,000 to 8,000.
–The full model consistently outperforms the baseline by a significant gap.
–The combined model consistently and significantly outperforms the other two.
On Earthquakes:
–Accuracy always increases as more data are utilized.
–The baseline is better at the start.
–The full and combined models catch up at 1,000 and 400 training pairs, respectively, and remain consistently better.
On Accidents:
–The full model and the baseline do not show a difference.
–The combined model shows a significant gap later in the curve.

Combined model vs. human:
–Average error rate reduction against 100% accuracy: 9.57% for the full model and 26.37% for the combined model.
–Average error rate reduction against the human upper bound: 29% for the full model and 73% for the combined model.
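The two upper-bound figures can be reproduced from the per-data-set gaps on the Q4 slide; averaging the per-data-set reductions is an assumption here, but it matches the reported numbers:

```python
# Gaps below human accuracy, in accuracy points (WSJ, Earthquakes,
# Accidents), taken from the Q4 slide.
baseline_gap = [4.29, 6.41, 4.07]
full_gap     = [1.94, 3.50, 4.62]
combined_gap = [0.75, 0.28, 2.36]

def avg_error_reduction(base, model):
    # Mean over data sets of the relative reduction in the gap to human.
    return sum((b - m) / b for b, m in zip(base, model)) / len(base) * 100

print(round(avg_error_reduction(baseline_gap, full_gap)))      # 29
print(round(avg_error_reduction(baseline_gap, combined_gap)))  # 73
```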