1 Discourse Connective Argument Identification with Connective Specific Rankers
Robert Elwell and Jason Baldridge
Presented by Alexander Shoulson, University of Pennsylvania
Feb. 26, 2011

2 Overview
► Introduction
► Problem Motivation
► Model
► Type-based Approach
► Feature Engineering
► Results and Analysis
► Conclusions

3 Introduction
► Problem: identify ARG1 and ARG2 for each discourse connective
► Uses PDTB 1.0
► Gold-standard connectives are already given
► Follows Wellner and Pustejovsky (W&P) (2007)
  ► Their work uses maximum-entropy rankers and treats every connective as the same class
► E&B argue that this causes conflicting information
► Propose treating classes of connectives differently, with one model for each

4 Which classes? (1/3)
► Subordinating conjunctions
  "Drug makers shouldn’t be able to duck liability because people couldn’t identify precisely which identical drug was used."

5 Which classes? (2/3)
► Subordinating conjunctions
► Coordinating conjunctions
  "Choose 203 business executives, including, perhaps, someone from your own staff, and put them out on the streets, to be deprived for one month of their homes, families, and income."

6 Which classes? (3/3)
► Subordinating conjunctions
► Coordinating conjunctions
► Adverbial connectives
  "France’s second-largest government-owned insurance company, Assurances Generales de France, is building its own Navigation Mixte stake, currently thought to be between 8% and 10%. Analysts said they don’t think it is contemplating a takeover, however, and its officials couldn’t be reached."

7 Class Relevance (1/2)
► Why split by classes?
  ► Different classes behave differently
  ► Adverbials, for instance, prefer more distant arguments
► Why split by these classes?
  ► Behavior is based on syntactic type (Knott 1996): subordinating conjunctions, coordinating conjunctions, discourse adverbials, prepositional phrases, phrases taking sentence complements
  ► Only some connectives have structural links to their arguments, while others have anaphoric links

8 Class Relevance (2/2)
► Examples of connectives of each class:
  Coordinating: and, or, but, yet, then
  Subordinating: because, when, since, even though, except when
  Other: afterwards, previously, nonetheless, actually, again

9 Exploiting PDTB (1/2)
► Overlapping arguments/connectives
  "John loves Barolo. He ordered three cases of the ’97. But he had to cancel the order because then he discovered he was broke."

10 Exploiting PDTB (2/2)
► Why does this matter?
► Can use it for feature engineering
► Include features that state (sketched below):
  ► The previous and following connectives
  ► Whether or not there is an overlap of candidates
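A minimal sketch of these context features, assuming connectives are given as (text, start, end) token spans in document order; the data format and names here are hypothetical, not the authors' implementation:

  from collections import namedtuple

  Conn = namedtuple("Conn", "text start end")

  def context_features(i, connectives, cand_start, cand_end):
      """Context features for connective i and one candidate argument span."""
      feats = {}
      # the previous and following connectives in the document
      feats["prev_conn"] = connectives[i - 1].text if i > 0 else "NONE"
      feats["next_conn"] = connectives[i + 1].text if i + 1 < len(connectives) else "NONE"
      # does the candidate span overlap any other connective's span?
      feats["overlaps_other_conn"] = any(
          j != i and c.start < cand_end and cand_start < c.end
          for j, c in enumerate(connectives))
      return feats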

11 Model Overview (1/2)
► Two stages:
  ► Identify heads of candidate arguments
  ► Select the best candidate
► Some restrictions (sketched below):
  ► Select candidates (following W&P) only within ten steps of the connective
  ► Stay within the same sentence as the connective for ARG2
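A self-contained sketch of these restrictions; "steps" is approximated here as sentence distance purely for illustration (W&P define steps over syntactic structure), and the candidate format is hypothetical:

  def filter_candidates(candidates, conn_sent, arg, max_steps=10):
      """candidates: iterable of (sentence_index, head_word) pairs."""
      kept = []
      for sent_idx, head in candidates:
          if abs(sent_idx - conn_sent) > max_steps:
              continue  # only consider candidates near the connective
          if arg == "ARG2" and sent_idx != conn_sent:
              continue  # ARG2 must come from the connective's own sentence
          kept.append((sent_idx, head))
      return kept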

12 Model Overview (2/2)
► Use a maximum entropy ranker
► Why maximum entropy?
  ► Accurate
  ► No independence assumption (unlike Naïve Bayes)
  ► Good for overlapping features
  ► Fast to train (just a set of weights)
► Why a ranker as opposed to a classifier?
  ► Classifiers identify all likely candidates
  ► No indication of which candidates are better
  ► Rankers give a likelihood for every candidate
  ► Can select the most likely (sketched below)
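A toy illustration of the ranking step, assuming a trained weight dictionary and a feature function (both hypothetical): the ranker normalizes scores over the candidate set and selects the argmax.

  import math

  def rank_candidates(candidates, weights, features):
      # unnormalized maximum entropy score for each candidate
      scores = [math.exp(sum(weights.get(f, 0.0) for f in features(c)))
                for c in candidates]
      z = sum(scores)  # normalizing constant over this candidate set only
      probs = [s / z for s in scores]
      best = max(range(len(candidates)), key=probs.__getitem__)
      return candidates[best], probs[best]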

13 Ranking Model Formula
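The formula itself did not survive in this transcript. A standard conditional maximum entropy ranker over the candidate set A(c) for connective c, which is presumably what the slide showed, is:

  p(a \mid c) = \frac{\exp\big(\sum_i \lambda_i f_i(a, c)\big)}{\sum_{a' \in A(c)} \exp\big(\sum_i \lambda_i f_i(a', c)\big)}

where the f_i are feature functions and the \lambda_i their learned weights.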

14 Three Core Sets of Models (1/3)
► Generalized Connective Model
  ► Treats all of the connectives as the same class
  ► Same as W&P
  ► We will refer to this as GC

15 Three Core Sets of Models (2/3)
► Generalized Connective Model (GC)
► Connective-Specific Models
  ► Train one model for each connective
  ► Captures nuanced word-specific information
  ► Comes at the cost of data (individual connectives are sparse)
  ► Unseen connectives at test time back off to the GC model
  ► Refer to these models as SC

16 Three Core Sets of Models (3/3)
► Generalized Connective Model (GC)
► Connective-Specific Models (SC)
► Type-Specific Models
  ► Uses the three types discussed previously: subordinating, coordinating, adverbial
  ► Connective types are determined a priori from a dictionary (sketched below)
  ► Refer to these models as TC
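A sketch of such a dictionary, seeded only with the example connectives from slide 8 (the actual dictionary covers far more entries; the lookup function is hypothetical):

  CONNECTIVE_TYPE = {
      "and": "coordinating", "or": "coordinating", "but": "coordinating",
      "yet": "coordinating", "then": "coordinating",
      "because": "subordinating", "when": "subordinating",
      "since": "subordinating", "even though": "subordinating",
      "except when": "subordinating",
      "afterwards": "adverbial", "previously": "adverbial",
      "nonetheless": "adverbial", "actually": "adverbial",
      "again": "adverbial",
  }

  def model_for(connective, tc_models, gc_model):
      # route to the type-specific model; unknown connectives fall back to GC
      conn_type = CONNECTIVE_TYPE.get(connective.lower())
      return tc_models.get(conn_type, gc_model)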

18 Model Interpolation (1/3)
► Use weights to combine all three core model types
► Two steps:
  ► Combine TC and GC (into TG)
  ► Combine SC with TG (into SGT)
Recall:
  GC – one model overall (general connective)
  TC – one model per type (type-specific)
  SC – one model per connective (connective-specific)

19–20 Model Interpolation (2/3, 3/3)
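A plausible reconstruction of the two interpolation steps, assuming simple linear interpolation with tuned weights \lambda and \beta (the exact formulas on slides 19 and 20 were not captured):

  p_{TG}(a \mid c) = \lambda\, p_{TC}(a \mid c) + (1 - \lambda)\, p_{GC}(a \mid c)

  p_{SGT}(a \mid c) = \beta\, p_{SC}(a \mid c) + (1 - \beta)\, p_{TG}(a \mid c)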

21 Feature Engineering (1/4)
► Use all features from W&P, plus the new features described on the next three slides

22 Feature Engineering (2/4)
► These introduce a higher level of context sensitivity
► Allow inference based on surrounding connectives

23 Feature Engineering (3/4)
► In particular, these deal with discourse inside quotes, as opposed to the scope of the entire document
► Offer some degree of attribution detection

24 Feature Engineering (4/4)
► Introduces features from the morpha stemmer (sketched below)
► Discourages selection of head words that are immediate constituents of other connectives
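For illustration only, the same idea with NLTK's Porter stemmer standing in for morpha (a deliberate substitution; the paper uses morpha itself):

  from nltk.stem import PorterStemmer

  _stemmer = PorterStemmer()

  def stem_feature(head_word):
      # morphologically normalized form of the candidate head word
      return {"head_stem": _stemmer.stem(head_word.lower())}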

25–28 Results (1/3)
► Accuracy on gold-standard parses
► Observe that TC and SC outperform GC
► GC-ALL = W&P-Base + new features
► W&P reranks by the combined likelihood of ARG1 and ARG2; this model does not (as presented), but could
► Suggests that different feature sets might be useful for ARG1 and ARG2
► TC-ALL is still too coarse-grained for ARG2

29–30 Results (2/3)
► Accuracy comparison by parse source: Bikel parser vs. gold-standard parses
► This model doesn't suffer as much as W&P on automatic parses
► Possibly because its features are less dependent on syntax

31–32 Results (3/3)
► Accuracy on gold-standard parses, by connective type
► The 67.5 for adverbials stands out. Why is it so high?
  ► Adverbials are tricky: sparse, but they don't behave like the other two types
  ► GC treats them too much like subordinating and coordinating connectives
  ► SC doesn't have enough data for each adverbial (usually one each per document)
  ► So treating them as a class (TC) yields the best results

33 Conclusions
► Improvement over W&P (77.8% vs. 74.2%)
► How?
  ► Richer model: interpolation of different connective classes exploits qualities unique to each class and connective
  ► Additional features: morphology (morpha stemmer), context sensitivity, awareness of other connectives
► Ways to further improve?
  ► Reranking à la W&P
  ► Different feature sets for ARG1 and ARG2
  ► Use PDTB 2.0

34 References
Robert Elwell and Jason Baldridge. 2008. Discourse Connective Argument Identification with Connective Specific Rankers. In Proceedings of the 2008 IEEE International Conference on Semantic Computing (ICSC ’08). IEEE Computer Society, Washington, DC, USA.