Discourse Connectives and Their Argument Structure: Annotating a discourse treebank ARAVIND K. JOSHI Department of Computer and Information Science October.

Outline
- Introduction
- Some properties of discourse connectives
- Some example annotations (preliminary) with comments

Introduction
- Extending the notion of lexical anchors (such as verbs) and their arguments beyond sentences into discourse
- Discourse connectives (and, or, but, because, since, while, when, however, instead, although, also, for example, then, so that, insofar as, nonetheless, …, and empty connectives) take clauses as their arguments and express relations between clauses, i.e., relations between the propositions, events, situations, … associated with the clauses
- Towards computing a class of inferences associated with discourse connectives, hence relevant to complex NLP tasks: IE, MT, QA, …
- Towards discourse structure and discourse understanding

Some properties of discourse connectives
- Discourse connectives have argument structure, analogous to verbs and their argument structure as in PropBank. However, there are crucial differences:
  - The arity of connectives is fixed: they are binary (with some apparent exceptions)
  - One argument is in the same sentence in which the connective appears. The other argument may or may not be in the same sentence; it can be in the preceding or following discourse
  - The extent of an argument is harder to annotate
  - One of the arguments can be anaphoric
- Very little is known about the semantics of discourse connectives

Some properties of discourse connectives
- Detailed annotation of the argument structure for a large corpus is providing new insights into the semantics of connectives
- There are no known abstract semantic categories such as agent, patient, theme, etc. for discourse connectives, which opens new opportunities
- At present, arguments are labeled with noncommittal labels:
  - C_c for the clause containing the connective
  - C_c' for the clause not containing the connective
- Example of semantics: John flunked the exam although he studied hard
  - Here C_c is "he studied hard" and C_c' is "John flunked the exam"
  - C_c' although C_c
  - (C_c normally entails ~C_c') & C_c'
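The noncommittal labeling above can be sketched as a small data structure. This is only an illustration: the class name `ConnectiveAnnotation`, the helper `find_span`, and the use of character offsets are assumptions made for this sketch, not the project's actual annotation format.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical representation: a span is (start, end) in character offsets,
# with `end` exclusive. The slides do not specify a storage format.
Span = Tuple[int, int]


@dataclass(frozen=True)
class ConnectiveAnnotation:
    text: str
    connective: Span
    arg_cc: List[Span]        # C_c: the clause containing the connective
    arg_cc_prime: List[Span]  # C_c': the clause not containing the connective

    def span_text(self, spans: List[Span]) -> str:
        """Return the text covered by a list of spans, joined by spaces."""
        return " ".join(self.text[start:end] for start, end in spans)


def find_span(text: str, fragment: str) -> Span:
    """Locate the first occurrence of `fragment` in `text` as a span."""
    start = text.index(fragment)
    return (start, start + len(fragment))


text = "John flunked the exam although he studied hard"
ann = ConnectiveAnnotation(
    text=text,
    connective=find_span(text, "although"),
    arg_cc=[find_span(text, "he studied hard")],
    arg_cc_prime=[find_span(text, "John flunked the exam")],
)

print("C_c :", ann.span_text(ann.arg_cc))
print("C_c':", ann.span_text(ann.arg_cc_prime))
```

Each argument is a list of spans rather than a single span so that the same structure could also hold an argument that is not one contiguous string.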

Research Strategy
- Not shallow vs. deep syntactic processing
- Not shallow vs. deep semantic processing
- But deeper and deeper shallow processing

Subordinate: because
[The federal government suspended sales of U.S. savings bonds] because [Congress hasn’t lifted the ceiling on government debt.]
- Both arguments are in the same sentence

Adverbial: however
[Both Newsweek and U.S. News have been gaining circulation in recent years without heavy use of electronic giveaways to subscribers, such as telephones or watches.] However, [none of the big three weeklies recorded circulation gains recently.]
- The two arguments are in different sentences

Adverbial: for example
[The computers were crude by today’s standards.] [Apple II owners, for example, had to use their television sets as screens and stored data on audiocassettes.]
- An argument can be a discontiguous string
- Problems with aligning arguments with Penn Treebank constituents
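One way to see why an argument can be a discontiguous string: if the connective sits inside the clause and is excluded from its own argument, the argument falls into two pieces. The sketch below assumes exactly that (the slide does not say how the connective is treated), uses a shortened version of the example sentence, and introduces a hypothetical helper `subtract_span`.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def subtract_span(outer: Span, inner: Span) -> List[Span]:
    """Return the parts of `outer` that are not covered by `inner`."""
    o_start, o_end = outer
    i_start, i_end = inner
    if i_end <= o_start or i_start >= o_end:
        return [outer]  # no overlap: the outer span is unchanged
    pieces = []
    if o_start < i_start:
        pieces.append((o_start, i_start))
    if i_end < o_end:
        pieces.append((i_end, o_end))
    return pieces


# Shortened version of the slide's example sentence.
sentence = "Apple II owners, for example, had to use their television sets as screens."
conn_start = sentence.index("for example")
connective = (conn_start, conn_start + len("for example"))

# Assumption: the argument is the whole sentence minus the connective.
argument = subtract_span((0, len(sentence)), connective)
pieces = [sentence[start:end] for start, end in argument]
print(pieces)
```

The two resulting pieces need not correspond to a single Penn Treebank constituent, which is the alignment problem the slide mentions.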

Adverbial: instead
[No price for the new shares has been set.] Instead, [the companies will leave it up to the marketplace to decide.]
- “No” is not a part of the left argument
- The left argument must indicate the unselected alternative, and the right argument indicates the selected alternative
- Negation is the licensing context for the left argument:
  * [Price for the new shares has been set.] Instead, [the companies will leave it up to the marketplace to decide.]
- Modalities and non-factivity are other licensing contexts:
  John wanted [to go to New York.] Instead, [he went to Washington.]

Adverbial: still
[Some senior advisors argue that with further fights over a capital-gains tax cut and a budget-reduction bill Mr. Bush already has enough pending confrontations with Congress. They prefer to put off the line-item veto until at least next year.] Still, [Mr. Bush and some other aides are strongly drawn to the idea of trying out a line-item veto.]
- The left argument spans two sentences

Adverbial: also
[On the Big Board, Crawford & Co., Atlanta, (CFD) begins trading today.] Crawford evaluates health care plans, manages medical and disability aspects of worker’s compensation injuries and is involved in claims adjustments for insurance companies. Also, [beginning trading today on the Big Board are El Paso Refinery Limited Partnership, El Paso, Texas, (ELP) and Franklin Multi-Income Trust, San Mateo, Calif., (FMI).]
- The sentence after the left argument of “also” (beginning “Crawford evaluates”, shown in blue on the slide) can be regarded as a kind of adjunct of the left argument
- Discourse connectives have a fixed arity (2) and no adjuncts

Empty connective: EMPTY
[El Paso owns and operates a petroleum refinery.] EMPTY = whereas [Franklin is a closed-end management investment company.]
- “whereas” is the connective that one annotator thought best described the relation expressed by the empty connective
- Analogous to the empty relation in a noun-noun compound at the sentence level

How many discourse connectives in PTB?

Category               Types   Tokens
Subordinating             32    7,011
Coordinating               4    6,169
Adverbial/Anaphoric      217   10,440
All explicit (about)     253   23,620
Empty connectives         ??   about 20,000
Total tokens                   43,620
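The totals on this slide can be checked with a few lines of arithmetic. The sketch below only re-adds the figures quoted above; the dictionary layout and variable names are illustrative choices, and the empty-connective count is the slide's approximate figure.

```python
# Per-category counts as quoted on the slide.
counts = {
    "Subordinating": {"types": 32, "tokens": 7011},
    "Coordinating": {"types": 4, "tokens": 6169},
    "Adverbial/Anaphoric": {"types": 217, "tokens": 10440},
}

explicit_types = sum(c["types"] for c in counts.values())
explicit_tokens = sum(c["tokens"] for c in counts.values())

# Approximate figure from the slide; the number of types is not given.
empty_tokens = 20000
total_tokens = explicit_tokens + empty_tokens

print(explicit_types, explicit_tokens, total_tokens)
```

The per-category figures add up to the stated totals of about 253 types, 23,620 explicit tokens, and 43,620 tokens overall.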

How does PDTB differ from existing discourse annotations, such as the RST-annotated corpus (Carlson, Marcu, and Okurowski, 2003, to appear)?
- PDTB marks the discourse relations associated with lexical connectives (explicit and implicit), including their argument structure and anaphoric links, thus exposing a clearly defined level of discourse structure
- The existing RST-annotated corpus contains no record of the basis on which a rhetorical relation is assigned
- RST is an attempt to provide a very high-level annotation, leading to low inter-annotator agreement
- The RST corpus covers only 1/5 of the PTB
- Relating the two annotations at a later stage will be useful

Project:
- Annotate discourse connectives and their argument structure for the Penn Treebank corpus
- Discourse Lexicalized TAG parser (DLTAG)
People: Eleni Miltsakaki, Rashmi Prasad, Annotators, Aravind Joshi
Collaborator: Bonnie Webber (Edinburgh University)
Consultants: Mitch Marcus, Martha Palmer, Ellen Prince, Fernando Pereira