Discourse Connectives and Their Argument Structure: Annotating a discourse treebank ARAVIND K. JOSHI Department of Computer and Information Science October.


1 Discourse Connectives and Their Argument Structure: Annotating a discourse treebank
  ARAVIND K. JOSHI
  Department of Computer and Information Science
  October 30, 2003

2 Outline
  Introduction
  Some properties of discourse connectives
  Some example annotations (preliminary) with comments

3 Introduction
  Extending the notion of lexical anchors (such as verbs) and their arguments beyond sentences into discourse
  Discourse connectives (and, or, but, because, since, while, when, however, instead, although, also, for example, then, so that, insofar as, nonetheless, …, empty connectives) take clauses as their arguments and express relations between clauses, i.e., relations between the propositions, events, situations, … associated with the clauses
  Towards computing a class of inferences associated with discourse connectives, hence relevant to complex NLP tasks: IE, MT, QA, …
  Towards discourse structure and discourse understanding

4 Some properties of discourse connectives
  Discourse connectives have argument structure, analogous to verbs and their argument structure as in the PropBank. However, there are crucial differences:
    The arity of connectives is fixed: they are binary (with some apparent exceptions)
    One argument is in the same sentence in which the connective appears. The other argument may or may not be in the same sentence; it can be in the preceding or following discourse
    It is harder to annotate the extent of an argument
    One of the arguments can be anaphoric
  Very little is known about the semantics of discourse connectives

5 Some properties of discourse connectives
  Detailed annotation of the argument structure for a large corpus is providing new insights into the semantics of connectives
  No known abstract semantic categories such as agent, patient, theme, etc. for discourse connectives: new opportunities
  At present, arguments are labeled by noncommittal labels:
    C_c for the clause containing the connective
    C_c' for the clause not containing the connective
  Example of semantics: John flunked the exam although he studied hard
    C_c' although C_c
    (C_c normally entails ~C_c') & C_c'
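The noncommittal labels above can be sketched as a small record type. This is only an illustration, not the project's annotation format: the class name `ConnectiveAnnotation`, its field names, and the `gloss` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectiveAnnotation:
    """One connective token with its two arguments (hypothetical layout)."""

    connective: str
    c_c: str        # C_c: the clause containing the connective
    c_c_prime: str  # C_c': the clause not containing the connective

    def gloss(self) -> str:
        # Renders the "C_c' connective C_c" pattern used in the slide's example.
        # Other surface orders are possible; this helper covers only that one.
        return f"[{self.c_c_prime}] {self.connective} [{self.c_c}]"


example = ConnectiveAnnotation(
    connective="although",
    c_c="he studied hard",
    c_c_prime="John flunked the exam",
)
print(example.gloss())
# [John flunked the exam] although [he studied hard]
```

Because the arity is fixed at two, the record needs exactly two argument fields and no open-ended list of arguments or adjuncts.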

6 Research Strategy
  Not shallow vs. deep syntactic processing
  Not shallow vs. deep semantic processing
  But deeper and deeper shallow processing

7 Subordinate: because
  [The federal government suspended sales of U.S. savings bonds] because [Congress hasn’t lifted the ceiling on government debt.]
  Both arguments are in the same sentence
  Adverbial: however
  [Both Newsweek and U.S. News have been gaining circulation in recent years without heavy use of electronic giveaways to subscribers, such as telephones or watches.] However, [none of the big three weeklies recorded circulation gains recently.]
  The two arguments are in different sentences

8 Adverbial: for example
  [The computers were crude by today’s standards.] [Apple II owners, for example, had to use their television sets as screens and stored data on audiocassettes.]
  An argument can be a discontiguous string
  Problems with aligning arguments with Penn Treebank constituents
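A discontiguous argument can be represented as a list of character spans rather than a single span. The sketch below is a minimal illustration under stated assumptions: the half-open offset convention, the helper names `find_span` and `span_text`, and the choice to leave the connective and its commas out of the argument are all hypothetical, not taken from the annotation guidelines.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # half-open character offsets [start, end)


def find_span(text: str, piece: str) -> Span:
    """Locate the first occurrence of `piece` in `text` as a half-open span."""
    start = text.index(piece)
    return (start, start + len(piece))


def span_text(text: str, spans: List[Span]) -> str:
    """Join the pieces of a possibly discontiguous argument."""
    return " ".join(text[start:end].strip() for start, end in spans)


sentence = (
    "Apple II owners, for example, had to use their television sets "
    "as screens and stored data on audiocassettes."
)

# Assumption: the argument is the sentence minus the connective "for example".
arg_spans = [
    find_span(sentence, "Apple II owners"),
    find_span(sentence, "had to use their television sets as screens "
                        "and stored data on audiocassettes."),
]

print(arg_spans[0])
# (0, 15)
print(span_text(sentence, arg_spans))
# Apple II owners had to use their television sets as screens and stored data on audiocassettes.
```

A span list like this does not have to coincide with any single Penn Treebank constituent, which is one way to see the alignment problem the slide mentions.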

9 Adverbial: instead
  [No price for the new shares has been set.] Instead, [the companies will leave it up to the marketplace to decide.]
  “No” is not a part of the left argument
  The left argument must indicate the unselected alternative and the right argument indicates the selected alternative
  Negation is the licensing context for the left argument
    * [Price for the new shares has been set.] Instead, [the companies will leave it up to the marketplace to decide.]
  Modalities and non-factivity are other licensing contexts
    John wanted [to go to New York.] Instead, [he went to Washington.]

10 Adverbial: still
  [Some senior advisors argue that with further fights over a capital-gains tax cut and a budget-reduction bill, Mr. Bush already has enough pending confrontations with Congress. They prefer to put off the line-item veto until at least next year.] Still, [Mr. Bush and some other aides are strongly drawn to the idea of trying out a line-item veto.]
  The left argument has two sentences

11 Adverbial: also
  [On the Big Board, Crawford & Co., Atlanta, (CFD) begins trading today.] Crawford evaluates health care plans, manages medical and disability aspects of worker’s compensation injuries and is involved in claims adjustments for insurance companies. Also, [beginning trading today on the Big Board are El Paso Refinery Limited Partnership, El Paso, Texas, (ELP) and Franklin Multi-Income Trust, San Mateo, Calif., (FMI).]
  The unbracketed sentence after the left argument of “also” (shown in blue on the slide) can be regarded as a kind of adjunct of the left argument
  Discourse connectives have a fixed arity (2) and no adjuncts

12 Empty connective: EMPTY
  [El Paso owns and operates a petroleum refinery.] EMPTY = whereas [Franklin is a closed-end management investment company.]
  “whereas” is the connective that one annotator thought best described the relation expressed by the empty connective
  Analogous to the empty relation in a noun-noun compound at the sentence level

13 How many discourse connectives in PTB?
  Connective class        Types    Tokens
  Subordinating              32     7,011
  Coordinating                4     6,169
  Adverbial/Anaphoric       217    10,440
  Subtotal (about)          253    23,620
  Empty connectives          ??    about 20,000
  Total tokens: 43,620
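The figures on this slide can be checked with simple arithmetic. The snippet below only re-adds the numbers given above; the variable names are illustrative, and the empty-connective count is the slide's approximate figure.

```python
# Counts copied from the slide; the dictionary keys are illustrative names.
connective_types = {"subordinating": 32, "coordinating": 4, "adverbial_anaphoric": 217}
connective_tokens = {"subordinating": 7011, "coordinating": 6169, "adverbial_anaphoric": 10440}
empty_connective_tokens = 20000  # "about 20,000" on the slide

total_types = sum(connective_types.values())
total_tokens = sum(connective_tokens.values())
grand_total_tokens = total_tokens + empty_connective_tokens

print(total_types)         # 253
print(total_tokens)        # 23620
print(grand_total_tokens)  # 43620
```

The per-class counts add up to the stated totals, so the 43,620 figure is the sum of the 23,620 lexical connective tokens and the estimated 20,000 empty connectives.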

14 How does PDTB differ from existing discourse annotations, such as the RST-annotated corpus (Carlson, Marcu, and Okurowski, 2003, to appear)?
  PDTB marks the discourse relations associated with lexical connectives (explicit and implicit), including their argument structure and anaphoric links, thus exposing a clearly defined level of discourse structure
  The existing RST-annotated corpus contains no record of the basis on which a rhetorical relation is assigned
  RST is an attempt to provide a very high-level annotation, leading to low inter-annotator agreement
  The RST corpus is only 1/5 of PTB
  Relating the two annotations at a later stage will be useful

15 Project:
    Annotate discourse connectives and their argument structure for the Penn Treebank corpus
    Discourse Lexicalized TAG parser (DLTAG)
  People: Eleni Miltsakaki, Rashmi Prasad, Annotators, Aravind Joshi
  Collaborator: Bonnie Webber (Edinburgh University)
  Consultants: Mitch Marcus, Martha Palmer, Ellen Prince, Fernando Pereira

