Systematic Mismatches Across Annotations Alan Lee and Aravind Joshi Institute for Research in Cognitive Science & Department of Computer and Information.


1 Systematic Mismatches Across Annotations Alan Lee and Aravind Joshi Institute for Research in Cognitive Science & Department of Computer and Information Science, University of Pennsylvania ULA Workshop, U of Colorado, Boulder March 2008

2 Preliminaries… We observe that certain annotated features of the Penn Discourse Treebank 2.0 (PDTB) do not match up neatly with annotations at the syntactic level. What do certain mismatches suggest for linguistic theory? How do we get from syntax to discourse? How does this affect NLP applications?

3 Outline 1. Attribution spans 2. Parallel Connectives 3. AltLex 4. Polarity and Determinacy

4 Outline 1. Attribution spans 2. Parallel Connectives 3. AltLex 4. Polarity and Determinacy

5 Attribution Spans
Relation between agents and abstract objects (discourse relations or their arguments).
Annotation: text spans and four features (source, type, polarity, determinacy). More on the features later.
Example: The company says it is talking with several prospects. (attribution span: “The company says”)

6 There have been no orders for the Cray-3 so far, though the company says it is talking with several prospects.
Discourse semantics: contrary-to-expectation relation between “there being no orders for the Cray-3” and “there being a possibility of some prospects”.
Sentence semantics: contrary-to-expectation relation between “there being no orders for the Cray-3” and “the company saying something”.
[Figure: Penn Treebank parse of the sentence, with “though the company says it is talking with several prospects” as an SBAR-ADV clause; the discourse arguments and the syntactic arguments are marked separately.]

7 Although takeover experts said they doubted Mr. Steinberg will make a bid by himself, the application by his Reliance Group Holdings Inc. could signal his interest in helping revive a failed labor-management bid.
Discourse semantics: contrary-to-expectation relation between “Mr. Steinberg not making a bid by himself” and “the RGH application signaling his bidding interest”.
Sentence semantics: contrary-to-expectation relation between “experts saying something” and “the RGH application signaling Mr. Steinberg’s bidding interest”.
[Figure: Penn Treebank parse of the sentence, with “Although takeover experts said they doubted Mr. Steinberg will make a bid by himself” as an SBAR-ADV clause containing nested SBAR complements under “said” and “doubted”.]

8 Mismatches occur with other relations as well, such as causal relations:
Investors are nervous about the issue because they say the company's ability to meet debt payments is dependent on too many variables, including the sale of assets and the need to mortgage property to retire some existing debt.
Discourse semantics: causal relation between “investors being nervous” and “problems with the company’s ability to meet debt payments”.
Sentence semantics: causal relation between “investors being nervous” and “investors saying something”!
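
The mismatch in the examples above can be illustrated with a minimal sketch: the syntactic argument of the connective includes the attribution span, while the discourse argument is what remains once that span is set aside. The function name `discourse_argument` and the plain-string representation are assumptions made for illustration; they are not part of the PDTB tools.

```python
def discourse_argument(syntactic_arg: str, attribution: str) -> str:
    """Return the clause with the attribution span removed.

    Assumes (for illustration only) that the attribution span occurs
    verbatim in the clause; real annotation works on text offsets.
    """
    start = syntactic_arg.find(attribution)
    if start == -1:
        return syntactic_arg  # no attribution span to set aside
    rest = syntactic_arg[:start] + syntactic_arg[start + len(attribution):]
    return " ".join(rest.split())  # normalise whitespace


# Clause introduced by "though" in the Cray-3 example.
clause = "the company says it is talking with several prospects"
print(discourse_argument(clause, "the company says"))
```

The syntactic argument of “though” is the whole clause, whereas the discourse relation holds with the remainder after “the company says” is set aside.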

9 How to address mismatch? One possibility - treat attribution as a different layer of structure in discourse. (and also in syntax?) This has the effect of reducing the complexity of the discourse structure.

10 Discourse Graphbank (Wolf & Gibson 2005)
1. Farm prices in October edged up 0.7% from September
2. as raw milk prices continued their rise,
3. the Agriculture Department said.
4. Milk sold to the nation's dairy plants and dealers averaged $14.50 for each hundred pounds,
5. up 50 cents from September and up $1.50 from October 1988,
6. the department said.

11 [Figure: Discourse Graphbank graph over segments 1 to 6 and the grouped nodes 1-2 and 4-5, with edges labeled attr, sim, elab and ce.]
ce - cause/effect; elab - elaboration; sim - similarity; attr - attribution

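12 [Figure: the same graph with attribution factored out: segments 3 and 6 are marked “attr” instead of being linked by attribution edges, and only elab and ce edges remain.]
ce - cause/effect; elab - elaboration; [ sim - similarity; attr - attribution ]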
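
A minimal sketch of treating attribution as its own layer: attribution edges are taken out of the relation graph and kept in a separate mapping. The edge labels follow the legend above, but the edge endpoints listed here are illustrative assumptions and are not read off the original figure; `split_attribution` is a hypothetical helper.

```python
# Hypothetical edge list (source, target, label). The labels come from the
# legend (ce, elab, attr); the endpoints are illustrative assumptions only.
edges = [
    ("3", "1-2", "attr"),
    ("6", "4-5", "attr"),
    ("2", "1", "ce"),
    ("5", "4", "elab"),
    ("4-5", "1-2", "elab"),
]


def split_attribution(edges):
    """Separate attribution edges from the remaining discourse relations."""
    relations = [e for e in edges if e[2] != "attr"]
    # Map each attributed unit to the segment that carries the attribution.
    attribution = {target: source for source, target, label in edges if label == "attr"}
    return relations, attribution


relations, attribution = split_attribution(edges)
print(len(edges), "edges before;", len(relations), "relation edges after")
print(attribution)
```

With attribution stored on the side, the relation graph has fewer edges, which is the reduction in complexity the slides refer to.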

13 Residual issues
Even if B.A.T receives approval for the restructuring, the company will remain in play, say shareholders and analysts, though the situation may unfold over the next 12 months, rather than six.
Does attribution scope over the entire relation, or just Arg1?
Guideline: in case of doubt, attribute to the Writer.
Arg1: attributed to shareholders and analysts
Rel and Arg2: attributed to Writer

14 Residual issues
Attribution cannot always be excluded by default:
Advocates said the 90-cent-an-hour rise, to $4.25 an hour by April 1991, is too small for the working poor, while opponents argued that the increase will still hurt small business and cost many thousands of jobs.
What implications does this have for the approach of treating attribution as an independent layer of discourse?

15 Outline 1. Attribution spans 2. Parallel Connectives 3. AltLex 4. Polarity and Determinacy

16 Parallel Connectives
Either he wasn’t being real in the past or he isn’t being real right now. (1549)
You’ve either got a chair or you don’t. (2428)
If the answers to these questions are affirmative, then these institutional investors are likely to be favorably disposed toward a specific poison pill. (0275)
Parallel connectives are annotated discontinuously.
In the PDTB, both parts of a parallel connective are treated as equally prominent (no hierarchical relationship).
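
A discontinuous annotation can be sketched as a flat list of character spans, one per part of the connective, with neither part subordinate to the other. The helper `connective_spans` and the offset representation are assumptions for illustration and do not reproduce the PDTB file format.

```python
import re


def connective_spans(text, parts):
    """Locate each part of a parallel connective, left to right.

    Returns one (start, end) character span per part; the spans form a
    flat list, so no part is treated as dominating the other.
    """
    spans, pos = [], 0
    for part in parts:
        pattern = re.compile(r"\b" + re.escape(part) + r"\b", re.IGNORECASE)
        match = pattern.search(text, pos)
        if match is None:
            raise ValueError(f"connective part {part!r} not found")
        spans.append((match.start(), match.end()))
        pos = match.end()  # the next part must follow this one
    return spans


sentence = "Either he wasn't being real in the past or he isn't being real right now."
spans = connective_spans(sentence, ["either", "or"])
print([sentence[a:b] for a, b in spans])
```

The same call works for the sentence-medial case (“You've either got a chair or you don't.”) and for “if … then”, since only surface order is assumed.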

17 Either he wasn’t being real in the past or he isn’t being real right now. (wsj_1549)
[Figure: parse tree in which the top S dominates “Either” (CC), the clause “he wasn’t being real in the past” (S), “or” (CC) and the clause “he isn’t being real right now” (S).]
In the Penn Treebank, the treatment of a parallel connective depends on its position within the sentence. When “either” is sentence-initial, both “either” and “or” are annotated as CC.

18 You’ve either got a chair or you don’t. (wsj_2428)
[Figure: parse tree in which “either” is an RB under an ADVP inside the first clause, and “or” is a CC between the two S nodes.]
Annotating both parts as CC is not possible when “either” is sentence-medial. Here, “either” is treated as an RB and “or” as a CC.

19 How to represent a parallel connective?
DL-TAG approach: elementary discourse tree with two lexical anchors (DC = discourse clause).
[Figure: elementary discourse trees, one anchored by “Either … or” and one anchored by “because”, each with DC substitution sites.]
But the question remains: how to transition from syntactic structure to discourse structure?
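
The idea of one elementary tree with two lexical anchors can be sketched as a single object whose frontier mixes anchors and DC substitution sites. This is a simplification under stated assumptions: the flat frontier stands in for the tree's internal structure, and the class `ElementaryTree` and its methods are hypothetical, not an existing DL-TAG implementation.

```python
from dataclasses import dataclass

DC = object()  # marks a discourse-clause substitution site


@dataclass(frozen=True)
class ElementaryTree:
    """Flattened elementary discourse tree: anchors and DC sites in surface order."""
    frontier: tuple

    @property
    def anchors(self):
        # Lexical anchors are the string items on the frontier.
        return tuple(item for item in self.frontier if item is not DC)

    def substitute(self, *clauses):
        """Fill the DC sites, left to right, with the given clauses."""
        sites = sum(1 for item in self.frontier if item is DC)
        if len(clauses) != sites:
            raise ValueError(f"expected {sites} clauses, got {len(clauses)}")
        remaining = iter(clauses)
        return " ".join(next(remaining) if item is DC else item for item in self.frontier)


either_or = ElementaryTree(("Either", DC, "or", DC))  # two anchors, one tree
because = ElementaryTree((DC, "because", DC))          # one anchor

print(either_or.anchors)
print(either_or.substitute("he wasn't being real in the past",
                           "he isn't being real right now"))
```

Keeping both anchors in one tree mirrors the PDTB decision to treat the two parts of a parallel connective as equally prominent.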

20 Outline 1. Attribution spans 2. Parallel Connectives 3. AltLex 4. Polarity and Determinacy

21 Alternative Lexicalization (AltLex)
A discourse relation is inferred between two sentences which do not contain an Explicit connective, but insertion of an Implicit connective leads to redundancy. This is because the relation is alternatively lexicalized by some non-connective expression:
Under a post-1987 crash reform, the Chicago Mercantile Exchange wouldn’t permit the December S&P futures to fall further than 12 points for a half hour. AltLex = (consequence) That caused a brief period of panic selling of stocks on the Big Board.

22 Discourse Connectives and Syntactic Constituency
Most explicit connectives correspond to syntactic constituents, e.g. “because” (IN), “but” (CC), “as a result” (PP). There are a few exceptions with parallel connectives, as we have seen.

23 AltLex expressions often do not correspond to syntactic constituents.
Under a post-1987 crash reform, the Chicago Mercantile Exchange wouldn’t permit the December S&P futures to fall further than 12 points for a half hour. AltLex = (consequence) That caused a brief period of panic selling of stocks on the Big Board.
[Figure: parse tree of the second sentence, with “That” as NP-SBJ and “caused” as VBD heading the VP, followed by “a brief period of panic selling…”; the subject and the verb together do not form a single constituent.]

24 For a list of AltLex expressions annotated in the PDTB: http://www.seas.upenn.edu/~pdtb/altlex-strings.txt Or search using PDTB Browser (shameless plug) : http://www.seas.upenn.edu/~pdtb/PDTBAPI/pdtbbrowser.jnlp

25 Outline 1. Attribution spans 2. Parallel Connectives 3. AltLex 4. Polarity and Determinacy

26 Attribution Features
Attribution is annotated on relations and arguments, with four features.
Source: encodes the different agents to whom a proposition is attributed
  Wr: Writer agent
  Ot: Other non-writer agent
  Arb: Generic/Arbitrary non-writer agent
  Inh: Used only for arguments; attribution inherited from relation
Type: encodes different types of abstract objects
  Comm: Verbs of communication
  PAtt: Verbs of propositional attitude
  Ftv: Factive verbs
  Ctrl: Control verbs
  Null: Used only for arguments with no explicit attribution
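
The Source and Type inventories above can be sketched as a small data structure that also enforces the two “used only for arguments” restrictions. The value strings are the labels from the slide; the class names, the `on_argument` flag and the validation logic are assumptions for illustration, not the PDTB API.

```python
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    WR = "Wr"    # writer agent
    OT = "Ot"    # other non-writer agent
    ARB = "Arb"  # generic/arbitrary non-writer agent
    INH = "Inh"  # arguments only: inherited from the relation


class AttrType(Enum):
    COMM = "Comm"  # verbs of communication
    PATT = "PAtt"  # verbs of propositional attitude
    FTV = "Ftv"    # factive verbs
    CTRL = "Ctrl"  # control verbs
    NULL = "Null"  # arguments only: no explicit attribution


@dataclass(frozen=True)
class Attribution:
    source: Source
    type: AttrType
    on_argument: bool  # True for Arg1/Arg2, False for the relation itself

    def __post_init__(self):
        # Both restrictions come from the label descriptions on the slide.
        if not self.on_argument and self.source is Source.INH:
            raise ValueError("Inh is used only for arguments")
        if not self.on_argument and self.type is AttrType.NULL:
            raise ValueError("Null type is used only for arguments")


rel = Attribution(Source.OT, AttrType.COMM, on_argument=False)
arg = Attribution(Source.INH, AttrType.NULL, on_argument=True)
print(rel.source.value, rel.type.value, "|", arg.source.value, arg.type.value)
```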

27 Polarity vs Determinacy
Polarity: indicates narrow scope of surface-negated attributions (neg-raising, Klima 1964). Marked as “Neg” when neg-raising occurs, “Null” otherwise.
John doesn’t think the book fell (> John thinks the book didn’t fall)
Determinacy: attributions are rendered indeterminate in certain contexts. Marked as “Indet”, or “Null” otherwise.
John didn’t say the book fell (> no lowering of negation)
Only a certain class of verbs can have negative polarity, i.e. induce neg-raising: verbs of propositional attitude (PAtt) behave this way, but others do not.

28 Polarity vs Determinacy…
I don’t believe they have the culture to adequately service high-net-worth individuals. (0927)
Discourse semantics: I believe they DO NOT have the culture to adequately service high-net-worth individuals.
Negation of “believe” is lowered onto the argument. The attribution is marked as negative polarity. Note that the attribution event of “believing” did occur (is determinate).

29 Polarity vs Determinacy…
It didn’t say if its earlier results were influenced significantly by nonrecurring elements. (1711)
Negation of “say” is NOT lowered onto the argument. The attribution is marked as indeterminate. The attribution event (of “saying”) did not actually occur.
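
The contrast between the two examples can be summarised as a rule of thumb: a negated propositional-attitude verb yields negative polarity, while a negated verb of another type leaves polarity alone and makes the attribution indeterminate. This is a deliberate simplification; the slides only say that attributions are rendered indeterminate “in certain contexts”, so treating every other negated attribution as “Indet” is an assumption, as is the function name `negation_features`.

```python
def negation_features(attr_type: str, negated: bool):
    """Return (polarity, determinacy) for an attribution.

    attr_type uses the Type labels from the slides ("PAtt", "Comm", ...).
    Simplifying assumption: any negated non-PAtt attribution is "Indet".
    """
    if not negated:
        return ("Null", "Null")
    if attr_type == "PAtt":
        # Neg-raising: the negation is interpreted on the argument.
        return ("Neg", "Null")
    # No lowering of negation: the attribution event itself did not occur.
    return ("Null", "Indet")


print(negation_features("PAtt", True))  # "I don't believe they have ..."
print(negation_features("Comm", True))  # "It didn't say if ..."
```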

30 At Syntactic Level…
At which level should the discrepancy between the “polarity” and “determinacy” types of negation be captured?
- In PropBank, negations of attribution verbs are uniformly marked as a negative feature on the adjunct label “ARGM”.
- In TimeML, they carry a polarity feature of “Neg”.
I don’t BELIEVE they have the culture to adequately service high-net-worth individuals.
ARG1: I
ARG2: they have the culture…
ARGM: Neg (PropBank); no Neg for the lower predicate “have”
POLARITY: Neg (TimeML)
Should the negation be marked as ARGM for the lower predicate (“have”) instead?

31 At Syntactic Level…
It didn’t SAY if its earlier results were influenced significantly by nonrecurring elements.
ARG1: It
ARG2: if its earlier results were influenced significantly by nonrecurring elements
ARGM: Neg (PropBank)
POLARITY: Neg (TimeML)
The “saying” event is indeterminate. Does this still count as an event? How to order this temporally?

32 Some questions…
How much of discourse is “projected” from syntax?
Is there a need for a different architecture, different building blocks?
How are these issues manifested cross-linguistically? Currently, discourse annotation work is being done for Hindi, Turkish, Czech and Finnish (possibly).

