FATE: a FrameNet Annotated corpus for Textual Entailment. Marco Pennacchiotti, Aljoscha Burchardt. Computerlinguistik, Saarland University, Germany. LREC 2008


1 FATE: a FrameNet Annotated corpus for Textual Entailment
Marco Pennacchiotti, Aljoscha Burchardt
Computerlinguistik, Saarland University, Germany
LREC 2008, Marrakech, 28 May 2008
SALSA II - The Saarbrücken Lexical Semantics Acquisition Project

2 Summary
– FrameNet and Textual Entailment
– FATE annotation schema
– Annotation examples and statistics
– Conclusions
28/05/2008, slide 2 / 17, FATE - Marco Pennacchiotti

3 Frame Semantics [Fillmore 1976, 2003]
Frame: conceptual structure modeling a prototypical situation
Frame Elements (FE): participants of the situation
Frame Evoking Elements (FEE): predicates evoking the situation
Predicate-argument level normalizations:
“Evelyn spoke about her past”
“Evelyn’s statement about her past”
STATEMENT( SPEAKER: Evelyn; TOPIC: her past )
FrameNet Berkeley Project (1)
– Database of frames for the core lexicon of English
– 800 frames, 10,000 lemmas, 135,000 annotated sentences
(1) http://framenet.icsi.berkeley.edu
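The normalization on this slide can be illustrated with a small data structure. This is only a sketch: the class `FrameInstance` and its `normalized` method are hypothetical names introduced here, not part of FrameNet or of the FATE tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrameInstance:
    """One annotated frame: its name, the evoking word (FEE), and role fillers."""
    frame: str
    fee: str
    roles: tuple  # tuple of (role_name, filler) pairs

    def normalized(self):
        # Drop the surface predicate; keep only the frame and its role fillers.
        return (self.frame, frozenset(self.roles))


# "Evelyn spoke about her past"
verbal = FrameInstance(
    "STATEMENT", "spoke", (("SPEAKER", "Evelyn"), ("TOPIC", "her past"))
)
# "Evelyn's statement about her past"
nominal = FrameInstance(
    "STATEMENT", "statement", (("SPEAKER", "Evelyn"), ("TOPIC", "her past"))
)

# The two instances differ on the surface but share one predicate-argument structure.
same_structure = verbal.normalized() == nominal.normalized()
```

The verb and the noun remain distinct as annotations, yet compare equal once the evoking element is abstracted away, which is the sense of "normalization" used above.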

4 Textual Entailment (TE)
Given two text fragments, the Text T and the Hypothesis H, T entails H if the meaning of H can be inferred from the meaning of T, as would typically be interpreted by people [Dagan 2005]
T: “Yahoo has recently acquired Overture”
H: “Yahoo owns Overture”
T → H
Recognizing Textual Entailment (RTE)
– Recognize whether entailment holds for a given (T, H) pair
– Models core inferences of many NLP applications (QA, IE, MT, …)
RTE Challenges [Dagan et al., 2005; Giampiccolo et al., 2007]
– Compare systems for RTE
– Corpus: 800 training pairs, 800 test pairs, evenly split into positive (+) and negative (-) pairs

5 Predicate-argument and RTE
Predicate-level inference plays a relevant role in TE (20% of positive examples in RTE-2 [Garoufi, 2007])
T: “An avalanche has struck a popular skiing resort in Austria, killing at least 11 people.”
H: “Humans died in an avalanche.”
DEATH( PROTAGONIST: 11 people / humans; CAUSE: avalanche / avalanche )
Implementation gap:
– [Burchardt et al., 2007]: FrameNet system comparable to lexical overlap
– [Hickl et al., 2006]: PropBank-based features are not effective
– [Rana et al., 2005]: DIRT paraphrase repository does not help
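For readers unfamiliar with the "lexical overlap" baseline mentioned on this slide, here is a minimal sketch of one common form of it: the fraction of hypothesis words that also occur in the text. The tokenizer and the function name `overlap_score` are assumptions made for illustration; the cited systems may compute overlap differently.

```python
import re


def tokens(sentence):
    """Lowercase word tokens; a deliberately naive tokenizer."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def overlap_score(text, hypothesis):
    """Fraction of distinct hypothesis tokens that also appear in the text."""
    text_tokens = set(tokens(text))
    hyp_tokens = set(tokens(hypothesis))
    if not hyp_tokens:
        return 0.0
    return len(hyp_tokens & text_tokens) / len(hyp_tokens)


T = ("An avalanche has struck a popular skiing resort in Austria, "
     "killing at least 11 people.")
H = "Humans died in an avalanche."

score = overlap_score(T, H)
uncovered = sorted(set(tokens(H)) - set(tokens(T)))
```

On the avalanche pair, the words left uncovered are "humans" and "died", exactly the material that the DEATH frame links to "11 people" and "killing". Word overlap alone cannot make that connection.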

6 FATE corpus
FATE: a manually frame-annotated Textual Entailment corpus, to study the role of frame semantics in RTE
Reference corpus: RTE-2 test set, 800 pairs, 29,000 tokens
Frame resource: FrameNet version 1.3
Corpus format: SALSA/TIGER XML [Burchardt et al., 2006]
Pre-processing:
– annotation on top of Collins parser syntactic analysis
– T and H are randomly reordered to avoid biases
Annotation:
– performed by one highly experienced annotator
– inter-annotator agreement over 5% of the corpus: FEE-agreement 82%, Frame-agreement 88%, Role-agreement 91%
– annotation carried out using the SALTO tool (1)
(1) http://www.coli.uni-saarland.de/projects/salsa/salto/doc

7 FATE annotation process: an example
[Figure: Collins syntactic analysis of the example sentence]
Full-text annotation (all words considered) [Ruppenhofer, 2007]

8 FATE annotation process: an example
[Figure: Collins syntactic analysis with the frame and its FEE marked]

9 FATE annotation process: an example
[Figure: Collins syntactic analysis with the frame, its FEE, the FEs and the FE fillers marked]
Maximization principle: choose the largest constituent possible when annotating

10 Annotation Schema: Relevance Principle
Intuition: annotate as FEE only those words evoking a relevant situation (frame) in the sentence at hand
– Very intuitive flavor, but high agreement: 83% on a pilot set of 15 sentences
“Authorities in Brazil hold 200 people as hostage”
[Figure: frame labels LEADERSHIP, DETAIN, PEOPLE, KIDNAPPING; role labels VICTIM, PLACE, PERPETRATOR]

11 Annotation Schema: Span Annotation
On T of positive pairs, annotate only the fragments (spans) contributing to the inferential process
– Spans are obtained from the ARTE annotation [Garoufi, 2007]
– For negative pairs it is not straightforward to derive spans, hence we do full annotation
T: “Soon after the EZLN had returned to Chiapas, Congress approved a different version of the COCOPA Law, which did not include the autonomy clauses, claiming they were in contradiction with some constitutional rights (private property and secret voting); this was seen as a betrayal by the EZLN and other political groups.”
H: “EZLN is a political group.”

12 Annotation Schema: Other guidelines
Unknown frames: use an UNKNOWN frame for words evoking situations not present in the FrameNet database
– Anaphora
– Copula and support verbs
– Modal expressions
– Metaphors
– Existential constructions
– …

13 Corpus statistics
Annotated pairs: 800 (400 positive, 400 negative)
Annotated frames: 4,500
– avg. 5.6 frames per pair
– 1,600 frames in positive pairs
– 2,800 frames in negative pairs
Annotated roles: 9,500
– avg. 2.1 roles per frame
Annotation time: 230 hours
– 90 h for positive pairs (13 min/pair)
– 140 h for negative pairs (21 min/pair)
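The averages on this slide follow directly from the reported totals. The short check below recomputes them; the variable names are chosen here for illustration, and the inputs are the rounded figures from the slide, so small deviations from the exact corpus counts are expected.

```python
# Totals as reported on the slide (rounded figures).
pairs_total = 800
pairs_positive = 400
pairs_negative = 400
frames_total = 4500
roles_total = 9500
hours_positive = 90
hours_negative = 140

frames_per_pair = frames_total / pairs_total          # 5.625, reported as 5.6
roles_per_frame = roles_total / frames_total          # about 2.11, reported as 2.1
min_per_positive = hours_positive * 60 / pairs_positive  # 13.5, reported as 13
min_per_negative = hours_negative * 60 / pairs_negative  # 21.0, reported as 21
hours_total = hours_positive + hours_negative         # 230
```

Negative pairs take roughly one and a half times as long per pair, which is consistent with the slides' statement that negative pairs receive full annotation while positive pairs are annotated only on the relevant spans.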

14 FrameNet and RTE (simple case)
Syntactic normalization
– Active / Passive
EDUCATIONAL_TEACHING( STUDENT: ground soldiers / soldiers; MATERIAL: virtual reality / virtual reality )

15 Implementation gap insights
(1) Resource coverage is too low
(2) Models for predicate-argument inference are weak
(3) Automatic annotation models (SRL) are not good enough to be safely used in RTE
FrameNet coverage is good:
– 373 Unknown frames (8% of total frames)
– Unknown roles: 1% of total roles
Coverage is unlikely to be a limiting factor for using FrameNet in applications

16 Why should you use FATE?
(1) Resource coverage is too low
(2) Models for predicate-argument inference are weak
(3) Automatic annotation models (SRL) are not good enough to be safely used in RTE
– To better study predicate-argument inference in RTE
– To experiment with frame-based RTE models on a gold-standard corpus
– To learn better SRL models, by training on FATE
The corpus is freely available online

17 Thank you! Questions?
FATE download: http://www.coli.uni-saarland.de/projects/salsa/fate
pennacchiotti@coli.uni-sb.de
www.coli.uni-saarland.de/~pennacchiotti

19 FrameNet and RTE
Syntactic normalization
– Apposition to copula
PEOPLE_BY_VOCATION( PERSON: Andreotti / Andreotti; PLACE: Italy / Italy; AGE: elder / elder )

20 FrameNet and RTE
Frame-to-frame inference
Sentencing --- HR ---> Imprisonment
– CONVICT maps to PRISONER
– PLACE maps to PLACE

21 Annotation Schema: Anaphora
Locality principle
– Annotate the local referent of a role filler
– Link the local referent to the external referent via the ANAPHORA frame

22 Annotation Schema: Support and Copula Verbs
Verbs carrying minimal semantic content (e.g. be, seem)
Annotate the noun as FEE, instead of the verb [Ruppenhofer, 2007]

23 Annotation Schema: Modal Expressions
Modal expressions (e.g. modal verbs, particles, modal triggers) are annotated only when the modal meaning is prevalent in the sentence

24 Annotation Schema: Other guidelines
Metaphors are annotated with their figurative meaning
Existential constructions (e.g. “there is”) are annotated with the frame EXISTENCE, only when it is the only meaning conveyed in the sentence (e.g. “There are 11 official languages”)
Unknown frames: use an UNKNOWN frame for words evoking situations not present in the FrameNet database
Maximization principle: choose the largest constituent possible when annotating

25 Motivations
Semantic knowledge at the predicate-argument level is critical in NLP tasks:
“From whom did BMW buy Rover?”
“Rover was bought by BMW from British Aerospace”
“BMW acquired Rover from British Aerospace”
“BMW’s purchase of Rover from British Aerospace”
“British Aerospace sold Rover to BMW”
Predicate-argument resources (e.g. PropBank and FrameNet) make it possible to map meaning-preserving alternations to the same predicative structure
BUY_EVENT( BUYER: BMW; SELLER: British Aerospace; GOOD: Rover )
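To make the motivation concrete, the sketch below shows how a question can be answered once every surface variant has been mapped to the same BUY_EVENT structure. The annotations are written by hand for illustration, and `REALIZATIONS` and `answer` are hypothetical names; no automatic frame assignment is performed here.

```python
# Hand-written annotations: every paraphrase on the slide maps to one structure.
ROLES = {"BUYER": "BMW", "SELLER": "British Aerospace", "GOOD": "Rover"}

REALIZATIONS = {
    sentence: ("BUY_EVENT", dict(ROLES))
    for sentence in [
        "Rover was bought by BMW from British Aerospace",
        "BMW acquired Rover from British Aerospace",
        "BMW's purchase of Rover from British Aerospace",
        "British Aerospace sold Rover to BMW",
    ]
}


def answer(question_role, known):
    """Collect fillers of question_role from instances that match the known roles."""
    fillers = set()
    for frame, roles in REALIZATIONS.values():
        if frame == "BUY_EVENT" and all(roles.get(r) == v for r, v in known.items()):
            fillers.add(roles[question_role])
    return fillers


# "From whom did BMW buy Rover?" asks for the SELLER, given BUYER and GOOD.
seller = answer("SELLER", {"BUYER": "BMW", "GOOD": "Rover"})
```

All four variants yield the same answer because the lookup runs over roles, not over word order or part of speech.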

26 Motivations
Implementation gap: very limited impact of predicate-argument resources in NLP applications [Fliedner, 2007; Frank et al., 2006]
Possible reasons:
(1) Resource coverage is too low
(2) Modeling predicate knowledge is too hard
(3) Automatic annotation (SRL) is not good enough
Our goal: create a gold-standard corpus, manually annotated with predicate-argument structure, to investigate (1)-(3)
– Corpus: Second Recognizing Textual Entailment (RTE) Challenge
– Annotation: FrameNet

27 FATE Corpus annotation: an example
[Figure: Collins syntactic analysis of the example sentence]
Full-text annotation (all words considered) [Ruppenhofer, 2007]

28 Frame Semantics [Fillmore 1976, 2003]
Frames are organized in a hierarchy with various frame-to-frame relations
[Figure: frame hierarchy with legend]
FrameNet Berkeley Project (1)
– Database of frames for the core lexicon of English
– 800 frames, 10,000 lemmas, 135,000 annotated sentences
– Hierarchy: 7 frame relations, 1136 edges, 86 roots
(1) http://framenet.icsi.berkeley.edu

29 FATE Corpus annotation: an example
[Figure: Collins syntactic analysis with the frame and its FEE marked]

30 FATE Corpus annotation: an example
[Figure: Collins syntactic analysis with the frame, its FEE, the FEs and the FE fillers marked]
Maximization principle: choose the largest constituent possible when annotating

31 FATE Corpus annotation: an example
[Figure: Collins syntactic analysis with the frame, its FEE, the FEs and the FE fillers marked]
DEATH( PROTAGONIST: Hiddleston / person; CAUSE: avalanche )

32 FrameNet and SALSA Project
FrameNet Berkeley Project (1)
– Database of frames for the core lexicon of English
– 800 frames, 10,000 lemmas, 135,000 annotated sentences from BNC
SALSA Project (2)
– A German corpus with frame annotation (20,000 verbal instances)
– Semantic frame-based lexicon for German
– Methods for automation and application of frame-semantic information (SRL, RTE, discourse interpretation, etc.)
(1) http://framenet.icsi.berkeley.edu/
(2) http://www.coli.uni-saarland.de/projects/salsa/

33 Annotation Schema: Span Annotation
On T of positive pairs, annotate only the fragments (spans) contributing to the inferential process
– Spans are obtained from the ARTE annotation [Garoufi, 2007]
– For negative pairs it is not straightforward to derive spans, hence we do full annotation
T: “Soon after the EZLN had returned to Chiapas, Congress approved a different version of the COCOPA Law, which did not include the autonomy clauses, claiming they were in contradiction with some constitutional rights (private property and secret voting); this was seen as a betrayal by the EZLN and other political groups.”
H: “EZLN is a political group.”

34 FrameNet and RTE
Frame-to-frame inference
KILLING --- cause ---> DEATH
– CAUSE maps to CAUSE
– VICTIM maps to PROTAGONIST
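The role mappings shown on the frame-to-frame inference slides can be applied mechanically. The sketch below encodes the two mappings given in this deck (KILLING to DEATH, SENTENCING to IMPRISONMENT) in a plain dictionary. The table layout and the function `project` are assumptions for illustration and do not reflect how FrameNet stores its relations.

```python
# Role mappings as given on the slides; keys are (source frame, target frame).
FRAME_RELATIONS = {
    ("KILLING", "DEATH"): {"CAUSE": "CAUSE", "VICTIM": "PROTAGONIST"},
    ("SENTENCING", "IMPRISONMENT"): {"CONVICT": "PRISONER", "PLACE": "PLACE"},
}


def project(frame, roles, target):
    """Rewrite a frame instance into the target frame, or return None if unrelated."""
    mapping = FRAME_RELATIONS.get((frame, target))
    if mapping is None:
        return None
    # Roles without a counterpart in the target frame are dropped.
    projected = {mapping[r]: filler for r, filler in roles.items() if r in mapping}
    return target, projected


# T side of the avalanche example, read as a KILLING instance.
killing_roles = {"CAUSE": "avalanche", "VICTIM": "11 people"}
death = project("KILLING", killing_roles, "DEATH")
```

After projection, the T-side instance has the same frame and role names as the hypothesis "Humans died in an avalanche", so the two can be compared role by role. Matching "11 people" against "humans" still requires lexical knowledge that this sketch does not model.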

