
1 English Proposition Bank: Status Report
Olga Babko-Malaya, Paul Kingsbury, Scott Cotton, Martha Palmer, Mitch Marcus March 25, 2003

2 Outline
Overview
Status Report
Mapping of PropBank Framesets to other sense distinctions

3 Example
He sent merchants around the country a form asking them to check one of three answers.
Arg0: He
REL: sent
Arg2: merchants around the country
Arg1: a form asking them to check one of three answers

4 Predicate-argument structure
send(Agent: He, Goal: merchants, Theme: form)
He sent merchants around the country a form asking them to check one of three answers.
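
A minimal sketch, in Python, of how the annotation above could be held in a simple data structure; the class name and fields are invented for illustration and are not part of any PropBank tool.

    from dataclasses import dataclass, field

    @dataclass
    class PropbankInstance:
        """One annotated verb instance: the predicate (REL) plus its labeled arguments."""
        rel: str
        args: dict = field(default_factory=dict)  # argument label -> text span

    example = PropbankInstance(
        rel="sent",
        args={
            "Arg0": "He",
            "Arg2": "merchants around the country",
            "Arg1": "a form asking them to check one of three answers",
        },
    )
    print(example.rel, "/", example.args["Arg0"])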

5 Used at
MITRE, Xerox PARC, Sheffield University, BBN, Syracuse University, IBM, NYU, SRA, CMU, MIT, University of Texas at Dallas, University of Toronto, Columbia University, SPAWAR, and the JHU summer workshop. Also used by JK Davis, John Josef Costandi, and Steve Maiorano. Improvements in IE reported in ACL'03 submission.

6 Annotation procedure
Extraction of all sentences with a given verb
First pass: automatic tagging (Joseph Rosenzweig)
Second pass: double-blind hand annotation
Third pass: adjudication; the tagging tool highlights inconsistencies
Given these guidelines, a number of annotators, mostly undergraduate students majoring in linguistics, extend the templates in the frames to examples from the corpus. The rate of annotation is approximately 50 sentences per annotator-hour.
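
The double-blind pass produces two independent annotations per instance, and the disagreements are what adjudication resolves. The sketch below only illustrates that comparison step, assuming each annotation is a dict from argument label to text span; it is not the actual tagging tool.

    def flag_inconsistencies(annot_a: dict, annot_b: dict):
        """Return the argument labels on which two annotators disagree."""
        disagreements = []
        for label in sorted(set(annot_a) | set(annot_b)):
            if annot_a.get(label) != annot_b.get(label):
                disagreements.append((label, annot_a.get(label), annot_b.get(label)))
        return disagreements

    a = {"Arg0": "He", "Arg1": "a form", "Arg2": "merchants around the country"}
    b = {"Arg0": "He", "Arg1": "a form asking them to check one of three answers"}
    for label, span_a, span_b in flag_inconsistencies(a, b):
        print(f"{label}: A={span_a!r}  B={span_b!r}")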

7 Projected delivery dates
Financial subcorpus:
  alpha release: December, DONE!
  beta release: July, DONE!
  adjudicated release: summer 2003
PropBank corpus:
  beta release: summer 2003
  adjudicated release: December 2003

8 English PropBank - Current Status
3183 frame files, corresponding to 3625 distinct predicates (including phrasal variants) - finished!
At least single annotated: 2915 verbs, 94.5K instances (80% of the Treebank)
At least double annotated: 2250 verbs, 60K instances (67% of the Treebank)
Adjudicated: 1032 verbs, 25K instances (20% of the Treebank)
Coordinating with NYU on nominalizations, using the Penn tagger and Frames files

9 Word Sense in PropBank
The original plan to ignore word sense was not feasible for 700+ verbs.
Mary left the room
Mary left her daughter-in-law her pearls in her will
Frameset leave.01 "move away from": Arg0: entity leaving, Arg1: place left
Frameset leave.02 "give": Arg0: giver, Arg1: thing given, Arg2: beneficiary
How do these relate to traditional word senses as in WordNet?
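
As a hedged illustration (plain dictionaries, not the actual frames-file format), the two framesets of 'leave' and a frameset-tagged instance might look like this:

    # Frameset entries for 'leave', written out as plain dictionaries for illustration only.
    framesets = {
        "leave.01": {"gloss": "move away from",
                     "roles": {"Arg0": "entity leaving", "Arg1": "place left"}},
        "leave.02": {"gloss": "give",
                     "roles": {"Arg0": "giver", "Arg1": "thing given", "Arg2": "beneficiary"}},
    }

    # A frameset-tagged instance records which frameset licenses its argument labels.
    instance = {"text": "Mary left her daughter-in-law her pearls in her will",
                "frameset": "leave.02",
                "args": {"Arg0": "Mary", "Arg2": "her daughter-in-law", "Arg1": "her pearls"}}
    print(instance["frameset"], "=", framesets[instance["frameset"]]["gloss"])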

10 Fine-grained WordNet Senses
Senseval-2 – WSD bakeoff, using WordNet 1.7
Verb 'develop'
WN1: CREATE, MAKE SOMETHING NEW
  They developed a new technique
WN2: CREATE BY MENTAL ACT
  They developed a new theory of evolution
  develop a better way to introduce crystallography techniques
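
For comparison, the verb senses of 'develop' can be listed with NLTK's WordNet interface; note that NLTK ships a later WordNet release than the 1.7 inventory used for Senseval-2, so the numbering and glosses will not match the slide exactly.

    # Requires: pip install nltk  and  nltk.download('wordnet')
    from nltk.corpus import wordnet as wn

    for i, synset in enumerate(wn.synsets("develop", pos=wn.VERB), start=1):
        print(f"WN{i}: {synset.definition()}")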

11 WN Senses: verb ‘develop’
[Diagram: the fine-grained WordNet senses WN1 through WN20 of 'develop']

12 Sense Groups: verb ‘develop’
[Diagram: WordNet senses WN1 through WN20 of 'develop' clustered into Senseval-2 sense groups]

13 PropBank Framesets for verb 'develop'
Frameset 1 (sense: create/improve)
  Arg0: agent
  Arg1: thing developed
  Example: They developed a new technique
Frameset 2 (sense: come about)
  Arg1: non-intentional theme
  Example: The plot develops slowly
This verb has 2 Rolesets, 'come about' and 'create', which are distinguished by whether or not the development process had to be instigated by an outside causal agent, marked as Arg0 in PropBank. The outside-agent usages are more likely to be transitive, whereas the internally controlled ones are more likely to be intransitive, but alternations do occur.
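
A toy heuristic along these lines might pick a frameset from the presence of an outside causal agent; the labels develop.01/develop.02 simply follow the leave.01/leave.02 naming pattern above, and this is only an illustration, not the annotation procedure itself.

    def guess_develop_frameset(args: dict) -> str:
        """Toy heuristic: an annotated Arg0 (outside causal agent) suggests the
        'create/improve' frameset; otherwise assume the 'come about' frameset."""
        return "develop.01" if "Arg0" in args else "develop.02"

    print(guess_develop_frameset({"Arg0": "They", "Arg1": "a new technique"}))  # develop.01
    print(guess_develop_frameset({"Arg1": "The plot"}))                         # develop.02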

14 Mapping between Groups and Framesets
[Diagram: the sense groups for 'develop' mapped onto the two PropBank framesets]

15 Sense Hierarchy
Framesets – coarse-grained distinctions
Sense Groups (Senseval-2) – intermediate level (includes Levin classes) – 95% overlap
WordNet – fine-grained distinctions
We have been investigating whether the sense groups developed for Senseval-2 can provide an intermediate level of hierarchy between the PropBank Rolesets and the WordNet 1.7 senses. Our preliminary results show that 95% of the verb instances map directly from sense groups to Rolesets, with each Roleset typically corresponding to two or more sense groups.
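
A rough sketch of how such a figure could be computed from instances carrying both a sense-group tag and a Roleset tag; the group names below are invented for illustration.

    from collections import defaultdict

    def direct_mapping_rate(instances):
        """instances: (sense_group, roleset) pairs over the same verb occurrences.
        An instance 'maps directly' if its sense group co-occurs with only one roleset."""
        rolesets_per_group = defaultdict(set)
        for group, roleset in instances:
            rolesets_per_group[group].add(roleset)
        direct = sum(1 for group, _ in instances if len(rolesets_per_group[group]) == 1)
        return direct / len(instances)

    data = [("create-group", "develop.01"), ("improve-group", "develop.01"),
            ("come-about-group", "develop.02"), ("create-group", "develop.01")]
    print(f"{direct_mapping_rate(data):.0%} of instances map directly")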

16 Sense-Tagging of PropBank
Sense tagging is primarily confined to the financial subcorpus, covers about 90% of the polysemous instances in that corpus, and spans 415 verbs.
Single tagged: 12k polysemous instances with roleset identifiers
Double tagged: 3k polysemous instances
94% agreement between annotators
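
The agreement figure is simple agreement over the doubly tagged instances; a minimal sketch of that computation, assuming parallel lists of roleset identifiers from the two annotators:

    def roleset_agreement(tags_a, tags_b):
        """Fraction of doubly tagged instances where both annotators chose the same roleset."""
        matches = sum(1 for a, b in zip(tags_a, tags_b) if a == b)
        return matches / len(tags_a)

    a = ["leave.01", "leave.02", "develop.01", "develop.02"]
    b = ["leave.01", "leave.02", "develop.01", "develop.01"]
    print(f"agreement: {roleset_agreement(a, b):.0%}")  # 75% on this toy data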

17 Training Automatic Taggers
Stochastic tagger (Dan Gildea)
Results:
  Gold Standard parses: P, 71.7 R
  Automatic parses: P, 55.4 R
New results:
  Using argument labels as features for WSD
  EM clustering for assigning argument labels
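
One way to read "argument labels as features for WSD" is as binary indicators of which numbered arguments occur with a verb instance. The sketch below shows only that feature extraction step, with the learner left unspecified, since the slides do not describe the actual feature set.

    def argument_label_features(args: dict) -> dict:
        """Binary features recording which numbered arguments appear with a verb instance."""
        return {f"has_{label}": 1 for label in args}

    # e.g. a transitive use of 'develop' with Arg0/Arg1 vs. an intransitive use with only Arg1
    print(argument_label_features({"Arg0": "They", "Arg1": "a new technique"}))
    print(argument_label_features({"Arg1": "The plot"}))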

