
Slide 1: Putting Meaning Into Your Trees
Martha Palmer
Collaborators: Paul Kingsbury, Olga Babko-Malaya, Bert Xue, Scott Cotton, Karin Kipper, Hoa Dang, Szuting Yi, Edward Loper, Jinying Chen, Tom Morton, William Schuler, Fei Xia, Joseph Rosenzweig, Dan Gildea, Christiane Fellbaum
September 8, 2003

Slide 2: Elusive nature of “meaning”
- Natural Language Understanding
- Natural Language Processing or Natural Language Engineering
- Empirical techniques rule!

Slide 3: Statistical Machine Translation results
- CHINESE TEXT
- The japanese court before china photo trade huge & lawsuit.
- A large amount of the proceedings before the court dismissed workers.
- japan’s court, former chinese servant industrial huge disasters lawsuit.
- Japanese Court Rejects Former Chinese Slave Workers’ Lawsuit for Huge Compensation.

Slide 4: Leverage from shallow techniques?
- Still need an approximation of meaning for accurate MT, IR, Q&A, IE
- Sense tagging
- Labeled dependency structures
- What do we have as available resources?
- What can we do with them?

Slide 5: Outline
- Introduction – need for semantics
- Sense tagging – issues highlighted by Senseval-1
- VerbNet
- Senseval-2 – groupings, impact on ITA
- Automatic WSD, impact on scores
- Proposition Bank
- Framesets, automatic role labellers
- Hierarchy of sense distinctions
- Mapping VerbNet to PropBank

Slide 6: WordNet – Princeton
- On-line lexical reference (dictionary)
- Words organized into synonym sets (synsets) representing concepts
- Hypernyms (ISA), antonyms, meronyms (PART)
- Useful for checking selectional restrictions (doesn’t tell you what they should be)
- Typical top nodes – 5 out of 25: (act, action, activity), (animal, fauna), (artifact), (attribute, property), (body, corpus)
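The synset, gloss, and hypernym lookups shown on these slides can be reproduced programmatically. A minimal sketch using NLTK's WordNet interface (assuming NLTK and its wordnet corpus data are installed; this is not part of the original talk):

```python
# Minimal sketch: querying WordNet through NLTK (assumes nltk is installed
# and the wordnet corpus has been downloaded via nltk.download('wordnet')).
from nltk.corpus import wordnet as wn

# List the noun synsets (senses) of "president", as on the next slide.
for synset in wn.synsets('president', pos=wn.NOUN):
    print(synset.name(), '-', synset.definition())
    # Hypernyms give the ISA parents, e.g. leader or head of state.
    print('  hypernyms:', [h.name() for h in synset.hypernyms()])
```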

Slide 7: WordNet – president, 6 senses
1. president -- (an executive officer of a firm or corporation) --> CORPORATE EXECUTIVE, BUSINESS EXECUTIVE … --> LEADER
2. President of the United States, President, Chief Executive -- (the person who holds the office of head of state of the United States government; "the President likes to jog every morning") --> HEAD OF STATE, CHIEF OF STATE
3. president -- (the chief executive of a republic) --> HEAD OF STATE, CHIEF OF STATE
4. president, chairman, chairwoman, chair, chairperson -- (the officer who presides at the meetings of an organization; "address your remarks to the chairperson") --> PRESIDING OFFICER --> LEADER
5. president -- (the head administrative officer of a college or university) --> ACADEMIC ADMINISTRATOR … --> LEADER
6. President of the United States, President, Chief Executive -- (the office of the United States head of state; "a President is elected every four years") --> PRESIDENCY, PRESIDENTSHIP --> POSITION

Slide 8: Limitations of WordNet
- Poor inter-annotator agreement (73%)
- Just sense tags – no representations
- Very little mapping to syntax
- No predicate argument structure, no selectional restrictions
- No generalizations about sense distinctions
- No hierarchical entries

Slide 9: SIGLEX98 / SENSEVAL
- Workshop on Word Sense Disambiguation
- 54 attendees, 24 systems, 3 languages
- 34 words (nouns, verbs, adjectives)
- Both supervised and unsupervised systems
- Training data, test data
- Hector senses – very corpus based (mapping to WordNet)
- Lexical samples – instances, not running text
- Inter-annotator agreement over 90%
ACL-SIGLEX98, SIGLEX99, CHUM00

Slide 10: Hector – bother, 10 senses
- 1. intransitive verb – (make an effort), after negation, usually with to-infinitive; (of a person) to take the trouble or effort needed (to do something). Ex. “About 70 percent of the shareholders did not bother to vote at all.”
- 1.1 (can't be bothered), idiomatic – be unwilling to make the effort needed (to do something). Ex. “The calculations needed are so tedious that theorists cannot be bothered to do them.”
- 2. vi; after neg; with ‘about’ or ‘with’; rarely cont – (of a person) to concern oneself (about something or someone). “He did not bother about the noise of the typewriter because Danny could not hear it above the sound of the tractor.”
- 2.1 v-passive; with ‘about’ or ‘with’ – (of a person) to be concerned about or interested in (something). “The only thing I'm bothered about is the well-being of the club.”

Slide 11: Mismatches between lexicons: Hector – WordNet, shake

Slide 12: Levin classes (3100 verbs)
- 47 top-level classes, 193 second- and third-level classes
- Based on pairs of syntactic frames:
  John broke the jar. / Jars break easily. / The jar broke.
  John cut the bread. / Bread cuts easily. / *The bread cut.
  John hit the wall. / *Walls hit easily. / *The wall hit.
- Reflect underlying semantic components: contact, directed motion, exertion of force, change of state
- Synonyms, syntactic patterns (conative), relations

Slide 13: Confusions in Levin classes?
- Not semantically homogeneous: {braid, clip, file, powder, pluck, etc.}
- Multiple class listings – homonymy or polysemy?
- Alternation contradictions? Carry verbs disallow the Conative, but include {push, pull, shove, kick, draw, yank, tug}, which are also in the Push/Pull class, which does take the Conative.

Slide 14: Intersective Levin classes

Slide 15: Regular sense extensions
John pushed the chair. (+force, +contact)
John pushed the chairs apart. (+ch-state)
John pushed the chairs across the room. (+ch-loc)
John pushed at the chair. (-ch-loc)
The train whistled into the station. (+ch-loc)
The truck roared past the weigh station. (+ch-loc)
AMTA98, ACL98, TAG98

Slide 16: Intersective Levin classes
- More syntactically and semantically coherent
- Sets of syntactic patterns
- Explicit semantic components
- Relations between senses
VerbNet: www.cis.upenn.edu/verbnet

Slide 17: VerbNet
- Computational verb lexicon
- Clear association between syntax and semantics
- Syntactic frames (LTAGs) and selectional restrictions (WordNet)
- Lexical semantic information – predicate argument structure
- Semantic components represented as predicates
- Links to WordNet senses
- Entries based on refinement of Levin classes
- Inherent temporal properties represented explicitly: during(E), end(E), result(E)
TAG00, AAAI00, Coling00

Slide 18: VerbNet
Class entries:
- Verb classes allow us to capture generalizations about verb behavior
- Verb classes are hierarchically organized
- Members have common semantic elements, thematic roles, syntactic frames and coherent aspect
Verb entries:
- Each verb can refer to more than one class (for different senses)
- Each verb sense has a link to the appropriate synsets in WordNet (but not all senses of WordNet may be covered)
- A verb may add more semantic information to the basic semantics of its class

Slide 19: Hit class – hit-18.1
MEMBERS: [bang(1,3), bash(1), ... hit(2,4,7,10), kick(3), ...]
THEMATIC ROLES: Agent, Patient, Instrument
SELECTIONAL RESTRICTIONS: Agent(int_control), Patient(concrete), Instrument(concrete)
FRAMES and PREDICATES:
- Basic Transitive: A V P – cause(Agent,E) /\ manner(during(E), directedmotion, Agent) /\ manner(end(E), forceful, Agent) /\ contact(end(E), Agent, Patient)
- Conative: A V at P – manner(during(E), directedmotion, Agent) /\ ¬contact(end(E), Agent, Patient)
- With/against alternation: A V I against/on P – cause(Agent, E) /\ manner(during(E), directedmotion, Instr) /\ manner(end(E), forceful, Instr) /\ contact(end(E), Instr, Patient)
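As a hedged sketch, a class entry like hit-18.1 might be held in memory as follows; the field names and layout below are purely illustrative and do not follow VerbNet's actual XML schema:

```python
# Hypothetical in-memory representation of the hit-18.1 entry above;
# field names are illustrative, not VerbNet's official schema.
from dataclasses import dataclass, field

@dataclass
class Frame:
    name: str          # e.g. "Basic Transitive"
    syntax: str        # e.g. "Agent V Patient"
    semantics: list    # conjunction of semantic predicates

@dataclass
class VerbClass:
    class_id: str
    members: dict                  # verb -> WordNet sense numbers
    thematic_roles: list
    selectional_restrictions: dict
    frames: list = field(default_factory=list)

hit_18_1 = VerbClass(
    class_id="hit-18.1",
    members={"hit": [2, 4, 7, 10], "bang": [1, 3], "kick": [3]},
    thematic_roles=["Agent", "Patient", "Instrument"],
    selectional_restrictions={"Agent": "int_control",
                              "Patient": "concrete",
                              "Instrument": "concrete"},
    frames=[Frame("Basic Transitive", "Agent V Patient",
                  ["cause(Agent, E)",
                   "manner(during(E), directedmotion, Agent)",
                   "manner(end(E), forceful, Agent)",
                   "contact(end(E), Agent, Patient)"])],
)
```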

Slide 20: VerbNet

Slide 21: VerbNet/WordNet

Slide 22: Mapping WN–Hector via VerbNet. SIGLEX99, LREC00

Slide 23: SENSEVAL-2 – ACL’01. Adam Kilgarriff, Phil Edmonds and Martha Palmer
All-words task: Czech, Dutch, English, Estonian
Lexical sample task: Basque, Chinese, English, Italian, Japanese, Korean, Spanish, Swedish

Slide 24: English lexical sample – verbs
- Preparation for Senseval-2
- Manual tagging of 29 highly polysemous verbs (call, draw, drift, carry, find, keep, turn, ...)
- WordNet (pre-release version 1.7)
- To handle unclear sense distinctions: detect and eliminate redundant senses; detect and cluster closely related senses [NOT ALLOWED]

Slide 25: WordNet – call, 28 senses
1. name, call -- (assign a specified, proper name to; "They named their son David"; "The new school was named after the famous Civil Rights leader") --> LABEL
2. call, telephone, call up, phone, ring -- (get or try to get into communication (with someone) by telephone; "I tried to call you all night"; "Take two aspirin and call me in the morning") --> TELECOMMUNICATE
3. call -- (ascribe a quality to or give a name of a common noun that reflects a quality; "He called me a bastard"; "She called her children lazy and ungrateful") --> LABEL

Slide 26: WordNet – call, 28 senses (cont.)
4. call, send for -- (order, request, or command to come; "She was called into the director's office"; "Call the police!") --> ORDER
5. shout, shout out, cry, call, yell, scream, holler, hollo, squall -- (utter a sudden loud cry; "she cried with pain when the doctor inserted the needle"; "I yelled to her from the window but she couldn't hear me") --> UTTER
6. visit, call in, call -- (pay a brief visit; "The mayor likes to call on some of the prominent citizens") --> MEET

Slide 27: Groupings methodology
- Double blind groupings, adjudication
- Syntactic criteria (VerbNet was useful)
- Distinct subcategorization frames: call him a bastard / call him a taxi
- Recognizable alternations – regular sense extensions: play an instrument / play a song / play a melody on an instrument

Slide 28: Groupings methodology (cont.)
- Semantic criteria
- Differences in semantic classes of arguments: abstract/concrete, human/animal, animate/inanimate, different instrument types, ...
- Differences in the number and type of arguments, often reflected in subcategorization frames: John left the room. / I left my pearls to my daughter-in-law in my will.
- Differences in entailments: change of prior entity or creation of a new entity?
- Differences in types of events: abstract/concrete/mental/emotional/...
- Specialized subject domains

Slide 29: WordNet – call, 28 senses [diagram of the 28 WordNet senses of call]

Slide 30: WordNet – call, 28 senses, groups [the same senses grouped into: Phone/radio; Label; Loud cry; Bird or animal cry; Request; Call a loan/bond; Visit; Challenge; Bid]

Slide 31: WordNet – call, 28 senses, Group 1
1. name, call -- (assign a specified, proper name to; "They named their son David"; "The new school was named after the famous Civil Rights leader") --> LABEL
3. call -- (ascribe a quality to or give a name of a common noun that reflects a quality; "He called me a bastard"; "She called her children lazy and ungrateful") --> LABEL
19. call -- (consider or regard as being; "I would not call her beautiful") --> SEE
22. address, call -- (greet, as with a prescribed form, title, or name; "He always addresses me with `Sir'"; "Call me Mister"; "She calls him by first name") --> ADDRESS

Slide 32: Sense groups: verb ‘develop’ [diagram partitioning WordNet senses 1–14, 19 and 20 into groups]

Slide 33: Results – averaged over 28 verbs

                  Call      Develop   Total
WN / corpus       28/14     21/16     16.28/10.83
Grp / corpus      11/7      9/6       8.07/5.90
Entropy           3.68      3.17      2.81
ITA fine          69%       67%       71%
ITA coarse        89%       85%       82%

Slide 34: Maximum entropy WSD. Hoa Dang (in progress)
- Maximum entropy framework combines different features with no assumption of independence
- Estimates the conditional probability that W has sense X in context Y (where Y is a conjunction of linguistic features)
- Feature weights are determined from training data
- The weights produce a maximum entropy probability distribution

Slide 35: Features used
- Topical contextual linguistic feature for W: presence of automatically determined keywords in S
- Local contextual linguistic features for W:
- Presence of subject, complements
- Words in subject, complement positions, particles, preps
- Noun synonyms and hypernyms for subjects, complements
- Named entity tag (PERSON, LOCATION, ...) for proper nouns
- Words within +/- 2 word window
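To make the framework concrete (this is not Hoa Dang's actual system), a maximum entropy WSD classifier over features like these can be sketched with scikit-learn, whose multinomial logistic regression fits the same family of exponential models; the feature names and training instances below are hypothetical:

```python
# Sketch of a maximum-entropy WSD classifier; multinomial logistic regression
# in scikit-learn fits the same exponential model family. Feature names and
# the tiny training set are hypothetical, not the system's real data.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Each instance: a dict of local/topical features plus the gold sense tag.
train = [
    ({"subj_head": "she", "has_obj": True, "obj_head": "police", "kw_phone": False}, "call.request"),
    ({"subj_head": "he",  "has_obj": True, "obj_head": "me",     "kw_phone": False}, "call.label"),
    ({"subj_head": "i",   "has_obj": True, "obj_head": "you",    "kw_phone": True},  "call.phone"),
]
X, y = zip(*train)

model = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
model.fit(list(X), list(y))

# Conditional probability distribution over senses for a new context.
test = {"subj_head": "manager", "has_obj": True, "obj_head": "office", "kw_phone": False}
clf = model.named_steps["logisticregression"]
print(dict(zip(clf.classes_, model.predict_proba([test])[0].round(3))))
```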

Slide 36: Grouping improved sense identification for MxWSD
- 75% with training and testing on grouped senses vs. 43% with training and testing on fine-grained senses
- Most commonly confused senses suggest grouping:
  (1) name, call -- assign a specified proper name to; “They called their son David”
  (2) call -- ascribe a quality to or give a name that reflects a quality; “He called me a bastard”
  (3) call -- consider or regard as being; “I would not call her beautiful”
  (4) address, call -- greet, as with a prescribed form, title, or name; “Call me Mister”; “She calls him by his first name”

Slide 37: Results – averaged over 28 verbs

                  Total
WN / corpus       16.28/10.83
Grp / corpus      8.07/5.90
Entropy           2.81
ITA fine          71%
ITA coarse        82%
MX fine           59%
MX coarse         69%

Slide 38: Results – first Senseval-2 verbs

               Begin   Call    Carry   Develop  Draw    Dress
WN / corpus    10/9    28/14   39/22   21/16    35/21   15/8
Grp / corpus   10/9    11/7    16/11   9/6      15/9    7/4
Entropy        1.76    3.68    3.97    3.17     4.60    2.89
ITA fine       .812    .693    .607    .678     .767    .865
ITA coarse     .814    .892    .753    .852     .825    1.00
MX fine        .832    .470    .379    .493     .366    .610
MX coarse      .832    .636    .485    .681     .512    .898

Slide 39: Summary of WSD
- Choice of features is more important than choice of machine learning algorithm
- Importance of syntactic structure (English WSD but not Chinese)
- Importance of dependencies
- Importance of a hierarchical approach to sense distinctions, and quick adaptation to new usages

Slide 40: Outline
- Introduction – need for semantics
- Sense tagging – issues highlighted by Senseval-1
- VerbNet
- Senseval-2 – groupings, impact on ITA
- Automatic WSD, impact on scores
- Proposition Bank
- Framesets, automatic role labellers
- Hierarchy of sense distinctions
- Mapping VerbNet to PropBank

Slide 41: Proposition Bank: from sentences to propositions
Powell met Zhu Rongji / Powell met with Zhu Rongji / Powell and Zhu Rongji met / Powell and Zhu Rongji had a meeting ...
Proposition: meet(Powell, Zhu Rongji)
When Powell met Zhu Rongji on Thursday they discussed the return of the spy plane.
meet(Powell, Zhu)  discuss([Powell, Zhu], return(X, plane))
debate / consult / join / wrestle / battle ... meet(Somebody1, Somebody2)

Slide 42: Capturing semantic roles*
Charles broke [Arg1 the LCD projector].
[Arg1 The windows] were broken by the hurricane.
[Arg1 The vase] broke into pieces when it toppled over.
*See also FrameNet, http://www.icsi.berkeley.edu/~framenet/

Slide 43: A TreeBanked sentence
Analysts have been expecting a GM-Jaguar pact that would give the U.S. car maker an eventual 30% stake in the British company.
(S (NP-SBJ Analysts) (VP have (VP been (VP expecting (NP (NP a GM-Jaguar pact) (SBAR (WHNP-1 that) (S (NP-SBJ *T*-1) (VP would (VP give (NP the U.S. car maker) (NP (NP an eventual (ADJP 30 %) stake) (PP-LOC in (NP the British company))))))))))))

Slide 44: The same sentence, PropBanked
(S Arg0 (NP-SBJ Analysts) (VP have (VP been (VP expecting Arg1 (NP (NP a GM-Jaguar pact) (SBAR (WHNP-1 that) (S Arg0 (NP-SBJ *T*-1) (VP would (VP give Arg2 (NP the U.S. car maker) Arg1 (NP (NP an eventual (ADJP 30 %) stake) (PP-LOC in (NP the British company))))))))))))
expect(Analysts, GM-J pact)
give(GM-J pact, US car maker, 30% stake)

Slide 45: English PropBank, http://www.cis.upenn.edu/~ace/
- 1M words of Treebank over 2 years, May ’01–’03
- New semantic augmentations: predicate-argument relations for verbs; label arguments: Arg0, Arg1, Arg2, ...
- First subtask: 300K-word financial subcorpus (12K sentences, 29K predicates, 1700 lemmas)
- Spin-off: guidelines, FRAMES FILES (necessary for annotators)
- 3500+ verbs with labeled examples, rich semantics, 118K predicates

Slide 46: Frames example: expect
Roles:
  Arg0: expecter
  Arg1: thing expected
Example (transitive, active): Portfolio managers expect further declines in interest rates.
  Arg0: Portfolio managers
  REL: expect
  Arg1: further declines in interest rates

Slide 47: Frames file example: give
Roles:
  Arg0: giver
  Arg1: thing given
  Arg2: entity given to
Example (double object): The executives gave the chefs a standing ovation.
  Arg0: The executives
  REL: gave
  Arg2: the chefs
  Arg1: a standing ovation
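For illustration, the frames-file information and the labeled example above could be held in simple structures like these; the real frames files are XML, so this dict form is purely a sketch:

```python
# Hypothetical dict encoding of the PropBank frames-file entry for "give";
# the real frames files are XML, so this layout is illustrative only.
give_frame = {
    "lemma": "give",
    "roles": {"Arg0": "giver", "Arg1": "thing given", "Arg2": "entity given to"},
}

# The double-object example from the slide, annotated against that frame.
example = {
    "rel": "gave",
    "Arg0": "The executives",
    "Arg2": "the chefs",
    "Arg1": "a standing ovation",
}

# A proposition in the style of the earlier slides: give(Arg0, Arg1, Arg2).
print(f"give({example['Arg0']}, {example['Arg1']}, {example['Arg2']})")
```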

Slide 48: How are arguments numbered?
- Examination of example sentences
- Determination of required / highly preferred elements
- Sequential numbering; Arg0 is the typical first argument, except for ergative/unaccusative verbs (shake example)
- Arguments mapped for "synonymous" verbs

Slide 49: Trends in argument numbering
- Arg0 = agent
- Arg1 = direct object / theme / patient
- Arg2 = indirect object / benefactive / instrument / attribute / end state
- Arg3 = start point / benefactive / instrument / attribute
- Arg4 = end point

Slide 50: Additional tags (arguments or adjuncts?)
- Variety of ArgM’s (Arg# > 4):
- TMP – when?
- LOC – where at?
- DIR – where to?
- MNR – how?
- PRP – why?
- REC – himself, themselves, each other
- PRD – this argument refers to or modifies another
- ADV – others

Slide 51: Inflection
- Verbs also marked for tense/aspect
- Passive/active
- Perfect/progressive
- Third singular (is, has, does, was)
- Present/past/future
- Infinitives/participles/gerunds/finites
- Modals and negation marked as ArgMs

Slide 52: Phrasal verbs
- Put together
- Put in
- Put off
- Put on
- Put out
- Put up
- ...

Slide 53: Ergative/unaccusative verbs: rise
Roles:
  Arg1 = logical subject, patient, thing rising
  Arg2 = EXT, amount risen
  Arg3* = start point
  Arg4 = end point
Sales rose 4% to $3.28 billion from $3.16 billion.
*Note: have to mention the preposition explicitly, Arg3-from, Arg4-to; or could have used ArgM-Source, ArgM-Goal. Arbitrary distinction.

Slide 54: Synonymous verbs: add in the sense of rise
Roles:
  Arg1 = logical subject, patient, thing rising/gaining/being added to
  Arg2 = EXT, amount risen
  Arg4 = end point
The Nasdaq composite index added 1.01 to 456.6 on paltry volume.

Slide 55: Annotation procedure
- Extraction of all sentences with a given verb
- First pass: automatic tagging (Joseph Rosenzweig), http://www.cis.upenn.edu/~josephr/TIDES/index.html#lexicon
- Second pass: double blind hand correction
- Annotators come from a variety of backgrounds, with less syntactic training than for treebanking
- Tagging tool highlights discrepancies
- Third pass: Solomonization (adjudication)

Slide 56: Inter-annotator agreement

Slide 57: Solomonization
Also, substantially lower Dutch corporate tax rates helped the company keep its tax outlay flat relative to earnings growth.
Kate said:      arg0: the company, arg1: its tax outlay, arg3-PRD: flat, argM-MNR: relative to earnings growth
Katherine said: arg0: the company, arg1: its tax outlay, arg3-PRD: flat, argM-ADV: relative to earnings growth

Slide 58: Automatic labelling of semantic relations
Features:
- Predicate
- Phrase type
- Parse tree path
- Position (before/after predicate)
- Voice (active/passive)
- Head word
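Of these, the parse tree path is the least obvious to compute; a small sketch of the idea over an NLTK constituency tree (a toy parse, not the system's actual feature extractor):

```python
# Sketch of the "parse tree path" feature: the chain of node labels from an
# argument constituent up to the lowest common ancestor and down to the
# predicate. Toy parse; not the actual feature extraction code.
from nltk import Tree

tree = Tree.fromstring(
    "(S (NP-SBJ (NNS Analysts)) "
    "(VP (VBP have) (VP (VBN been) (VP (VBG expecting) (NP (DT a) (NN pact))))))"
)

def path_feature(tree, arg_pos, pred_pos):
    # Length of the shared prefix of the two tree positions = lowest common ancestor.
    common = 0
    while (common < min(len(arg_pos), len(pred_pos))
           and arg_pos[common] == pred_pos[common]):
        common += 1
    # Labels going up from the argument to the common ancestor (inclusive) ...
    up = [tree[arg_pos[:i]].label() for i in range(len(arg_pos), common - 1, -1)]
    # ... then down from just below the ancestor to the predicate node.
    down = [tree[pred_pos[:i]].label() for i in range(common + 1, len(pred_pos) + 1)]
    return "↑".join(up) + ("↓" + "↓".join(down) if down else "")

arg_pos = next(p for p in tree.treepositions()
               if isinstance(tree[p], Tree) and tree[p].label() == "NP-SBJ")
pred_pos = next(p for p in tree.treepositions()
                if isinstance(tree[p], Tree) and tree[p].leaves() == ["expecting"])

print(path_feature(tree, arg_pos, pred_pos))  # NP-SBJ↑S↓VP↓VP↓VP↓VBG
```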

Slide 59: Labelling accuracy – known boundaries
Accuracy of semantic role prediction for known boundaries (the system is given the constituents to classify). FrameNet examples (training/test) are handpicked to be unambiguous.

Parses          FrameNet   PropBank   PropBank (>10 instances)
Automatic       82.0       73.6       79.6
Gold standard   –          77.0       83.1

Slide 60: Labelling accuracy – unknown boundaries
Accuracy of semantic role prediction for unknown boundaries (the system must identify the constituents as arguments and give them the correct roles).

Parses          FrameNet (Precision / Recall)   PropBank (Precision / Recall)
Automatic       64.6 / 61.2                     57.7 / 50.0
Gold standard   –                               71.1 / 64.4

Slide 61: Additional automatic role labelers
- Szuting Yi: EM clustering (unsupervised); conditional random fields
- Jinying Chen: using role labels as features for WSD; decision trees (supervised); EM clustering (unsupervised)

Slide 62: Outline
- Introduction – need for semantics
- Sense tagging – issues highlighted by Senseval-1
- VerbNet
- Senseval-2 – groupings, impact on ITA
- Automatic WSD, impact on scores
- Proposition Bank
- Framesets, automatic role labellers
- Hierarchy of sense distinctions
- Mapping VerbNet to PropBank

Slide 63: Frames: multiple framesets
- Framesets are not necessarily consistent between different senses of the same verb
- A verb with multiple senses can have multiple framesets, but not necessarily
- Roles and mappings onto argument labels are consistent between different verbs that share similar argument structures (similar to FrameNet)
- Levin / VerbNet classes: http://www.cis.upenn.edu/~dgildea/VerbNet
- Out of the 720 most frequent verbs: 1 frameset – 470; 2 framesets – 155; 3+ framesets – 95 (includes light verbs)

Slide 64: Word senses in PropBank
- The instruction to ignore word sense was not feasible for 700+ verbs
- Mary left the room
- Mary left her daughter-in-law her pearls in her will
Frameset leave.01 "move away from": Arg0: entity leaving, Arg1: place left
Frameset leave.02 "give": Arg0: giver, Arg1: thing given, Arg2: beneficiary
How do these relate to traditional word senses as in WordNet?
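A sketch of how the two framesets for leave might be encoded and consulted; the real framesets live in XML frames files, so this dict form is illustrative only:

```python
# Hypothetical encoding of the two PropBank framesets for "leave" shown above;
# real framesets are defined in XML frames files, so this is a sketch only.
leave_framesets = {
    "leave.01": {  # "move away from"
        "Arg0": "entity leaving",
        "Arg1": "place left",
    },
    "leave.02": {  # "give"
        "Arg0": "giver",
        "Arg1": "thing given",
        "Arg2": "beneficiary",
    },
}

# Frameset tagging then amounts to choosing which roleset an instance uses:
# "Mary left the room"                        -> leave.01
# "Mary left her daughter-in-law her pearls"  -> leave.02
print(leave_framesets["leave.02"]["Arg2"])  # beneficiary
```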

Slide 65: WordNet – leave, 14 senses [diagram of the 14 WordNet senses of leave]

Slide 66: WordNet – leave, groups [the same senses partitioned into sense groups]

Slide 67: WordNet – leave, framesets [the same senses partitioned by PropBank frameset]

Slide 68: Overlap between groups and framesets – 95% [diagram for ‘develop’: the WordNet senses grouped, with Frameset1 and Frameset2 covering the groups]

Slide 69: Sense hierarchy
- Framesets – coarse-grained distinctions
- Sense groups (Senseval-2) – intermediate level (includes Levin classes) – 95% overlap
- WordNet – fine-grained distinctions

Slide 70: leave.01 – move away from
VerbNet:
- Levin class: escape-51.1-1; WordNet senses: WN 1, 5, 8
- Thematic roles: Location[+concrete], Theme[+concrete]
- Frames with semantics:
  Basic Intransitive: "The convict escaped" – motion(during(E), Theme), direction(during(E), Prep, Theme, ?Location)
  Intransitive (+ path PP): "The convict escaped from the prison"
  Locative Preposition Drop: "The convict escaped the prison"

Slide 71: leave.02 – give
VerbNet:
- Levin class: future_having-13.3; WordNet senses: WN 2, 10, 13
- Thematic roles: Agent[+animate OR +organization], Recipient[+animate OR +organization], Theme[]
- Frames with semantics:
  Dative: "I promised somebody my time" – Agent V Recipient Theme – has_possession(start(E), Agent, Theme), future_possession(end(E), Recipient, Theme), cause(Agent, E)
  Transitive (+ Recipient PP): "We offered our paycheck to her" – Agent V Theme Prep(to) Recipient
  Transitive (Theme object): "I promised my house (to somebody)" – Agent V Theme

Slide 72: PropBank to VerbNet mapping (from the Text Meaning workshop)
- Cluster verbs based on frames of argument labels: k-nearest neighbors, EM
- Compare derived clusters to VerbNet classes: sim(X, Y) =
- Only a rough measure
- Not all verbs in VerbNet are attested in PropBank
- Not all verbs in PropBank are treated in VerbNet

Slide 73: PropBank frame for clustering
For [Arg4 Mr. Sherwin], [Arg0 a conviction] could [Rel carry] [Arg1 penalties of five years in prison and a $250,000 fine on each count] (wsj_1331)
reduces to: arg4 arg0 rel arg1
Frameset tags, ~7K annotations, 200 schemas, 921 verbs
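A toy sketch of clustering verbs by their reduced argument-label schemas; the slide's experiments used k-nearest neighbors and EM, so the KMeans call and the schema counts here are stand-ins, not the reported setup:

```python
# Toy sketch: cluster verbs by the distribution of their reduced schemas
# (e.g. "arg4 arg0 rel arg1"). KMeans stands in for the EM / k-nearest-neighbor
# clustering reported on the slide; all counts below are invented.
from collections import Counter
from sklearn.feature_extraction import DictVectorizer
from sklearn.cluster import KMeans

verb_schemas = {
    "carry": Counter({"arg0 rel arg1": 40, "arg4 arg0 rel arg1": 5}),
    "give":  Counter({"arg0 rel arg2 arg1": 30, "arg0 rel arg1": 20}),
    "rise":  Counter({"arg1 rel arg2": 25, "arg1 rel arg4": 10}),
    "add":   Counter({"arg1 rel arg2": 15, "arg0 rel arg1": 12}),
}

verbs = sorted(verb_schemas)
X = DictVectorizer().fit_transform(verb_schemas[v] for v in verbs)

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
for verb, cluster in zip(verbs, labels):
    print(verb, "-> cluster", cluster)
# The derived clusters can then be compared against VerbNet classes.
```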

Slide 74: [cluster chart; cluster 1 = transitive, cluster 2 = ditransitive, cluster 3 = unaccusative]


Slide 76: Adding to VerbNet classes
- 36.3 ‘combative meetings’: fight, consult, ...
- Clustering analysis adds hedge
- Hedge one's bets against ...
- But some investors might prefer a simpler strategy than hedging their individual holdings (wsj_1962)
- Thus, buying puts after a big market slide can be an expensive way to hedge against risk (wsj_2415)

Slide 77: Lexical semantics at Penn
- Annotation of the Penn Treebank with semantic role labels (propositions) and sense tags
- Links to VerbNet and WordNet
- Provides additional semantic information that clearly distinguishes verb senses
- Class based, to facilitate extension to previously unseen usages

Slide 78: PropBank I
Also, [Arg0 substantially lower Dutch corporate tax rates] helped [Arg1 [Arg0 the company] keep [Arg1 its tax outlay] [Arg3-PRD flat] [ArgM-ADV relative to earnings growth]].

REL: help – Arg0: tax rates; Arg1: the company keep its tax outlay flat
REL: keep – Arg0: the company; Arg1: its tax outlay; Arg3-PRD: flat; ArgM-ADV: relative to earnings …

PropBank I additions: event variables (ID# h23, k16); nominal reference; sense tags (help2,5, tax rate1, keep1, company1); discourse connectives.

