
1 Columbia, 1/29/04, Penn. Putting Meaning Into Your Trees. Martha Palmer, University of Pennsylvania. Columbia University, New York City, January 29, 2004.

2 Outline
- Introduction
- Background: WordNet, Levin classes, VerbNet
- Proposition Bank – capturing shallow semantics
- Mapping PropBank to VerbNet
- Mapping PropBank to WordNet

3 Ask Jeeves – a Q/A, IR example. Query: What do you call a successful movie?
- Tips on Being a Successful Movie Vampire ... I shall call the police.
- Successful Casting Call & Shoot for "Clash of Empires" ... thank everyone for their participation in the making of yesterday's movie.
- Demme's casting is also highly entertaining, although I wouldn't go so far as to call it successful. This movie's resemblance to its predecessor is pretty vague...
- VHS Movies: Successful Cold Call Selling: Over 100 New Ideas, Scripts, and Examples from the Nation's Foremost Sales Trainer.

4 Ask Jeeves – filtering with a POS tag. What do you call a successful movie? (The same hits as on slide 3; tagging call as a verb filters out the hits where it is a noun, as in Casting Call and Cold Call.)

5 Filtering out "call the police" requires syntax: call(you, movie, what) ≠ call(you, police)
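The arity mismatch the slide points to can be made concrete. A minimal Python sketch (invented for illustration, not a system from the talk): propositions as (predicate, args) tuples, filtered by predicate and argument count.

```python
# Toy illustration of the slide's point: call(you, movie, what) and
# call(you, police) differ in argument structure, so a proposition-level
# filter can discard the "call the police" hit. All names are invented.

def make_proposition(predicate, *args):
    """Build a simple (predicate, args) proposition."""
    return (predicate, tuple(args))

# "What do you call a successful movie?" -> call(you, movie, what)
query = make_proposition("call", "you", "movie", "what")
# "I shall call the police." -> call(you, police)
candidate = make_proposition("call", "you", "police")

def matches(p, q):
    """Two propositions match only if predicate and argument count agree."""
    return p[0] == q[0] and len(p[1]) == len(q[1])

print(matches(query, candidate))  # False: different arity, so the hit is filtered
```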

6 An English lexical resource is required that provides sets of possible syntactic frames for verbs, and clear, replicable sense distinctions. AskJeeves: Who do you call for a good electronic lexical database for English?

7 WordNet – call, 28 senses
1. name, call -- (assign a specified, proper name to; "They named their son David"; ...) -> LABEL
2. call, telephone, call up, phone, ring -- (get or try to get into communication (with someone) by telephone; "I tried to call you all night"; ...) -> TELECOMMUNICATE
3. call -- (ascribe a quality to or give a name of a common noun that reflects a quality; "He called me a bastard"; ...) -> LABEL
4. call, send for -- (order, request, or command to come; "She was called into the director's office"; "Call the police!") -> ORDER

8 WordNet – Princeton (Miller 1985, Fellbaum 1998)
- On-line lexical reference (dictionary)
- Nouns, verbs, adjectives, and adverbs grouped into synonym sets
- Other relations include hypernyms (ISA), antonyms, meronyms
- Limitations as a computational lexicon:
  - Contains little syntactic information
  - No explicit predicate-argument structures
  - No systematic extension of basic senses
  - Sense distinctions are very fine-grained (ITA 73%)
  - No hierarchical entries

9 Levin classes (Levin, 1993)
- 3100 verbs, 47 top-level classes, 193 second- and third-level classes
- Each class has a syntactic signature based on alternations:
  John broke the jar. / The jar broke. / Jars break easily.
  John cut the bread. / *The bread cut. / Bread cuts easily.
  John hit the wall. / *The wall hit. / *Walls hit easily.

10 Levin classes (Levin, 1993), with the semantics behind the signatures:
  John broke the jar. / The jar broke. / Jars break easily. (change of state)
  John cut the bread. / *The bread cut. / Bread cuts easily. (change of state, recognizable action, sharp instrument)
  John hit the wall. / *The wall hit. / *Walls hit easily. (contact, exertion of force)

12 Confusions in Levin classes?
- Not semantically homogeneous: {braid, clip, file, powder, pluck, ...}
- Multiple class listings: homonymy or polysemy?
- Conflicting alternations? Carry verbs disallow the Conative (*she carried at the ball) but include {push, pull, shove, kick, draw, yank, tug}; these verbs are also in the Push/Pull class, which does take the Conative (she kicked at the ball).

13 Intersective Levin Classes: "at" ¬CH-LOC; "across the room" CH-LOC; "apart" CH-STATE. (Dang, Kipper & Palmer, ACL98)

14 Intersective Levin Classes are more syntactically and semantically coherent:
- sets of syntactic patterns
- explicit semantic components
- relations between senses
-> VerbNet (Dang, Kipper & Palmer, IJCAI00, Coling00)

15 VerbNet – Karin Kipper
- Class entries: capture generalizations about verb behavior; organized hierarchically; members have common semantic elements, semantic roles, and syntactic frames.
- Verb entries: refer to a set of classes (different senses); each class member is linked to WordNet synset(s) (not all WN senses are covered).

16 Semantic role labels: Julia broke the LCD projector.
break(agent(Julia), patient(LCD-projector))
cause(agent(Julia), broken(LCD-projector))
agent(A) -> intentional(A), sentient(A), causer(A), affector(A)
patient(P) -> affected(P), change(P), ...
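The slide's decomposition of break into a role assignment plus the predicates it entails can be written out as data. A hypothetical sketch; the function and field names are invented for illustration:

```python
# Hypothetical rendering of break(agent(Julia), patient(LCD-projector)),
# which entails cause(agent, E) and broken(patient), plus the properties
# the agent and patient roles carry. Structure invented for illustration.

def analyze_break(agent, patient):
    """Return role assignments and the semantic predicates they license."""
    return {
        "roles": {"agent": agent, "patient": patient},
        "predicates": [("cause", agent), ("broken", patient)],
        "agent_entails": ["intentional", "sentient", "causer", "affector"],
        "patient_entails": ["affected", "change"],
    }

analysis = analyze_break("Julia", "LCD-projector")
print(analysis["predicates"])  # [('cause', 'Julia'), ('broken', 'LCD-projector')]
```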

17 Hand-built resources vs. real data. VerbNet is based on linguistic theory – how useful is it? How well does it correspond to syntactic variations found in naturally occurring text? -> PropBank

18 Proposition Bank: from sentences to propositions.
  Powell met Zhu Rongji -> meet(Powell, Zhu Rongji)
  Powell met with Zhu Rongji
  Powell and Zhu Rongji met
  Powell and Zhu Rongji had a meeting
  ...
All map to meet(Somebody1, Somebody2); likewise for debate, consult, join, wrestle, battle.
  When Powell met Zhu Rongji on Thursday they discussed the return of the spy plane. ->
  meet(Powell, Zhu); discuss([Powell, Zhu], return(X, plane))
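The point of the slide is that several surface variants collapse to one proposition. A toy normalizer over the hand-listed variants (purely illustrative string handling, not PropBank machinery):

```python
# Toy illustration: the four surface forms of "meet" from the slide all
# reduce to the single proposition meet(Powell, Zhu Rongji).

VARIANTS = [
    "Powell met Zhu Rongji",
    "Powell met with Zhu Rongji",
    "Powell and Zhu Rongji met",
    "Powell and Zhu Rongji had a meeting",
]

def to_proposition(sentence):
    """Reduce each listed variant to meet(arg0, arg1); naive string handling."""
    s = sentence.replace(" had a meeting", " met").replace(" met with ", " met ")
    if " and " in s and s.endswith(" met"):
        a, b = s[: -len(" met")].split(" and ")
    else:
        a, b = s.split(" met ")
    return ("meet", a, b)

props = {to_proposition(s) for s in VARIANTS}
print(props)  # all four variants collapse to one proposition
```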

19 Capturing semantic roles*
- Owen broke [Arg1 the laser pointer].
- [Arg1 The windows] were broken by the hurricane.
- [Arg1 The vase] broke into pieces when it toppled over.
The same Arg1 surfaces as object or as subject. *See also FrameNet.

20 An English lexical resource is required that provides sets of possible syntactic frames for verbs with semantic role labels, and clear, replicable sense distinctions.

21 A TreeBanked sentence: Analysts have been expecting a GM-Jaguar pact that would give the U.S. car maker an eventual 30% stake in the British company.
(S (NP-SBJ Analysts)
   (VP have
       (VP been
           (VP expecting
               (NP (NP a GM-Jaguar pact)
                   (SBAR (WHNP-1 that)
                         (S (NP-SBJ *T*-1)
                            (VP would
                                (VP give
                                    (NP the U.S. car maker)
                                    (NP (NP an eventual (ADJP 30 %) stake)
                                        (PP-LOC in (NP the British company))))))))))))

22 The same sentence, PropBanked:
expect(Analysts, GM-Jaguar pact)
give(GM-Jaguar pact, US car maker, 30% stake)
(S Arg0(NP-SBJ Analysts)
   (VP have
       (VP been
           (VP expecting
               Arg1(NP (NP a GM-Jaguar pact)
                   (SBAR (WHNP-1 that)
                         (S Arg0(NP-SBJ *T*-1)
                            (VP would
                                (VP give
                                    Arg2(NP the U.S. car maker)
                                    Arg1(NP (NP an eventual (ADJP 30 %) stake)
                                        (PP-LOC in (NP the British company))))))))))))

23 Frames file example: expect
Roles: Arg0: expecter; Arg1: thing expected
Example (transitive, active): Portfolio managers expect further declines in interest rates.
  Arg0: Portfolio managers / REL: expect / Arg1: further declines in interest rates

24 Frames file example: give
Roles: Arg0: giver; Arg1: thing given; Arg2: entity given to
Example (double object): The executives gave the chefs a standing ovation.
  Arg0: The executives / REL: gave / Arg2: the chefs / Arg1: a standing ovation
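A frames-file entry of this kind is easy to render as plain data, and an annotation can then be checked against it. An illustrative sketch only; real PropBank frames files have their own distribution format:

```python
# The "give" frames-file entry as a dict, plus the slide's double-object
# annotation, with a consistency check between the two. Illustrative only.

GIVE_FRAME = {
    "lemma": "give",
    "roles": {"Arg0": "giver", "Arg1": "thing given", "Arg2": "entity given to"},
}

# The double-object example from the slide, labeled with the frame's roles.
annotation = {
    "Arg0": "The executives",
    "REL": "gave",
    "Arg2": "the chefs",
    "Arg1": "a standing ovation",
}

def consistent(annotation, frame):
    """Every labeled argument (other than REL) must be defined by the frame."""
    return all(tag == "REL" or tag in frame["roles"] for tag in annotation)

print(consistent(annotation, GIVE_FRAME))  # True
```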

25 Word senses in PropBank
- Ignoring word sense entirely was not feasible for 700+ verbs:
  Mary left the room
  Mary left her daughter-in-law her pearls in her will
- Frameset leave.01 "move away from": Arg0: entity leaving; Arg1: place left
- Frameset leave.02 "give": Arg0: giver; Arg1: thing given; Arg2: beneficiary
How do these relate to traditional word senses in VerbNet and WordNet?

26 Annotation procedure
- PTB II: extraction of all sentences with a given verb
- Create a Frame File for that verb (Paul Kingsbury): 3100+ lemmas, 4400 framesets, 118K predicates; over 300 created automatically via VerbNet
- First pass: automatic tagging (Joseph Rosenzweig)
- Second pass: double-blind hand correction (Paul Kingsbury); tagging tool highlights discrepancies (Scott Cotton)
- Third pass: "Solomonization" (adjudication) – Betsy Klipple, Olga Babko-Malaya

27 Trends in argument numbering
- Arg0 = agent
- Arg1 = direct object / theme / patient
- Arg2 = indirect object / benefactive / instrument / attribute / end state
- Arg3 = start point / benefactive / instrument / attribute
- Arg4 = end point
- Per-word vs. frame-level numbering – which is more general?

28 Additional tags (arguments or adjuncts?) – a variety of ArgMs (Arg# > 4):
- TMP – when?
- LOC – where at?
- DIR – where to?
- MNR – how?
- PRP – why?
- REC – himself, themselves, each other
- PRD – this argument refers to or modifies another
- ADV – others

29 Inflection
- Verbs are also marked for tense/aspect: passive/active; perfect/progressive; third singular (is, has, does, was); present/past/future; infinitives/participles/gerunds/finites
- Modals and negations are marked as ArgMs

30 Frames: multiple framesets
- Of the 787 most frequent verbs: 1 frameset – 521; 2 framesets – 169; 3+ framesets – 97 (includes light verbs)
- 94% ITA
- Framesets are not necessarily consistent between different senses of the same verb
- Framesets are consistent between different verbs that share similar argument structures (like FrameNet)

31 Ergative/unaccusative verbs
Roles (no Arg0 for unaccusative verbs): Arg1 = logical subject, patient, thing rising; Arg2 = EXT, amount risen; Arg3 = start point; Arg4 = end point
  Sales rose 4% to $3.28 billion from $3.16 billion.
  The Nasdaq composite index added 1.01 to on paltry volume.

32 Actual data for leave
Leave.01 "move away from": Arg0 rel Arg1 Arg3. Leave.02 "give": Arg0 rel Arg1 Arg2.
  sub-ARG0 obj-ARG1: 44
  sub-ARG0: 20
  sub-ARG0 NP-ARG1-with obj-ARG2: 17
  sub-ARG0 sub-ARG2 ADJP-ARG3-PRD: 10
  sub-ARG0 sub-ARG1 ADJP-ARG3-PRD: 6
  sub-ARG0 sub-ARG1 VP-ARG3-PRD: 5
  NP-ARG1-with obj-ARG2: 4
  obj-ARG1: 3
  sub-ARG0 sub-ARG2 VP-ARG3-PRD: 3

33 PropBank/FrameNet
  Buy: Arg0 buyer; Arg1 goods; Arg2 seller; Arg3 rate; Arg4 payment
  Sell: Arg0 seller; Arg1 goods; Arg2 buyer; Arg3 rate; Arg4 payment
PropBank roles are broader, more neutral, more syntactic – they map readily to VerbNet, TR, FrameNet. (Rambow et al., PMLB03)

34 Annotator accuracy – ITA 84%

35 An English lexical resource is required that provides sets of possible syntactic frames for verbs with semantic role labels? And provides clear, replicable sense distinctions.

36 An English lexical resource is required that provides sets of possible syntactic frames for verbs with semantic role labels that can be automatically assigned accurately to new text? And provides clear, replicable sense distinctions.

37 Automatic labelling of semantic relations – stochastic model features:
- Predicate
- Phrase type
- Parse tree path
- Position (before/after predicate)
- Voice (active/passive)
- Head word
(Gildea & Jurafsky, CL02; Gildea & Palmer, ACL02)
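Of these features, the parse tree path is the least obvious: the chain of constituent labels from the predicate up to the lowest common ancestor and back down to the candidate argument. A sketch over a toy constituency tree; the nested (label, children) tuple encoding is an assumption of this illustration, and "^" / "!" stand in for the upward and downward arrows of the usual notation:

```python
# Sketch of the parse-tree-path feature over toy (label, children) tuples.

def find_path(tree, target, path=None):
    """Return the list of node labels from the root down to `target`.

    Nodes are compared by value, which is fine for this small example."""
    path = (path or []) + [tree[0] if isinstance(tree, tuple) else tree]
    if tree == target:
        return path
    if isinstance(tree, tuple):
        for child in tree[1]:
            found = find_path(child, target, path)
            if found:
                return found
    return None

def tree_path(root, predicate, constituent):
    """Path feature: up from the predicate to the common ancestor, then down."""
    p1, p2 = find_path(root, predicate), find_path(root, constituent)
    i = 0  # length of the shared prefix = depth of the lowest common ancestor
    while i < min(len(p1), len(p2)) and p1[i] == p2[i]:
        i += 1
    up = list(reversed(p1[i - 1:]))   # predicate up to and including the LCA
    down = p2[i:]                     # LCA down to the constituent
    return "^".join(up) + "!" + "!".join(down)

# (S (NP He) (VP (VBD ate) (NP pancakes)))
vbd = ("VBD", ("ate",))
np2 = ("NP", ("pancakes",))
tree = ("S", (("NP", ("He",)), ("VP", (vbd, np2))))
print(tree_path(tree, vbd, np2))  # VBD^VP!NP
```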

38 Semantic role labelling accuracy – known boundaries
Conditions compared: automatic parses vs. gold-standard parses, for PropBank (≥ 10 instances) and FrameNet (≥ 10 instances). Accuracy of semantic role prediction for known boundaries – the system is given the constituents to classify. FrameNet examples (training/test) are hand-picked to be unambiguous. Lower performance with unknown boundaries, higher performance with traces – it almost evens out.

39 Additional automatic role labelers
- Performance improved from 77% to 88% (Colorado; gold-standard parses, < 10 instances)
- Same features plus: Named Entity tags; head word POS; for unseen verbs, backoff to automatic verb clusters
- SVMs: role or not role; for each likely role, for each Arg#, Arg# or not; no overlapping role labels allowed
(Pradhan et al., ICDM03; Surdeanu et al., ACL03; Chen & Rambow, EMNLP03; Gildea & Hockenmaier, EMNLP03)

40 Additional automatic role labelers (cont.)
- Performance improved from 77% to 88% (Colorado)
- New results with the original features and labels: 88%, 93% (Penn)
- Same setup and references as on slide 39

41 Word senses in PropBank (repeated). Ignoring word sense was not feasible for 700+ verbs: Mary left the room vs. Mary left her daughter-in-law her pearls in her will. Frameset leave.01 "move away from": Arg0: entity leaving; Arg1: place left. Frameset leave.02 "give": Arg0: giver; Arg1: thing given; Arg2: beneficiary. How do these relate to traditional word senses in VerbNet and WordNet?

42 Mapping from PropBank to VerbNet
Frameset id = leave.02; sense = give; VerbNet class = future_having-13.3
  Arg0 (giver) -> Agent
  Arg1 (thing given) -> Theme
  Arg2 (benefactive) -> Recipient
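The slide's mapping for leave.02 is a small lookup table. Written out as data (an illustrative sketch, not the released mapping format):

```python
# The PropBank -> VerbNet role mapping for leave.02 ("give",
# class future_having-13.3), as shown on the slide. Format invented.

LEAVE_02_MAP = {
    "Arg0": ("giver", "Agent"),        # (PropBank descriptor, VerbNet role)
    "Arg1": ("thing given", "Theme"),
    "Arg2": ("benefactive", "Recipient"),
}

def to_verbnet(propbank_arg):
    """Translate a PropBank argument label to its VerbNet thematic role."""
    return LEAVE_02_MAP[propbank_arg][1]

print(to_verbnet("Arg2"))  # Recipient
```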

43 Mapping from PB to VerbNet (example mapping table figure)

44 Mapping from PropBank to VerbNet
- Overlap with PropBank framesets: 50,000 PropBank instances, 85% in VerbNet classes
- Results: MATCH % (80.90% relaxed) – VerbNet isn't just linguistic theory!
- Benefits: thematic role labels and semantic predicates; can extend PropBank coverage with VerbNet classes; WordNet sense tags
(Kingsbury & Kipper, NAACL03, Text Meaning Workshop)

45 WordNet as a WSD sense inventory – senses unnecessarily fine-grained?
- Word sense disambiguation bakeoffs: Senseval-1 – Hector, ITA = 95.5%; Senseval-2 – WordNet 1.7, ITA for verbs = 71%
- Groupings of Senseval-2 verbs, ITA = 82%, using syntactic and semantic criteria

46 Groupings methodology (w/ Dang and Fellbaum)
- Double-blind groupings, adjudication
- Syntactic criteria (VerbNet was useful): distinct subcategorization frames (call him a bastard / call him a taxi); recognizable alternations – regular sense extensions (play an instrument / play a song / play a melody on an instrument)
(SIGLEX01, SIGLEX02, JNLE04)

47 Groupings methodology (cont.) – semantic criteria
- Differences in semantic classes of arguments: abstract/concrete, human/animal, animate/inanimate, different instrument types, ...
- Differences in entailments: change of a prior entity or creation of a new entity?
- Differences in types of events: abstract/concrete/mental/emotional/...
- Specialized subject domains

48 Results – averaged over 28 verbs
  Group polysemy: 8.07
  ITA fine-grained: 71%; ITA grouped: 82%
  MX fine-grained: 60.2%; MX grouped: 69%
MX = maximum entropy WSD, p(sense|context); features: topic, syntactic constituents, semantic classes (contributing +2.5%, +1.5 to +5%, and +6%). (Dang and Palmer, Siglex02; Dang et al., Coling02)

49 Grouping improved both ITA and MaxEnt WSD
- call: 31% of errors were due to confusion between senses within the same group 1:
  name, call – (assign a specified, proper name to; They named their son David)
  call – (ascribe a quality to or give a name of a common noun that reflects a quality; He called me a bastard)
  call – (consider or regard as being; I would not call her beautiful)
- 75% accuracy with training and testing on grouped senses vs. 43% with training and testing on fine-grained senses

50 WordNet – call, 28 senses, grouped (sense-grouping diagram): the 28 WordNet senses cluster into groups such as Loud cry; Label; Phone/radio; Bird or animal cry; Request; Call a loan/bond; Visit; Challenge; Bid.

52 Overlap between groups and framesets – 95%. Example: develop, whose WordNet senses partition between Frameset1 and Frameset2. (Palmer, Dang & Fellbaum, NLE 2004)

53 Sense hierarchy
- PropBank framesets – coarse-grained distinctions: 20 Senseval-2 verbs with > 1 frameset; MaxEnt WSD system, 73.5% baseline, 90% accuracy
- Sense groups (Senseval-2) – intermediate level (includes Levin classes): 95% overlap with framesets, 69%
- WordNet – fine-grained distinctions: 60.2%

54 An English lexical resource is available that provides sets of possible syntactic frames for verbs with semantic role labels that can be automatically assigned accurately to new text, and provides clear, replicable sense distinctions.

55 A Chinese Treebank sentence
国会/Congress 最近/recently 通过/pass 了/ASP 银行法/banking law
"The Congress passed the banking law recently."
(IP (NP-SBJ (NN 国会/Congress))
    (VP (ADVP (ADV 最近/recently))
        (VP (VV 通过/pass)
            (AS 了/ASP)
            (NP-OBJ (NN 银行法/banking law)))))

56 The same sentence, PropBanked: 通过 (pass, frameset f2); arg0 国会 (Congress); argM 最近 (recently); arg1 银行法 (banking law)
(IP arg0(NP-SBJ (NN 国会))
    (VP argM(ADVP (ADV 最近))
        (VP f2(VV 通过)
            (AS 了)
            arg1(NP-OBJ (NN 银行法)))))

57 Chinese PropBank status (w/ Bert Xue and Scott Cotton)
- Create a Frame File for each verb; similar alternations – causative/inchoative, unexpressed object; 5000 lemmas, 3000 DONE (hired Jiang)
- First pass: automatic tagging, 2500 DONE; subcat frame matcher (Xue & Kulick, MT03)
- Second pass: double-blind hand correction, in progress (includes frameset tagging), 1000 DONE; ported RATS to CATS, in use since May
- Third pass: Solomonization (adjudication)

58 A Korean Treebank sentence
그는 르노가 3월말까지 인수제의 시한을 갖고 있다고 덧붙였다.
"He added that Renault has a deadline until the end of March for a merger proposal."
(S (NP-SBJ 그/NPN+은/PAU)
   (VP (S-COMP (NP-SBJ 르노/NPR+이/PCA)
               (VP (VP (NP-ADV 3/NNU 월/NNX+말/NNX+까지/PAU)
                       (VP (NP-OBJ 인수/NNC+제의/NNC 시한/NNC+을/PCA)
                           갖/VV+고/ECS))
                   있/VX+다/EFN+고/PAD)
       덧붙이/VV+었/EPF+다/EFN)
   ./SFN)

59 The same sentence, PropBanked
덧붙이다 (add): Arg0 그는 (he); Arg2 르노가 3월말까지 인수제의 시한을 갖고 있다 (that Renault has a deadline until the end of March for a merger proposal)
갖다 (have): Arg0 르노가 (Renault); ArgM 3월말까지 (until the end of March); Arg1 인수제의 시한을 (a deadline for a merger proposal)
(S Arg0(NP-SBJ 그/NPN+은/PAU)
   (VP Arg2(S-COMP (Arg0 NP-SBJ 르노/NPR+이/PCA)
               (VP (VP (ArgM NP-ADV 3/NNU 월/NNX+말/NNX+까지/PAU)
                       (VP (Arg1 NP-OBJ 인수/NNC+제의/NNC 시한/NNC+을/PCA)
                           갖/VV+고/ECS))
                   있/VX+다/EFN+고/PAD)
       덧붙이/VV+었/EPF+다/EFN)
   ./SFN)

60 PropBank II
- Nominalizations (NYU); lexical frames DONE
- Event variables (including temporals and locatives)
- More fine-grained sense tagging; tagging nominalizations w/ WordNet senses; selected verbs and nouns
- Nominal coreference (not names)
- Clausal discourse connectives – selected subset

61 PropBank I example
Also, [Arg0 substantially lower Dutch corporate tax rates] helped [Arg1 [Arg0 the company] keep [Arg1 its tax outlay] [Arg3-PRD flat] [ArgM-ADV relative to earnings growth]].
  help: REL help; Arg0 tax rates; Arg1 the company keep its tax outlay flat relative to earnings growth
  keep: REL keep; Arg0 the company; Arg1 its tax outlay; Arg3-PRD flat; ArgM-ADV relative to earnings growth
PropBank II additions: event variables (ID# h23, k16); nominal reference; sense tags (help2,5; tax rate1; keep1); discourse connectives.

62 Summary
- Shallow semantic annotation that captures critical dependencies and semantic role labels
- Supports training of supervised automatic taggers
- Methodology ports readily to other languages
- English PropBank release – spring 2004; Chinese PropBank release – fall 2004; Korean PropBank release – summer 2005

63 Word sense in machine translation
- Different syntactic frames:
  John left the room. / Juan saiu do quarto. (Portuguese)
  John left the book on the table. / Juan deixou o livro na mesa.
- Same syntactic frame?
  John left a fortune. / Juan deixou uma fortuna.

64 Summary of multilingual TreeBanks and PropBanks (parallel corpora)
  Chinese Treebank: text Chinese 500K / English 400K; treebank Chinese 500K / English 100K; PropBank I Chinese 500K / English 350K; PropBank II Ch 100K / En 100K
  Arabic Treebank: text Arabic 500K / English 500K; treebank Arabic 500K / English ?; PropBank I and II: ?
  Korean Treebank: text Korean 180K / English 50K; treebank Korean 180K / English 50K; PropBank I Korean 180K / English 50K

65 Levin class: escape
- WordNet senses: WN 1, 5, 8
- Thematic roles: Location[+concrete], Theme[+concrete]
- Frames with semantics:
  Basic intransitive: "The convict escaped" – motion(during(E), Theme), direction(during(E), Prep, Theme, ~Location)
  Intransitive (+ path PP): "The convict escaped from the prison"
  Locative preposition drop: "The convict escaped the prison"

66 Levin class: future_having-13.3
- WordNet senses: WN 2, 10, 13
- Thematic roles: Agent[+animate OR +organization], Recipient[+animate OR +organization], Theme[]
- Frames with semantics:
  Dative: "I promised somebody my time" – Agent V Recipient Theme; has_possession(start(E), Agent, Theme), future_possession(end(E), Recipient, Theme), cause(Agent, E)
  Transitive (+ Recipient PP): "We offered our paycheck to her" – Agent V Theme Prep(to) Recipient
  Transitive (Theme object): "I promised my house (to somebody)" – Agent V Theme

67 Automatic classification
- Merlo & Stevenson automatically classified 59 verbs with 69.8% accuracy into: 1. unergative, 2. unaccusative, 3. object-drop
- 100M words automatically parsed; C5.0 using features: transitivity, causativity, animacy, voice, POS
- EM clustering – 61%, 2669 instances, 1M words, using gold-standard semantic role labels
- Examples: 1. float, hop/hope, jump, march, leap; 2. change, clear, collapse, cool, crack, open, flood; 3. borrow, clean, inherit, reap, organize, study

68 SENSEVAL – word sense disambiguation evaluation. A DARPA-style bakeoff: training data, testing data, scoring algorithm.
  Senseval-1: English lexical sample – yes; verbs/polysemy/instances 13/12/215; sense inventory Hector, 95.5%
  Senseval-2: English lexical sample – yes; verbs/polysemy/instances 29/16/110; sense inventory WordNet, 73+%
(NLE99, CHUM01, NLE02, NLE03)

69 Maximum entropy WSD (Hoa Dang), best performer on verbs
- Maximum entropy framework, p(sense|context)
- Contextual linguistic features for target word W:
  - Topical: keywords (determined automatically)
  - Local syntactic: presence of subject, complements, passive?; words in subject/complement positions, particles, prepositions, etc.
  - Local semantic: semantic class information from WordNet (synsets, etc.); Named Entity tag (PERSON, LOCATION, ...) for proper nouns; words within a +/- 2 word window
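The topical and local window features on this slide are straightforward to extract as a sparse feature dict for a maxent classifier p(sense | context). A minimal sketch; the feature names and function are invented for illustration, and the syntactic/semantic features are omitted:

```python
# Sketch of topical + local-window WSD features as a sparse dict.
# Feature-name scheme is invented; real systems add syntactic and
# WordNet-class features on top of this.

def wsd_features(sentence, target_index, window=2):
    """Features for disambiguating the word at target_index."""
    tokens = sentence.split()
    feats = {}
    # topical features: bag of words anywhere in the sentence
    for tok in tokens:
        feats[f"topic={tok.lower()}"] = 1
    # local features: words within +/- `window` positions of the target
    for offset in range(-window, window + 1):
        i = target_index + offset
        if offset != 0 and 0 <= i < len(tokens):
            feats[f"local[{offset}]={tokens[i].lower()}"] = 1
    return feats

feats = wsd_features("Mary left the room", 1)
print(sorted(feats))
```

A maxent (logistic regression) model would then be trained over dicts like these, one per sense-labeled training instance.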

70 Best verb performance – MaxEnt WSD (Hoa Dang)
28 verbs, averaged over total WN polysemy: ITA 71%; MX-WSD 60.2%
MX = maximum entropy WSD, p(sense|context); features: topic, syntactic constituents, semantic classes (+2.5%, +1.5 to +5%, +6%). (Dang and Palmer, Siglex02; Dang et al., Coling02)

71 Role labels and framesets as features for WSD – preliminary results (Jinying Chen)
- Gold-standard PropBank annotation; decision tree (C5.0); sense groups; 5 verbs
- Features: frameset tags, Arg labels
- Comparable results to MaxEnt with PropBank features
Syntactic frames and sense distinctions are inseparable.

72 Lexical resources provide concrete criteria for sense distinctions
- PropBank – coarse-grained sense distinctions determined by different subcategorization frames (framesets)
- Intersective Levin classes – regular sense extensions through differing syntactic constructions
- VerbNet – distinct semantic predicates for each sense (verb class)
Are these the right distinctions?

73 Results – averaged over 28 verbs
  Group polysemy: 8.07
  ITA fine-grained: 71%; ITA grouped: 82%
  MX fine-grained: 60.2%; MX grouped: 69%
  JHU MLultra: 56.6%, 58.7%

