
1 CS626-449: NLP, Speech and Web-Topics-in-AI Pushpak Bhattacharyya CSE Dept., IIT Bombay Lecture 37: Semantic Role Extraction (obtaining Dependency Parse)

2 Vauquois Triangle: the source-language side is Analysis, the target-language side is Generation. Direct MT (enter the target language immediately through a dictionary); Transfer based (analyse the source to an intermediate representation, then transfer into target-language structures); Interlingua based (do deep semantic processing before entering the target language). Vauquois: an eminent French machine translation researcher, originally a physicist.

3 Universal Networking Language: Universal Words (UWs), Relations, Attributes, Knowledge Base.

4 UNL Graph for "He forwarded the mail to the minister.": agt(forward(icl>send).@entry.@past, he(icl>person)); obj(forward(icl>send).@entry.@past, mail(icl>collection).@def); gol(forward(icl>send).@entry.@past, minister(icl>person)).
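A UNL expression like the one above is naturally stored as a set of relation triples. The following is a minimal sketch in Python, not official UNL tooling; the helper function and its name are purely illustrative.

```python
# Minimal sketch: the UNL graph above as (relation, head UW, child UW) triples.
# Not official UNL tooling; UWs are kept as plain strings with their attributes.

unl_graph = [
    ("agt", "forward(icl>send).@entry.@past", "he(icl>person)"),
    ("obj", "forward(icl>send).@entry.@past", "mail(icl>collection).@def"),
    ("gol", "forward(icl>send).@entry.@past", "minister(icl>person)"),
]

def linked(graph, head, relation=None):
    """Return the UWs linked from `head`, optionally filtered by relation."""
    return [child for rel, frm, child in graph
            if frm == head and (relation is None or rel == relation)]

print(linked(unl_graph, "forward(icl>send).@entry.@past", "agt"))
# ['he(icl>person)']
```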

5 AGT / AOJ / OBJ. AGT (Agent) Definition: Agt defines a thing which initiates an action. AOJ (Thing with attribute) Definition: Aoj defines a thing which is in a state or has an attribute. OBJ (Affected thing) Definition: Obj defines a thing in focus which is directly affected by an event or state.

6 Examples: John broke the window. agt(break.@entry.@past, John). This flower is beautiful. aoj(beautiful.@entry, flower). He blamed John for the accident. obj(blame.@entry.@past, John).

7 BEN (Beneficiary) Definition: Ben defines an indirectly related beneficiary or victim of an event or state. Can I do anything for you? ben(do.@entry.@interrogation.@politeness, you); obj(do.@entry.@interrogation.@politeness, anything); agt(do.@entry.@interrogation.@politeness, I).

8 PUR (Purpose or objective) Definition: Pur defines the purpose or objective of the agent of an event, or the purpose of a thing's existence. This budget is for food. pur(food.@entry, budget); mod(budget, this).

9 RSN (Reason) Definition: Rsn defines a reason why an event or a state happens. They selected him for his honesty. agt(select(icl>choose).@entry, they); obj(select(icl>choose).@entry, he); rsn(select(icl>choose).@entry, honesty).

10 TIM (Time) Definition: Tim defines the time an event occurs or a state is true. I wake up at noon. agt(wake up.@entry, I); tim(wake up.@entry, noon(icl>time)).

11 TMF (Initial time) Definition: Tmf defines the time an event starts. The meeting started from morning. obj(start.@entry.@past, meeting.@def); tmf(start.@entry.@past, morning(icl>time)).

12 TMT (Final time) Definition: Tmt defines the time an event ends. The meeting continued till evening. obj(continue.@entry.@past, meeting.@def); tmt(continue.@entry.@past, evening(icl>time)).

13 PLC (Place) Definition: Plc defines the place an event occurs, a state is true, or a thing exists. He is very famous in India. aoj(famous.@entry, he); man(famous.@entry, very); plc(famous.@entry, India).

14 PLF (Initial place) Definition: Plf defines the place an event begins or a state becomes true. Participants come from the whole world. agt(come.@entry, participant.@pl); plf(come.@entry, world); mod(world, whole).

15 PLT (Final place) Definition: Plt defines the place an event ends or a state becomes false. We will go to Delhi. agt(go.@entry.@future, we); plt(go.@entry.@future, Delhi).

16 INS (Instrument) Definition: Ins defines the instrument used to carry out an event. I solved it with a computer. agt(solve.@entry.@past, I); ins(solve.@entry.@past, computer); obj(solve.@entry.@past, it).

17 Attributes constitute the syntax of UNL. They bridge the conceptual world and the real world in UNL expressions, showing how and when the speaker views what is said, and with what intention, feeling, and so on. Seven types: time with respect to the speaker; aspects; speaker's view of reference; speaker's emphasis, focus, topic, etc.; convention; speaker's attitudes; speaker's feelings and viewpoints.

18 Tense: @past. The past tense is normally expressed by @past. {unl} agt(go.@entry.@past, he) … {/unl} He went there yesterday.

19 Aspects: @progress. {unl} man(rain.@entry.@present.@progress, hard) {/unl} It's raining hard.

20 Speaker's view of reference. @def (specific concept, already referred to): The house on the corner is for sale. @indef (non-specific class): There is a book on the desk. @not is always attached to the UW which is negated: He didn't come. agt(come.@entry.@past.@not, he).

21 Speaker's emphasis: @emphasis. John his name is. mod(name, he); aoj(John.@emphasis.@entry, name). @entry denotes the entry point or main UW of a UNL expression.

22 Subcategorization Frames. Specify the categorial class of the lexical item and its environment. Examples: kick: [V; _ NP], cry: [V; _ ], rely: [V; _ PP], put: [V; _ NP PP], think: [V; _ S`].
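A subcategorization lexicon of this kind is essentially a table from a lexical item to the complement sequences it licenses. The sketch below is a hypothetical illustration (the dictionary and function are made up for this example), checking a candidate complement list against the frames on the slide.

```python
# Hypothetical sketch of a verb subcategorization lexicon and a frame check.
# Each frame lists the categories licensed after the verb position (_).

SUBCAT_FRAMES = {
    "kick":  [["NP"]],
    "cry":   [[]],
    "rely":  [["PP"]],
    "put":   [["NP", "PP"]],
    "think": [["S'"]],
}

def licenses(verb, complements):
    """True if `complements` (a list of category tags) matches a frame of `verb`."""
    return complements in SUBCAT_FRAMES.get(verb, [])

print(licenses("rely", ["PP"]))   # True:  "rely on the friend"
print(licenses("put", ["NP"]))    # False: "put" requires both an NP and a PP
```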

23 Subcategorization Rules. Rule schema: V → y / [ _ NP ], [ _ ], [ _ PP ], [ _ NP PP ], [ _ S` ].

24 Subcategorization Rules. 1. S → NP VP 2. VP → V (NP) (PP) (S`)… 3. NP → Det N 4. V → rely / [ _ PP ] 5. P → on / [ _ NP ] 6. Det → the 7. N → boy, friend. The boy relied on the friend.
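Assuming NLTK is available, the seven rules can be written as a small context-free grammar and used to parse the example sentence. Subcategorization is approximated here by splitting V into a frame-specific category (V_PP), which is a simplification of the slide's rule notation rather than a faithful encoding.

```python
# Sketch: the slide's toy grammar in NLTK (assumes nltk is installed).
# Subcategorization (rely / _PP]) is approximated by the category V_PP.
import nltk

grammar = nltk.CFG.fromstring("""
S    -> NP VP
VP   -> V_PP PP
PP   -> P NP
NP   -> Det N
V_PP -> 'relied'
P    -> 'on'
Det  -> 'the'
N    -> 'boy' | 'friend'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("the boy relied on the friend".split()):
    tree.pretty_print()
```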

25 Semantically Odd Constructions. Can we exclude these two ill-formed structures? *The boy frightened sincerity. *Sincerity kicked the boy. Selectional Restrictions.

26 Selectional Restrictions. Inherent properties of nouns: [+/- ABSTRACT], [+/- ANIMATE]. E.g., sincerity [+ABSTRACT], boy [+ANIMATE].

27 Selectional Rules. A selectional rule specifies certain selectional restrictions associated with a verb. Schema: V → y / [+/-ABSTRACT] __ [+/-ANIMATE]. Example: V → frighten / [+/-ABSTRACT] __ [+ANIMATE].
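A toy version of such a check is just feature lookup: each noun carries inherent features and each verb states what it requires of its subject and object positions. Everything below (the two verb entries, feature values, function name) is illustrative only.

```python
# Toy sketch of selectional restrictions; all entries are illustrative.

NOUN_FEATURES = {
    "sincerity": {"ABSTRACT": True,  "ANIMATE": False},
    "boy":       {"ABSTRACT": False, "ANIMATE": True},
}

# Features a verb demands of its subject / object (empty dict = no restriction).
VERB_RESTRICTIONS = {
    "frighten": {"subject": {}, "object": {"ANIMATE": True}},
    "kick":     {"subject": {"ANIMATE": True}, "object": {}},
}

def well_formed(subject, verb, obj):
    """Check subject and object nouns against the verb's selectional restrictions."""
    restr = VERB_RESTRICTIONS[verb]
    for noun, wanted in ((subject, restr["subject"]), (obj, restr["object"])):
        feats = NOUN_FEATURES[noun]
        if any(feats.get(f) != v for f, v in wanted.items()):
            return False
    return True

print(well_formed("boy", "frighten", "sincerity"))  # False: *The boy frightened sincerity.
print(well_formed("sincerity", "kick", "boy"))      # False: *Sincerity kicked the boy.
```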

28 Subcategorization Frame. forward V __ NP PP (e.g., We will be forwarding our new catalogue to you); invitation N __ PP (e.g., an invitation to the party); accessible A __ PP (e.g., a program making science more accessible to young people).

29 Thematic Roles. The man forwarded the mail to the minister. forward V __ NP PP. [Event FORWARD ([Thing THE MAN], [Thing THE MAIL], [Path TO THE MINISTER])].

30 How to define the UWs in the UNL Knowledge Base? Nominal concept: abstract, concrete. Verbal concept: do, occur, be. Adjective concept. Adverbial concept.

31 Nominal Concept: Abstract thing abstract thing{(icl>thing)} culture(icl>abstract thing) civilization(icl>culture{>abstract thing}) direction(icl>abstract thing) east(icl>direction{>abstract thing}) duty(icl>abstract thing) mission(icl>duty{>abstract thing}) responsibility(icl>duty{>abstract thing}) accountability{(icl>responsibility>duty)} event(icl>abstract thing{,icl>time>abstract thing}) meeting(icl>event{>abstract thing,icl>group>abstract thing}) conference(icl>meeting{>event}) TV conference{(icl>conference>meeting)}

32 Nominal Concept: Concrete thing concrete thing{(icl>thing,icl>place>thing)} building(icl>concrete thing) factory(icl>building{>concrete thing}) house(icl>building{>concrete thing}) substance(icl>concrete thing) cloth(icl>substance{>concrete thing}) cotton(icl>cloth{>substance}) fiber(icl>substance{>concrete thing}) synthetic fiber{(icl>fiber>substance)} textile fiber{(icl>fiber>substance)} liquid(icl>substance{>concrete thing}) beverage(icl>food{,icl>liquid>substance}) coffee(icl>beverage{>food}) liquor(icl>beverage{>food}) beer(icl>liquor{>beverage})

33 Verbal concept: do do({icl>do,}agt>thing,gol>thing,obj>thing) express({icl>do(}agt>thing,gol>thing,obj>thing{)}) state(icl>express(agt>thing,gol>thing,obj>thing)) explain(icl>state(agt>thing,gol>thing,obj>thing)) add({icl>do(}agt>thing,gol>thing,obj>thing{)}) change({icl>do(}agt>thing,gol>thing,obj>thing{)}) convert(icl>change(agt>thing,gol>thing,obj>thing)) classify({icl>do(}agt>thing,gol>thing,obj>thing{)}) divide(icl>classify(agt>thing,gol>thing,obj>thing))

34 Verbal concept: occur and be occur({icl>occur,}gol>thing,obj>thing) melt({icl>occur(}gol>thing,obj>thing{)}) divide({icl>occur(}gol>thing,obj>thing{)}) arrive({icl>occur(}obj>thing{)}) be({icl>be,}aoj>thing{,^obj>thing}) exist({icl>be(}aoj>thing{)}) born({icl>be(}aoj>thing{)})

35 How to define the UWs in the UNL Knowledge Base? To distinguish among the verb classes headed by 'do', 'occur' and 'be', the following features are used:
UW | needs an agent | needs an object | English example
'do' | + | + | "to kill"
'occur' | - | + | "to fall"
'be' | - | - | "to know"

36 How to define the UWs in the UNL Knowledge Base? The verbal UWs (do, occur, be) also take some pre-defined semantic cases, as follows:
UW | pre-defined case | English example
'do' | necessarily takes agt>thing | "to kill"
'occur' | necessarily takes obj>thing | "to fall"
'be' | necessarily takes aoj>thing | "to know"

37 Complex sentence: I want to watch this movie. The embedded clause is represented as a scope: obj(want(icl>...).@entry.@past, :01), with I(iof>person) related to want; inside scope :01, agt(watch(icl>do).@entry.@inf, I(iof>person)) and obj(watch(icl>do).@entry.@inf, movie(icl>...).@def).

38 38 Approach to UNL Generation

39 Problem Definition. Generate UNL expressions for English sentences in a robust and scalable manner, using syntactic analysis and lexical resources extensively. This requires detecting semantically relatable entities and solving attachment problems.

40 Semantically Relatable Sequences (SRS). Definition: A Semantically Relatable Sequence (SRS) of a sentence is a group of words in the sentence (not necessarily consecutive) that appears in the semantic graph of the sentence as linked nodes or as nodes with speech-act labels. (This is motivated by the UNL representation.)

41 SRS as an intermediary: Source Language Sentence → SRS → UNL → Target Language Sentence.

42 Example to illustrate SRS: "The man bought a new car in June". Semantic graph: agent(bought, man); object(bought, car); time(bought, June) via the modifier "in"; modifier(car, new); "the" marks man as definite; "a" marks car as indefinite; bought carries past tense.

43 Sequences from “the man bought a new car in June” a. {man, bought} b. {bought, car} c. {bought, in, June} d. {new, car} e. {the, man} f. {a, car}

44 Basic questions. Which words can form semantic constituents, which we call Semantically Relatable Sequences (SRSs)? What, after all, are the SRSs of a given sentence? What semantic relations can link the words within an SRS and the SRSs themselves?

45 Postulate: A sentence needs to be broken into sequences of at most three forms: {CW, CW}, {CW, FW, CW}, {FW, CW}, where CW refers to a content word or a clause and FW to a function word.
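The postulate maps directly onto a data representation: an SRS is a tuple of two or three slots, each a content word, function word, or clause. A minimal sketch, using the sequences of slide 43:

```python
# Minimal sketch: SRSs from "The man bought a new car in June" as tuples of
# (word, class) pairs, where class is "CW" or "FW".

srs_list = [
    (("man", "CW"), ("bought", "CW")),                  # {CW, CW}
    (("bought", "CW"), ("car", "CW")),                  # {CW, CW}
    (("bought", "CW"), ("in", "FW"), ("June", "CW")),   # {CW, FW, CW}
    (("new", "CW"), ("car", "CW")),                     # {CW, CW}
    (("the", "FW"), ("man", "CW")),                     # {FW, CW}
    (("a", "FW"), ("car", "CW")),                       # {FW, CW}
]

def valid_srs(srs):
    """Check the postulate: only {CW,CW}, {CW,FW,CW} or {FW,CW} are allowed."""
    pattern = tuple(cls for _, cls in srs)
    return pattern in {("CW", "CW"), ("CW", "FW", "CW"), ("FW", "CW")}

assert all(valid_srs(s) for s in srs_list)
```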

46 SRS and Language Phenomena

47 Movement: Preposition Stranding. John, we laughed at. (we, laughed.@entry) --- (CW, CW); (laughed.@entry, at, John) --- (CW, FW, CW).

48 Movement: Topicalization. The problem, we solved. (we, solved.@entry) --- (CW, CW); (solved.@entry, problem) --- (CW, CW); (the, problem) --- (FW, CW).

49 Movement: Relative Clauses. John told a joke which we had already heard. (John, told.@entry) --- (CW, CW); (told.@entry, :01) --- (CW, CW); SCOPE01(we, had, heard.@entry) --- (CW, FW, CW); SCOPE01(already, heard.@entry) --- (CW, CW); SCOPE01(heard.@entry, which, joke) --- (CW, FW, CW); SCOPE01(a, joke) --- (FW, CW).

50 Movement: Interrogatives. Who did you refer her to? (did, refer.@entry.@interrogative) --- (FW, CW); (you, refer.@entry.@interrogative) --- (CW, CW); (refer.@entry.@interrogative, her) --- (CW, CW); (refer.@entry.@interrogative, to, who) --- (CW, FW, CW).

51 Empty Pronominals: to-infinitivals. Bill was wise to sell the piano. (wise.@entry, SCOPE01) --- (CW, CW); SCOPE01(sell.@entry, piano) --- (CW, CW); (Bill, was, wise.@entry) --- (CW, FW, CW); SCOPE01(Bill, to, sell.@entry) --- (CW, FW, CW); SCOPE01(the, piano) --- (FW, CW).

52 Empty pronominal: Gerundial. The cat leapt down spotting a thrush on the lawn. (the, cat) --- (FW, CW); (cat, leapt.@entry) --- (CW, CW); (leapt.@entry, down) --- (CW, CW); (leapt.@entry, SCOPE01) --- (CW, CW); SCOPE01(spotting.@entry, thrush) --- (CW, CW); SCOPE01(spotting.@entry, on, lawn) --- (CW, FW, CW).

53 PP Attachment. John cracked the glass with a stone. (John, cracked.@entry) --- (CW, CW); (cracked.@entry, glass) --- (CW, CW); (cracked.@entry, with, stone) --- (CW, FW, CW); (a, stone) --- (FW, CW); (the, glass) --- (FW, CW).

54 SRS and PP attachment (Mohanty, Almeida, Bhattacharyya, 04)
Condition: [PP] is subcategorized by the verb [V]; sub-condition: [NP2] is licensed by a preposition [P]; attachment: [NP2] attaches to the verb [V] (e.g., He forwarded the mail to the minister).
Condition: [PP] is subcategorized by the noun in [NP1]; sub-condition: [NP2] is licensed by a preposition [P]; attachment: [NP2] attaches to the noun in [NP1] (e.g., John published six articles on machine translation).
Condition: [PP] is subcategorized neither by the verb [V] nor by the noun in [NP1]; sub-condition: [NP2] carries a [PLACE] / [TIME] feature; attachment: [NP2] attaches to the verb [V] (e.g., I saw Mary in her office; The girls met the teacher on different days).
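These conditions amount to an ordered sequence of checks. The sketch below is a hypothetical rendering of that decision procedure; the three lexicons it consults are stand-ins for the subcategorization database and WordNet-based features described later, and the fall-back to the parser's own attachment follows the attachment-resolver slide.

```python
# Hypothetical sketch of the PP-attachment decision; the lexicons are stand-ins.

VERB_SUBCAT_PREP = {("forward", "to")}            # verbs subcategorizing for a PP
NOUN_SUBCAT_PREP = {("article", "on")}            # nouns subcategorizing for a PP
PLACE_TIME_NOUNS = {"office", "June", "days"}     # nouns with [PLACE]/[TIME] features

def pp_attachment(verb, np1_head, prep, np2_head, parser_choice):
    """Return 'verb' or 'noun': the attachment point of the [P NP2] sequence."""
    if (verb, prep) in VERB_SUBCAT_PREP:
        return "verb"            # e.g., "forwarded the mail to the minister"
    if (np1_head, prep) in NOUN_SUBCAT_PREP:
        return "noun"            # e.g., "published six articles on machine translation"
    if np2_head in PLACE_TIME_NOUNS:
        return "verb"            # e.g., "saw Mary in her office"
    return parser_choice         # otherwise fall back to the parser's attachment

print(pp_attachment("saw", "Mary", "in", "office", parser_choice="noun"))  # verb
```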

55 Linguistic Study to Computation

56 Syntactic constituents to semantic constituents. A probabilistic parser (Charniak, 04) is used. Other resources: WordNet and the Oxford Advanced Learner's Dictionary. In a parse tree, tags give indications of CW and FW: NP, VP, ADJP and ADVP → CW; PP (prepositional phrase), IN (preposition) and DT (determiner) → FW.
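That tag-based split can be written as a simple lookup. The sets below contain only the tags named on the slide, so real coverage would need the full Penn Treebank tag set; the function name is illustrative.

```python
# Sketch: classify parse-tree tags as content-word (CW) or function-word (FW),
# using only the tags named on the slide.

CW_TAGS = {"NP", "VP", "ADJP", "ADVP"}
FW_TAGS = {"PP", "IN", "DT"}

def node_class(tag):
    if tag in CW_TAGS:
        return "CW"
    if tag in FW_TAGS:
        return "FW"
    return None   # tags outside the slide's list are left undecided

print(node_class("NP"), node_class("DT"))   # CW FW
```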

57 Observation: Headwords of sibling nodes form SRSs. "John has bought a car." SRS: {has, bought}, {a, car}, {bought, car}. (Parse tree: each node is marked (C) or (F) and carries its headword, e.g., VP → bought, AUX → has, VBD → bought, NP → car, DT → a, NN → car.)

58 Need: Resilience to wrong PP attachment. "John has published an article on linguistics". Use PP attachment heuristics to get {article, on, linguistics}. (Parse tree: (C)VP → published, (F)PP → on, (C)VBD → published, (C)NP → article, (F)DT → an, (C)NN → article, (F)IN → on, (C)NP → linguistics, (C)NNS → linguistics.)

59 to-infinitival: "I forced him to watch this movie". The clause boundary is the VP node, which is labeled with SCOPE. Its tag is modified to TO, an FW tag, indicating that it heads a to-infinitival clause. The NP node with head him is duplicated and inserted (depicted by shaded nodes) as a sibling of the VBD node with head forced, to bring out the existence of a semantic relation between forced and him.

60 Linking of clauses: "John said that he was reading a novel". The head of the S node is marked as SCOPE. SRS: {said, that, SCOPE}. Adverbial clauses have similar parse tree structures, except that the subordinating conjunctions are different from that. (Parse tree: (C)VP → said, (C)VBD → said, (F)SBAR → that, (F)IN → that, (C)S → SCOPE.)

61 Implementation: block diagram of the system. Input sentence → Charniak parser → parse tree → parse tree modification and augmentation with head and scope information (scope handler) → augmented parse tree → Semantically Relatable Sequences generator → Semantically Relatable Sequences. The generator consults the attachment resolver, WordNet 2.0 (noun classification, time and place features) and a subcategorization database (THAT clause as a subcat property, preposition as a subcat property).

62 Head determination. Uses a bottom-up strategy to determine the headword for every node in the parse tree. Crucial in obtaining the SRSs, since wrong head information may get propagated all the way up the tree. Processes the children of every node starting from the rightmost child and checks the head information already specified against the node's tag to determine the head of the node. Some special cases are: SBAR nodes; VP nodes with PRO insertion, copulas, phrasal verbs, etc.; NP nodes with of-PP cases and conjunctions under them, which lead to scope creation.
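A bottom-up head pass over a bracketed tree might look like the sketch below. The head-choice rule here (rightmost child whose tag shares the parent's initial letter) is a crude stand-in for real head-percolation tables, and the special cases listed above are not handled.

```python
# Simplified sketch of bottom-up head determination.  The matching rule is a
# crude stand-in; SBAR, copula, phrasal-verb and of-PP special cases are omitted.

def find_head(node):
    """node = (tag, word) for a leaf, or (tag, [children]) for an internal node.
    Returns (tag, annotated_children_or_word, head)."""
    tag, rest = node
    if isinstance(rest, str):                  # leaf: the word is its own head
        return (tag, rest, rest)
    children = [find_head(child) for child in rest]
    head = None
    for ctag, _, chead in reversed(children):  # rightmost-first scan
        if ctag.startswith(tag[0]):            # crude "compatible tag" test
            head = chead
            break
    if head is None:                           # default: head of the leftmost child
        head = children[0][2]
    return (tag, children, head)

tree = ("VP", [("VBD", "bought"), ("NP", [("DT", "a"), ("NN", "car")])])
print(find_head(tree)[2])   # bought
```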

63 Scope handler. Modifies the parse trees by inserting nodes in to-infinitival cases. Adjusts the tag and head information in the case of SBAR nodes.

64 Attachment resolver. Takes a (CW1, FW, CW2) triple as input and checks the time and place features of CW2, the noun class of CW1 and the subcategorization information for the CW1 and FW pair to decide the attachment. If none of these yields a deterministic result, the attachment indicated by the parser is taken.

65 SRS generator. Performs a breadth-first search on the parse tree and does detailed processing at every node N1 of the tree. S nodes which dominate entire clauses (main or embedded) are treated as CWs. SBAR and TO nodes are treated as FWs.

66 Algorithm
If the node N1 is a CW (new/JJ, published/VBD, fact/NN, boy/NN, John/NNP), perform the following checks:
- If the sibling N2 of N1 is a CW (car/NN, article/NN, SCOPE/S), then create {CW, CW} ({new, car}, {published, article}, {boy, SCOPE}).
- If the sibling N2 is an FW (in/PP, that/SBAR, and/CC), then check whether N2 has a child FW N3 (in/IN, that/IN) and a child CW N4 (June/NN, SCOPE/S).
  - If yes, use the attachment resolver to decide the CW to which N3 and N4 attach, and create {CW, FW, CW} ({published, in, June}, {fact, that, SCOPE}).
  - If no, check whether the next sibling N5 of N1 is a CW (Mary/NN); if yes, create {CW, FW, CW} ({John, and, Mary}).
If the node N1 is an FW (the/DT, is/AUX, to/TO), perform the following checks:
- If the parent node is a CW (boy/NP, famous/VP), check whether the sibling is an adjective.
  - If yes (famous/JJ), create {CW, FW, CW} ({She, is, famous}).
  - If no (boy/NN), create {FW, CW} ({the, boy}, {has, bought}).
- If the parent node N6 is an FW (to/TO) and the sibling node N7 is a CW (learn/VB), use the attachment resolver to decide the preceding CW to which N6 and N7 can attach, and create {CW, FW, CW} ({exciting, to, learn}).
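A stripped-down version of this traversal is sketched below. It walks a tree whose nodes already carry a CW/FW class and a headword and emits the three SRS patterns; the attachment resolver and the conjunction/copula cases are deliberately left out, so this shows the control flow rather than the full algorithm.

```python
# Stripped-down sketch of the SRS generator's control flow (no attachment
# resolution, no conjunction/copula handling).  Each node is a dict:
#   {"cls": "CW" or "FW", "head": word, "children": [...]}

def generate_srs(node, srs):
    kids = node.get("children", [])
    for i, n1 in enumerate(kids):
        siblings = kids[i + 1:]
        if n1["cls"] == "CW":
            for n2 in siblings:
                if n2["cls"] == "CW":                      # {CW, CW}
                    srs.append((n1["head"], n2["head"]))
                else:                                      # FW sibling: look inside it
                    sub = n2.get("children", [])
                    fw = next((c for c in sub if c["cls"] == "FW"), None)
                    cw = next((c for c in sub if c["cls"] == "CW"), None)
                    if fw and cw:                          # {CW, FW, CW}
                        srs.append((n1["head"], fw["head"], cw["head"]))
        else:                                              # n1 is an FW
            for n2 in siblings:
                if n2["cls"] == "CW":                      # {FW, CW}
                    srs.append((n1["head"], n2["head"]))
                    break
    for child in kids:
        generate_srs(child, srs)
    return srs

# "John has bought a car": S -> NP(John) VP(has bought a car)
tree = {"cls": "CW", "head": "bought", "children": [
    {"cls": "CW", "head": "John", "children": []},
    {"cls": "CW", "head": "bought", "children": [
        {"cls": "FW", "head": "has", "children": []},
        {"cls": "CW", "head": "bought", "children": []},
        {"cls": "CW", "head": "car", "children": [
            {"cls": "FW", "head": "a", "children": []},
            {"cls": "CW", "head": "car", "children": []},
        ]},
    ]},
]}
print(generate_srs(tree, []))
# [('John', 'bought'), ('has', 'bought'), ('bought', 'car'), ('a', 'car')]
```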

67 Evaluation. FrameNet corpus [Baker et al., 1998], a semantically annotated corpus, is used as the test data: 92,310 sentences (call this the gold standard), created automatically from the FrameNet corpus taking verbs, nouns and adjectives as the targets. Verbs as the target: 37,984 (i.e., semantic frames of verbs); nouns as the target: 37,240; adjectives as the target: 17,086.

68 Score for high-frequency verbs
Verb | Frequency | Score
Swim | 280 | 0.709
Depend | 215 | 0.804
Look | 187 | 0.835
Roll | 173 | 0.7
Rush | 172 | 0.775
Phone | 162 | 0.695
Reproduce | 159 | 0.797
Step | 159 | 0.795
Urge | 157 | 0.765
Avoid | 152 | 0.789

69 Scores of 10 verb groups of high frequency in the Gold Standard

70 Scores of 10 noun groups of high frequency in the Gold Standard

71 An actual sentence A. Sentence: A form of asbestos once used to make Kent cigarette filters has caused a high percentage of cancer deaths among a group of workers exposed to it more than 30 years ago, researchers reported.

72 Relative performance on SRS constructs

73 Results on sentence constructs. Rajat Mohanty, Anupama Dutta and Pushpak Bhattacharyya, "Semantically Relatable Sets: Building Blocks for Representing Semantics", 10th Machine Translation Summit (MT Summit 05), Phuket, September 2005.

74 Statistical Approach

75 Use SRL-marked corpora. Daniel Gildea and Daniel Jurafsky. 2002. Automatic labeling of semantic roles. Computational Linguistics, 28(3):245–288. PropBank corpus: role-annotated WSJ part of the Penn Treebank [10]. PropBank role set [2,4]: core roles ARG0 (Proto-agent), ARG1 (Proto-patient) up to ARG5; adjunctive roles ARGM-LOC (for locatives), ARGM-TMP (for temporals), etc.

76 SRL-marked corpora contd. PropBank roles, an example: [ARG0 It] operates [ARG1 stores] [ARGM-LOC mostly in Iowa and Nebraska]. Preprocessing systems [2]: part-of-speech tagger, base chunker, full syntactic parser, named-entity recognizer. Fig. 4: Parse tree output. Source: [5].

77 Probabilistic estimation [1]. Empirical probability estimation over candidate roles for each constituent, based on extracted features; here t is the target word (predicate), r is a candidate role, and h (head word), pt (phrase type), gov (governing category) and voice are features. Estimates from different feature subsets are combined either by linear interpolation or by a geometric mean, each subject to a condition on the combination; a reconstruction of the standard forms is given below.
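The slide's formulas did not survive the transcript. The block below is a reconstruction of the standard estimates in the Gildea and Jurafsky (2002) style, written from the feature names above; treat it as a sketch of the intended content, not a copy of the original slide.

```latex
% Relative-frequency estimate of a role from the full feature set
P(r \mid h, pt, gov, voice, t) =
  \frac{\#(r, h, pt, gov, voice, t)}{\#(h, pt, gov, voice, t)}

% Combining back-off distributions over feature subsets F_1, \dots, F_n:
% (a) linear interpolation, with the condition \sum_i \lambda_i = 1
P(r \mid \text{constituent}) = \sum_i \lambda_i \, P(r \mid F_i)

% (b) geometric mean, with a normalisation condition over the roles
P(r \mid \text{constituent}) =
  \frac{1}{Z} \prod_i P(r \mid F_i)^{\lambda_i},
\qquad Z = \sum_{r'} \prod_i P(r' \mid F_i)^{\lambda_i}
```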

78 A state-of-the-art SRL system: ASSERT [4]. Main points [3,4]: use of a Support Vector Machine [13] as the classifier; similar to FrameNet "domains", "Predicate Clusters" are introduced; Named Entities [14] are used as a new feature. Experiment I (parser dependency testing): use of the PropBank bracketed corpus; use of the Charniak parser trained on the Penn Treebank corpus.
Table 1: Performance of ASSERT for Treebank and Charniak parser outputs. Id. stands for the identification task and Class. for the classification task. Data source: [4]
Parse | Task | Precision (%) | Recall (%) | F-score (%) | Accuracy (%)
Treebank | Id. | 97.5 | 96.1 | 96.8 | -
Treebank | Class. | - | - | - | 93.0
Treebank | Id. + Class. | 91.8 | 90.5 | 91.2 | -
Charniak | Id. | 87.8 | 84.1 | 85.9 | -
Charniak | Class. | - | - | - | 92.0
Charniak | Id. + Class. | 81.7 | 78.4 | 80.0 | -
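ASSERT itself cannot be reconstructed from this slide, but the basic recipe (a feature dictionary per candidate constituent fed to an SVM classifier) can be sketched with scikit-learn. The feature names and the three toy training examples below are hypothetical, sized only to make the snippet runnable.

```python
# Toy sketch of SVM-based role classification over per-constituent features
# (hypothetical features and data; assumes scikit-learn is installed).
from sklearn.feature_extraction import DictVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

train_features = [
    {"predicate": "operate", "phrase_type": "NP", "head": "It",     "position": "before", "voice": "active"},
    {"predicate": "operate", "phrase_type": "NP", "head": "stores", "position": "after",  "voice": "active"},
    {"predicate": "operate", "phrase_type": "PP", "head": "in",     "position": "after",  "voice": "active"},
]
train_labels = ["ARG0", "ARG1", "ARGM-LOC"]

# One-hot encode the symbolic features, then train a linear-kernel SVM.
model = make_pipeline(DictVectorizer(), SVC(kernel="linear"))
model.fit(train_features, train_labels)

test = {"predicate": "operate", "phrase_type": "NP", "head": "company",
        "position": "before", "voice": "active"}
print(model.predict([test])[0])   # a pre-verbal NP: most likely classified ARG0
```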

79 Experiments and Results. Experiment II (cross-genre testing): 1. training on PropBanked WSJ data and testing on the Brown Corpus; 2. Charniak parser trained first on PropBank, then on Brown. Table 2: Performance of ASSERT for various experimental combinations. Data source: [4].

