
EXTRACTING SEMANTIC ROLE INFORMATION FROM UNSTRUCTURED TEXTS
Diana Trandabă 1 and Alexandru Trandabă 2
1 Faculty of Computer Science, University "Al. I. Cuza" Iași, Romania
2 Faculty of Electrical Engineering, Technical University "Gh. Asachi" Iași, Romania

INTRODUCING SEMANTIC ROLE ANALYSIS
Who, what, where, when, why?
Predicates:
- verbs: sell, buy, cost, etc.
- nouns: acquisition, etc.
Actors (semantic roles): buyer, seller, goods, money, time, etc.
A predicate together with its actors forms a semantic frame.

SEMANTIC ROLE RESOURCES
Hand-tagged corpora encoding such information were first developed for English: VerbNet, FrameNet, PropBank.
Semantic role resources for other languages followed: German, Spanish, French, Japanese, and also Romanian.

SEMANTIC ROLES - PROPBANK
PropBank defines, for each verb, two types of semantic roles:
- arguments: core semantic roles, needed to understand the verb's meaning;
- adjuncts: optional semantic roles, used to complete the verb's meaning.
Arguments:
John reads a book. (valence 2)
John gives Mary a book. (valence 3)
Adjuncts (circumstantial complements):
John reads a book in the train.
John gave Mary a book for her birthday.

SEMANTIC ROLES - PROPBANK
Arguments can be established for verbs or for nouns:
John presented the situation.
John's presentation of the situation was very clear.
Arguments:
- verb specific, marked with Arg0 to Arg5;
- Arg0 is generally the argument exhibiting features of an Agent, while Arg1 is a Patient or Theme.
Adjuncts:
- general roles that can apply to any verb;
- marked as ArgMs (modifiers);
- can appear in any verb's frame, independently of valence;
- examples: TMP, LOC, MNR, CAU, MOD, PRC, etc.
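The argument/adjunct distinction above can be sketched with a small data structure. This is an illustrative sketch only, not the actual PropBank frame-file format (which is XML and considerably richer); the class and field names are invented for this example.

```python
from dataclasses import dataclass, field

@dataclass
class RoleInstance:
    label: str   # "Arg0".."Arg5" for core arguments, "ArgM-TMP" etc. for adjuncts
    span: str    # surface text of the constituent filling the role

@dataclass
class PredicateInstance:
    lemma: str
    roles: list = field(default_factory=list)

    def arguments(self):
        # numbered core roles (Arg0..Arg5), needed for the verb's meaning
        return [r for r in self.roles if not r.label.startswith("ArgM")]

    def adjuncts(self):
        # optional ArgM modifiers (TMP, LOC, MNR, ...)
        return [r for r in self.roles if r.label.startswith("ArgM")]

# "John reads a book in the train."
read = PredicateInstance("read", [
    RoleInstance("Arg0", "John"),
    RoleInstance("Arg1", "a book"),
    RoleInstance("ArgM-LOC", "in the train"),
])
```

With this representation, *read* has valence 2 (two core arguments), and the locative phrase is recognized as an adjunct.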

APPROACHES TO SEMANTIC ROLE LABELING
Semantic role resources led to semantic role labeling.
International competitions: SensEval-3, CoNLL Shared Tasks.
Automatic labeling of semantic roles: given a semantic predicate and its semantic frame, identify the constituents within a sentence that fill semantic roles and tag them with the appropriate roles.
The most general formulation of the Semantic Role Labeling (SRL) problem:
[The boy] Agent broke [the window] Theme [on Friday] Time.
[The boy] NP broke [the window] NP [on Friday] PP.
[The boy] Arg0 broke [the window] Arg1 [on Friday] ArgM-TMP.

APPROACHES TO SEMANTIC ROLE LABELING
Features:
- features that characterize the candidate argument and its context: phrase type, head word, governing category;
- features that characterize the verb predicate and its context: lemma, voice, subcategorization pattern of the verb;
- features that capture the relation (either syntactic or semantic) between the candidate and the predicate: the left/right position of the constituent with respect to the verb, the category path between them.
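The three feature groups above can be sketched as a single feature-extraction function. The parse representation here (plain dictionaries, a string-encoded category path) is an assumption made for illustration, not the format of any particular parser.

```python
def extract_features(candidate, predicate, path):
    """Build the classic SRL feature set for one candidate constituent."""
    return {
        # candidate-side features
        "phrase_type": candidate["type"],    # e.g. "NP", "PP"
        "head_word": candidate["head"],
        # predicate-side features
        "pred_lemma": predicate["lemma"],
        "pred_voice": predicate["voice"],    # "active" or "passive"
        # relational features
        "position": "left" if candidate["start"] < predicate["index"] else "right",
        "path": path,                        # e.g. "NP^S_VP_VBD"
    }

# Candidate "[The boy]" for the predicate "broke" in
# "The boy broke the window on Friday."
cand = {"type": "NP", "head": "boy", "start": 0}
pred = {"lemma": "break", "voice": "active", "index": 2}
feats = extract_features(cand, pred, "NP^S_VP_VBD")
```

Feature dictionaries like this one are what the statistical systems cited on the next slide feed to their classifiers.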

APPROACHES TO SEMANTIC ROLE LABELING
- Gildea and Jurafsky (2002): the foundations of automatic semantic role labeling, using FrameNet data; estimates probabilities for semantic roles from syntactic and lexical features. Results: 80.4% accuracy.
- Surdeanu et al. (2003): allows large sets of features to be tested easily and the impact of each feature to be studied. Results: argument constituent identification, 88.98% F-measure; role assignment, 83.74%.
- Pradhan et al. (2005): one-vs.-all formalism for SVMs. Results: 86.7% F-measure for both argument and adjunct type classification.

APPROACHES TO SEMANTIC ROLE LABELING
Drawbacks of existing systems:
- they do not treat nominal predicates, being built only for verbal predicates;
- they consider only one predicate per sentence (our system does not have this restriction).

RULESRL - OUR SEMANTIC ROLE LABELING SYSTEM
Input: plain, raw text.
Pre-processing step (CoNLL-like style):
- part-of-speech tagging (Stanford Parser);
- syntactic dependencies (MaltParser).
Output: a file with constituents annotated with their corresponding semantic roles.

RULESRL - OUR SEMANTIC ROLE LABELING SYSTEM
The economy's temperature will be taken from several vantage points this week, with readings on trade, output, housing and inflation.
The/DT economy/NN 's/POS temperature/NN will/MD be/VB taken/VBN from/IN several/JJ vantage/NN points/NNS this/DT week/NN ,/, with/IN readings/NNS on/IN trade/NN ,/, output/NN ,/, housing/NN and/CC inflation/NN ./.

RULESRL ARCHITECTURE
- Predicate Identification: takes the syntactically analyzed sentence and decides which of its verbs and nouns are predicational, i.e. for which ones semantic roles need to be identified;
- Predicate Sense Identification: once the predicates of a sentence are marked, each predicate needs to be disambiguated, since different senses of a given predicate may demand different types of semantic roles;
- Semantic Role Identification: identifies the semantic roles for each syntactic dependent of the selected predicates, and establishes what kind of semantic role each one is (what label it carries).
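The three modules above can be sketched as a chain of plain functions. The token format (a list of dictionaries with `lemma`, `pos`, and `head` fields) and all function bodies are simplified assumptions; a real implementation would consult the PropBank/NomBank resources described on the following slides.

```python
def identify_predicates(tokens):
    # keep verbs/nouns that head at least one syntactic dependent
    heads = {t["head"] for t in tokens if t["head"] is not None}
    return [i for i, t in enumerate(tokens)
            if t["pos"].startswith(("VB", "NN")) and i in heads]

def identify_sense(tokens, pred_idx):
    # placeholder: the real module selects a PropBank/NomBank role set
    return tokens[pred_idx]["lemma"] + ".01"

def identify_roles(tokens, pred_idx):
    # label each direct syntactic dependent of the predicate;
    # "Arg?" stands in for the rule-based labeling shown later
    return {i: "Arg?" for i, t in enumerate(tokens) if t["head"] == pred_idx}

def rule_srl(tokens):
    # run the two later modules successively for every predicate
    return [(p, identify_sense(tokens, p), identify_roles(tokens, p))
            for p in identify_predicates(tokens)]

# "John reads books." with "reads" (index 1) as the dependency root
tokens = [
    {"lemma": "John", "pos": "NNP", "head": 1},
    {"lemma": "read", "pos": "VBZ", "head": None},
    {"lemma": "book", "pos": "NNS", "head": 1},
]
analyses = rule_srl(tokens)
```

Note how "John" and "books", although nouns, are not selected as predicates: they head no dependents, mirroring the filtering rule on the next slide.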

RULESRL - PREDICATE IDENTIFICATION
Identifies the words in the sentence that can be semantic predicates, and for which semantic roles therefore need to be found and annotated.
Relies mainly on external resources: PropBank / NomBank.
For example, the verb to be has no annotation in PropBank, since it is a state verb and not an action verb.
Another important factor in deciding whether a verb/noun can play the predicate role is checking whether it is the head of any syntactic constituent:
- there is no point in annotating as predicative a verb with no syntactic descendants;
- the only exception we allow is for verbs with auxiliaries or modal verbs, since the arguments are sometimes linked to the auxiliary verb instead of the main verb.

RULESRL - PREDICATE IDENTIFICATION
After the predicates in the input sentence are identified, the next two modules are applied successively to every predicate in the sentence, in order to identify for each of them all and only its arguments.
The assignment of semantic roles depends on the number of predicates.
[The assignment of semantic roles] ARG0 [depends] TARGET [on the number of predicates] ARG1.
[The assignment] TARGET [of semantic roles] ARG1 depends on the number of predicates.

RULESRL - PREDICATE SENSE IDENTIFICATION
Determines which sense the predicate has, according to the PropBank / NomBank annotation, in order to select the types of semantic roles that sense allows for.
PropBank and NomBank senses are to some extent similar to the sense annotation in WordNet:
- the classification into sense classes (role sets, in PropBank's terminology) is centered less on the different meanings of the predicational word and more on the difference between the sets of semantic roles that two senses may take;
- the senses and role sets in PropBank for a particular predicate are usually subsumed by WordNet senses, since the latter makes finer sense distinctions.

RULESRL - PREDICATE SENSE IDENTIFICATION
Uses external resources:
- PropBank and NomBank frame files, with role sets exemplified for each verb/noun;
- a set of rules for role set identification;
- a list of frequencies of predicate sense assignments in the training corpus.
Rules:
- if the predicate has just one role set, assign it as the predicate sense;
- if all role sets have the same type/number of roles (except for the adjuncts), the exact sense of the verb is not that important: assign the most frequent one;
- when multiple choices are present, apply empirical rules:
- check whether the verb is a simple verb, a phrasal verb or a verbal collocation (e.g. take off, start up), since each has a separate entry in PropBank/NomBank;

RULESRL - PREDICATE SENSE IDENTIFICATION
- check the PropBank examples for adjunct types: certain verbs appear more frequently with a specific type of adjunct (e.g. rain with locative or temporal adjuncts rather than causal ones);
- use the dependency relations between the target predicate and the constituents (TMP or LOC relations clearly indicate an adjunct);
- for ADV dependency relations, use the semantic class of the constituent, extracted using WordNet's hypernym hierarchy;
- also check the lexicalizations of the adjuncts' prepositions.
An important source of information for verb sense identification is the frequency of a specific verb role set within the training corpus. This information is used as a backup solution, in case no other method can identify the verb role set.
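The first two rules and the frequency backup from the slides above can be sketched as a small decision cascade. The role-set records and frequency counts here are invented placeholders, and the finer empirical rules (phrasal-verb detection, adjunct heuristics, WordNet classes) are elided into the backup step.

```python
def select_role_set(role_sets, freq):
    """Pick a PropBank-style role set for a predicate occurrence.

    role_sets: list of {"id": ..., "roles": [...]} candidates
    freq: {role_set_id: count} from the training corpus
    """
    # Rule 1: only one role set -> assign it as the predicate sense.
    if len(role_sets) == 1:
        return role_sets[0]
    # Rule 2: all candidates share the same core roles -> the exact
    # sense does not matter, so take the most frequent one.
    core_signatures = {frozenset(rs["roles"]) for rs in role_sets}
    if len(core_signatures) == 1:
        return max(role_sets, key=lambda rs: freq.get(rs["id"], 0))
    # (Empirical rules on collocations, adjunct types, dependency
    # relations and prepositions would be applied here.)
    # Backup: fall back to corpus frequency.
    return max(role_sets, key=lambda rs: freq.get(rs["id"], 0))

sets = [{"id": "take.01", "roles": ["Arg0", "Arg1"]},
        {"id": "take.02", "roles": ["Arg0", "Arg1", "Arg2"]}]
best = select_role_set(sets, {"take.01": 120, "take.02": 7})
```

Here the two role sets differ in their core roles, so the cascade falls through to the frequency backup and selects the dominant sense.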

RULESRL - SEMANTIC ROLE IDENTIFICATION
External resources:
- PropBank and NomBank frame files;
- a set of rules for semantic role classification;
- a list of frequencies of the assignment of different semantic roles in the training corpus.
The syntactic dependents are considered only on the first level below the predicate:
- from → from several vantage points (Prepositional Phrase - PP)
- week → this week (Noun Phrase - NP)
- with → with readings on trade, output, housing and inflation (PP)
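Restricting candidates to the first level below the predicate amounts to a simple filter over the dependency arcs. The `(head, dependent)` tuple format and the token indices below are assumptions for illustration.

```python
def first_level_dependents(arcs, pred):
    """Return the dependents attached directly to the predicate,
    ignoring anything deeper in the dependency tree."""
    return sorted(dep for head, dep in arcs if head == pred)

# For the example sentence, "taken" (index 6) directly heads
# "from" (7), "week" (12) and "with" (14); deeper arcs such as
# (7, 10) hang below those dependents and are not candidates.
arcs = [(6, 7), (6, 12), (6, 14), (7, 10), (12, 11)]
candidates = first_level_dependents(arcs, 6)
```

Each returned index stands for the whole constituent it heads (e.g. index 7, "from", stands for the full PP "from several vantage points").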

RULESRL - SEMANTIC ROLE IDENTIFICATION
Rules:
- Arg0 is usually the NP subject for the active voice, or the object for a passive verb;
- Arg1 for verb predicates is usually the direct object;
- Arg1 for noun predicates is the dependent whose group starts with the preposition of, if any;
- relations such as TMP, LOC, MNR indicate the respective adjuncts: ArgM-TMP (temporal), ArgM-LOC (locative), ArgM-MNR (manner);
- for motion predicates, the preposition from indicates an Arg3 role, while the preposition to indicates an Arg4;
- ArgM-REC (reciprocals) is expressed by himself, itself, themselves, together, each other, jointly, both;
- ArgM-NEG is used for elements such as not, n't, never, no longer;
- only one argument of each core type is allowed for a specific predicate (only one Arg0 - Arg4);
- in general, if an argument satisfies two core roles, the highest-ranked available argument label should be selected, where Arg0 > Arg1 > Arg2 > ... > Arg4.
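A subset of the rules above can be sketched as an ordered cascade, returning the first label whose condition fires. The dependent and predicate dictionaries are invented representations, and only a few of the listed rules are included.

```python
def label_role(dep, pred):
    """Assign a semantic role label to one first-level dependent,
    or None if no rule fires."""
    # adjunct rules: the dependency relation names the adjunct directly
    if dep["relation"] in ("TMP", "LOC", "MNR"):
        return "ArgM-" + dep["relation"]
    # negation elements
    if dep["text"] in ("not", "n't", "never", "no longer"):
        return "ArgM-NEG"
    # motion predicates: from -> Arg3, to -> Arg4
    if pred.get("is_motion") and dep["text"].startswith("from "):
        return "Arg3"
    if pred.get("is_motion") and dep["text"].startswith("to "):
        return "Arg4"
    # core arguments from grammatical function and voice
    if dep["relation"] == "SBJ" and pred["voice"] == "active":
        return "Arg0"
    if dep["relation"] == "OBJ":
        return "Arg1"
    return None

broke = {"voice": "active"}
r1 = label_role({"relation": "SBJ", "text": "The boy"}, broke)
r2 = label_role({"relation": "TMP", "text": "on Friday"}, broke)
```

Rule order matters: adjunct relations are checked before the grammatical-function rules, so a TMP subject-like dependent can never be mislabeled as Arg0.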

RULESRL EVALUATION
An evaluation metric has been proposed within the CoNLL shared task on semantic role labeling.
The semantic frames are evaluated by reducing them to semantic dependencies from the predicate to each of its individual arguments. These dependencies are labeled with the labels of the corresponding arguments. Additionally, a semantic dependency from each predicate to a virtual ROOT node is created.
Two types of dependency relations are evaluated:
- unlabeled attachment score: accuracy for correctly identifying the predicate a semantic role attaches to;
- labeled attachment score: total accuracy, where the arguments are both attached to the right predicate and labeled with the right semantic role.
For both labeled and unlabeled dependencies, precision (P), recall (R), and F1 scores are computed.
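Once frames are reduced to labeled dependencies, the scores are standard set-based precision/recall. A minimal sketch, representing each labeled dependency as a `(predicate, argument, label)` tuple (the unlabeled variant would simply drop the label component):

```python
def prf(gold, predicted):
    """Precision, recall and F1 over sets of labeled dependencies."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)                      # exact matches
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# gold: "broke" (index 2) has Arg0, Arg1 and a temporal adjunct;
# the system found Arg0 correctly but mislabeled Arg1 and missed ArgM-TMP
gold = {(2, 0, "Arg0"), (2, 3, "Arg1"), (2, 5, "ArgM-TMP")}
pred = {(2, 0, "Arg0"), (2, 3, "Arg2")}
p, r, f1 = prf(gold, pred)
```

In this toy case the mislabeled Arg1 counts against both precision and recall, which is exactly why the labeled score is stricter than the unlabeled one.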

RULESRL EVALUATION: PREDICATE IDENTIFICATION

RULESRL EVALUATION - PREDICATE SENSE IDENTIFICATION

RESULTS
We presented the development of a rule-based Semantic Role Labeling system (RuleSRL) designed to annotate raw text with semantic roles.
The input text is first pre-processed by adding part-of-speech information using the Stanford Parser and syntactic dependencies using the MaltParser.
The rule-based SRL system was evaluated in the CoNLL-2008 Shared Task competition, with an F1 measure of 63.79%. Since the best system participating in the competition scored about 85%, the rule-based method was considered insufficient: many rules were missed, or are too particular to generalize.
Machine learning techniques are being considered for the further development of a better SRL system; RuleSRL remains a baseline system.
Creating the rule-based system was very useful for identifying the main tasks to be addressed by a semantic role labeling system, as well as for better understanding the nature of semantic roles.

THANK YOU! Questions?