Class-based nominal semantic role labeling: a preliminary investigation
  Matt Gerber
  Michigan State University, Department of Computer Science
Introduction: semantic role labeling
  The semantic role
    Relation between a constituent and a predication
  The task
    Automatically identify semantic roles occurring in natural language
  Problematic: which roles are the “right” ones?
    Example (labeled Agent, Theme, Experiencer): “John presented his findings to the committee.”
Introduction: PropBank (Kingsbury and Palmer, 2003)
  Annotated corpus of semantic roles
    Base corpus: TreeBank 2 (Marcus et al., 1993)
  Evaluation
    CoNLL Shared Task (Carreras and Màrquez, 2005)
  Implications
    QA: Kaisser and Webber (2007), Shen and Lapata (2007)
    Coreference: Ponzetto and Strube (2006)
    Information extraction: Surdeanu et al. (2003)
  Example (labeled Arg0, Arg1, Arg2): “John presented his findings to the committee.”
Introduction: NomBank (Meyers, 2007)
  Verbs are not the only lexical category with shallow semantic structure
    Verbal: [Arg0 Judge Curry] [Predicate ordered] [Arg1 Edison] [Arg2 to make average refunds of about $45].
    Nominal: Judge Curry ordered [Arg0 Edison] to make average [Predicate refunds] [Arg1 of about $45].
  A more complete semantic interpretation of natural language
Introduction: NomBank (Meyers, 2007)
  Corpus information
    Base corpus: TreeBank 2
    Distinct nominalizations: 4704
    Total attestations: ~115K
  NomLex (Macleod et al., 1998)
    Nominalization classes (22)
      Nom (deverbals)
        Example: Sales departments then urged [Predicate abandonment] [Arg1 of the Pico Project].
      Partitive (part-whole)
        Example: Hallwood owns about 11 [Predicate %] [Arg1 of Integra].
Research objectives
  Investigate the role of NomLex classes in automated NomBank SRL
  Hypotheses
    (1) Classes may exhibit consistent realizations of their arguments
    (2) Modeling each class separately may result in more homogeneous training data and better SRL performance
Outline
  Nominalization interpretation: related work
  NomBank SRL
  Class-based NomBank SRL
  Preliminary results and analysis
  Conclusions and future work
Nominalization interpretation: early work
  Rule-based methods
    Associate syntactic configurations with grammatical functions and semantic properties
    Dahl et al. (1987)
    Hull and Gomez (1996)
    Meyers et al. (1998)
  Statistical models: Lapata (2000)
    Identify underlying subject/object
      [subject satellite] observation
      [object satellite] observation
Nominalization interpretation: recent work
  SemEval (Girju et al., 2007)
    Semantic relations between nominals
      Cause-Effect: laugh wrinkles
      Instrument-Agency: laser printer
      Product-Producer: honey bee
      Origin-Entity: message (entity) from outer-space (origin)
      Theme-Tool: news conference
      Part-Whole: the door of the car
      Content-Container: the grocery bag
Nominalization interpretation: recent work
  NomBank SRL: Jiang and Ng (2006), Liu and Ng (2007)
    Direct application of verbal SRL methods
      Standard feature set
      Maximum entropy modeling
    Best overall f-measure score:
    NomBank-specific features had little impact
Overview of NomBank SRL
  Full syntactic analysis
    Judge Curry ordered Edison to make average [Predicate refunds] of about $45.
    [Figure: parse tree over the sentence; node labels S, VP, S, NP, NNS, JJ, NP, PP]
Overview of NomBank SRL
  Argument identification
    Judge Curry ordered [Edison] to make average [Predicate refunds] [of about $45].
    [Figure: parse tree over the sentence; node labels S, VP, S, NP, NNS, JJ, NP, PP]
    Binary classification problem
      Argument
      Non-argument
Overview of NomBank SRL
  Argument classification
    Judge Curry ordered [Arg0 Edison] to make average [Predicate refunds] [Arg1 of about $45].
    [Figure: parse tree over the sentence; node labels S, VP, S, NP, NNS, JJ, NP, PP]
    22-class problem
      Arg0-Arg9
      Temporal, location, etc.
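The three overview slides describe a pipeline: parse the sentence, decide for each constituent whether it is an argument of the predicate, then assign a role label to the ones that are. The sketch below illustrates only that control flow. The `Node` type, the hand-built tree fragment, and the two toy rule functions are hypothetical stand-ins for a real parser and trained classifiers; they are not taken from the talk.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Node:
    """One parse-tree node (hypothetical minimal representation)."""
    label: str                                  # syntactic category, e.g. "NP"
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None                  # set on leaves only

    def text(self) -> str:
        if self.word is not None:
            return self.word
        return " ".join(child.text() for child in self.children)


def leaf(tag: str, word: str) -> Node:
    return Node(tag, word=word)


def constituents(root: Node) -> List[Node]:
    """Every node in pre-order; each one is a candidate argument."""
    out = [root]
    for child in root.children:
        out.extend(constituents(child))
    return out


def label_arguments(
    root: Node,
    predicate: Node,
    is_argument: Callable[[Node, Node], bool],
    classify: Callable[[Node, Node], str],
) -> Dict[str, str]:
    """Stage 1 filters candidates (binary); stage 2 assigns a role label."""
    labeled = {}
    for node in constituents(root):
        if node is predicate:
            continue
        if is_argument(node, predicate):
            labeled[node.text()] = classify(node, predicate)
    return labeled


# Hand-built fragment for "Edison to make average refunds of about $45".
predicate = leaf("NNS", "refunds")
tree = Node("S", [
    Node("NP", [leaf("NNP", "Edison")]),
    Node("VP", [
        leaf("TO", "to"),
        Node("VP", [
            leaf("VB", "make"),
            Node("NP", [
                Node("NP", [leaf("JJ", "average"), predicate]),
                Node("PP", [
                    leaf("IN", "of"),
                    Node("NP", [leaf("RB", "about"), leaf("$", "$"), leaf("CD", "45")]),
                ]),
            ]),
        ]),
    ]),
])


# Toy rules standing in for trained models (assumptions, not the talk's models).
def toy_is_argument(node: Node, pred: Node) -> bool:
    return node.label == "PP" or (node.label == "NP" and node.text() == "Edison")


def toy_classify(node: Node, pred: Node) -> str:
    return "Arg1" if node.label == "PP" else "Arg0"


roles = label_arguments(tree, predicate, toy_is_argument, toy_classify)
```

With these toy rules, `roles` maps "Edison" to Arg0 and the prepositional phrase to Arg1, matching the labeling shown on the slide. In a real system both functions would be learned classifiers over features of the node and predicate.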
NomBank SRL features
Class-based NomBank SRL
  Simple method
    Cluster nominalizations according to NomLex class membership
    Train a logistic regression model for each class
      Single-stage, 23-class strategy
      Baseline feature set
      Heuristic post-processing
    Backoff
      Trained over all classes
Class-based NomBank SRL
  Model application
    Hallwood owns about 11 [Predicate %] of Integra.
    NomLex
      abandonment: …
      abatement: …
      abduction: …
      aberration: …
      ability: …
      abolition: …
      abomination: …
    Models: Nom, Attribute, Partitive, Relational, Backoff
    Hallwood owns about 11 [Predicate %] [Arg1 of Integra].
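The class-based method can be sketched as: group training instances by the NomLex class of their predicate, fit one model per class plus a backoff model over all instances, and at application time look up the predicate's class to choose a model. The code below is a minimal illustration under several assumptions. `MajorityLabelModel` is a deliberately trivial stand-in, not the logistic regression used in the talk. Each instance is reduced to a single string feature. The backoff model is assumed to apply when a predicate has no known class. The training rows are invented toy data, loosely based on the slide examples.

```python
from collections import Counter, defaultdict


class MajorityLabelModel:
    """Stand-in classifier: most frequent label seen for a feature value."""

    def __init__(self):
        self.counts = defaultdict(Counter)   # feature -> label counts
        self.overall = Counter()             # label counts over all features

    def fit(self, examples):
        for feature, label in examples:
            self.counts[feature][label] += 1
            self.overall[label] += 1
        return self

    def predict(self, feature):
        # Fall back to the overall distribution for unseen features.
        table = self.counts.get(feature) or self.overall
        return table.most_common(1)[0][0]


def train_class_models(instances, nomlex_class):
    """instances: (predicate, feature, label) triples.
    nomlex_class: dict mapping predicate -> NomLex class name."""
    by_class = defaultdict(list)
    everything = []
    for predicate, feature, label in instances:
        everything.append((feature, label))
        cls = nomlex_class.get(predicate)
        if cls is not None:
            by_class[cls].append((feature, label))
    models = {cls: MajorityLabelModel().fit(rows) for cls, rows in by_class.items()}
    backoff = MajorityLabelModel().fit(everything)   # trained over all classes
    return models, backoff


def predict_role(predicate, feature, nomlex_class, models, backoff):
    """Route to the class-specific model, else to the backoff model."""
    model = models.get(nomlex_class.get(predicate), backoff)
    return model.predict(feature)


# Hypothetical lexicon and toy training rows.
nomlex_class = {"%": "Partitive", "abandonment": "Nom", "attorney": "Relational"}
instances = [
    ("%", "PP-of", "Arg1"),
    ("abandonment", "PP-of", "Arg1"),
    ("abandonment", "possessive", "Arg0"),
    ("abandonment", "possessive", "Arg0"),
    ("attorney", "possessive", "Arg2"),
]
models, backoff = train_class_models(instances, nomlex_class)

relational = predict_role("attorney", "possessive", nomlex_class, models, backoff)
unknown = predict_role("mystery", "possessive", nomlex_class, models, backoff)
```

On this toy data the Relational model labels a possessive as Arg2, while the backoff model, dominated by Nom instances, labels the same feature Arg0. That difference is the kind of class-specific behavior the second hypothesis is about.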
Preliminary results and analysis
  Evaluation configuration
    Training instances: WSJ 2-21
    Testing instances: WSJ 23
    Automatically generated parse trees for training and testing
  Key observations
    Overall performance
    Per-class performance
    Class-based gains over baseline
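The results that follow are reported as f-measure (F1), the harmonic mean of precision and recall. As a reminder of how that number is computed, here is a small sketch. It assumes an argument counts as correct only when both its span and its label match the gold annotation exactly; the talk does not spell out its scoring convention, so treat that as an assumption. The example sets are invented.

```python
def precision_recall_f1(gold, predicted):
    """gold, predicted: iterables of (span, label) pairs."""
    gold, predicted = set(gold), set(predicted)
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy example: one of two predictions matches one of two gold arguments.
gold = {("Edison", "Arg0"), ("of about $45", "Arg1")}
predicted = {("Edison", "Arg0"), ("average", "Arg1")}
p, r, f1 = precision_recall_f1(gold, predicted)
```

Here precision, recall, and F1 are all 0.5. Slides typically report the value multiplied by 100.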
General observations
  Negligible overall gains compared to Liu and Ng (2007), who reported overall f-measure of
  Some NomLex classes perform very well
  Classes introduce gains as well as losses
Analysis: intra-class regularity
  Hypothesis 1: classes may exhibit consistent realizations of their arguments
  Relational class (F1=90.94)
    Regularity: argument incorporation
      [Arg2 Mr. Hunt’s] [Arg0/Predicate attorney] said his client welcomed the gamble.
      100% of Relational nominalizations have an incorporated Arg0
      Constitutes 38% of test arguments for the class
Analysis: intra-class regularity
  Hypothesis 1: classes may exhibit consistent realizations of their arguments
  Partitive class (F1=79.85)
    Regularity: presence of Arg0
      86% of Partitive instances take a single Arg0
      Compare: 15% of Nom instances take a single Arg1
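Regularity figures like the ones above can be read as the share of a class's predicate instances whose argument set is exactly one argument with a given label. The sketch below computes that share under this reading, which is an assumption about how the percentages were derived. The function name and the per-class role lists are invented for illustration and do not reproduce the talk's counts.

```python
from typing import Dict, List


def share_with_single_argument(instances: List[List[str]], label: str) -> float:
    """Fraction of instances whose only argument carries `label`."""
    if not instances:
        return 0.0
    hits = sum(1 for roles in instances if roles == [label])
    return hits / len(instances)


# Toy data: one list of role labels per predicate instance (made up).
toy_instances: Dict[str, List[List[str]]] = {
    "Partitive": [["Arg1"], ["Arg1"], ["Arg1"], ["Arg0", "Arg1"]],
    "Nom": [["Arg1"], ["Arg0", "Arg1"], ["Arg0"], ["Arg1", "Arg2"]],
}
shares = {
    cls: share_with_single_argument(rows, "Arg1")
    for cls, rows in toy_instances.items()
}
```

A large gap between classes on such a statistic is what makes a class-specific model attractive: the more uniform a class's argument structure, the less the model has to learn.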
Analysis: class-based gains
  Hypothesis 2: modeling each class separately may result in more homogeneous training data and better SRL performance
  Improvements
    Table columns: Class, Test instances, Improvement
    Table rows: Nom-like; Environment; Group; Job306.29
Analysis: class-based gains
  Hypothesis 2: modeling each class separately may result in more homogeneous training data and better SRL performance
  Losses
    Table columns: Class, Test instances, Loss, Class ambiguity, Training instances
    Table rows: Share of 5211 total; Nom-adj-like of 5086 total
Conclusions and future work
  NomBank SRL based on classes derived from NomLex
    Demonstrates negligible gains over Liu and Ng (2007)
    Intra-class regularity leads to modest gains in some classes
    NomLex ambiguity causes losses in others
Conclusions and future work
  In-depth class modeling
    Identification of class-specific regularities not captured by the current feature set
    Further partitioning of the Nom class?
  NomLex class disambiguation
Thanks! Any questions?
References
  Carreras, X. & Màrquez, L. (2005), 'Introduction to the CoNLL-2005 Shared Task: Semantic Role Labeling'.
  Dahl, D. A.; Palmer, M. S. & Passonneau, R. J. (1987), Nominalizations in PUNDIT, in 'Proceedings of the 25th annual meeting on Association for Computational Linguistics', Association for Computational Linguistics, Morristown, NJ, USA.
  Girju, R.; Nakov, P.; Nastase, V.; Szpakowicz, S.; Turney, P. & Yuret, D. (2007), SemEval Task 04: Classification of Semantic Relations between Nominals, in 'Proceedings of the 4th International Workshop on Semantic Evaluations'.
  Hull, R. & Gomez, F. (1996), Semantic Interpretation of Nominalizations, in 'Proceedings of AAAI'.
  Jiang, Z. & Ng, H. (2006), Semantic Role Labeling of NomBank: A Maximum Entropy Approach, in 'Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing'.
  Kaisser, M. & Webber, B. (2007), Question Answering based on Semantic Roles, in 'ACL 2007 Workshop on Deep Linguistic Processing', Association for Computational Linguistics, Prague, Czech Republic.
  Kingsbury, P. & Palmer, M. (2003), Propbank: the next level of treebank, in 'Proceedings of Treebanks and Lexical Theories'.
  Lapata, M. (2000), The Automatic Interpretation of Nominalizations, in 'Proceedings of the Seventeenth National Conference on Artificial Intelligence and Twelfth Conference on Innovative Applications of Artificial Intelligence', AAAI Press / The MIT Press.
References (cont’d)
  Liu, C. & Ng, H. (2007), Learning Predictive Structures for Semantic Role Labeling of NomBank, in 'Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics', Association for Computational Linguistics, Prague, Czech Republic.
  Macleod, C.; Grishman, R.; Meyers, A.; Barrett, L. & Reeves, R. (1998), Nomlex: A lexicon of nominalizations, in 'Proceedings of the Eighth International Congress of the European Association for Lexicography'.
  Marcus, M.; Santorini, B. & Marcinkiewicz, M. A. (1993), 'Building a large annotated corpus of English: the Penn TreeBank', Computational Linguistics 19.
  Meyers, A. (2007), 'Annotation Guidelines for NomBank - Noun Argument Structure for PropBank', Technical report, New York University.
  Meyers, A.; Macleod, C.; Yangarber, R.; Grishman, R.; Barrett, L. & Reeves, R. (1998), Using NOMLEX to produce nominalization patterns for information extraction, in 'Proceedings of the COLING-ACL Workshop on the Computational Treatment of Nominals'.
  Ponzetto, S. P. & Strube, M. (2006), Exploiting semantic role labeling, WordNet and Wikipedia for coreference resolution, in 'Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics', Association for Computational Linguistics, Morristown, NJ, USA.
  Shen, D. & Lapata, M. (2007), Using Semantic Roles to Improve Question Answering, in 'Proceedings of the Conference on Empirical Methods in Natural Language Processing and on Computational Natural Language Learning'.
  Surdeanu, M.; Harabagiu, S.; Williams, J. & Aarseth, P. (2003), Using predicate-argument structures for information extraction, in 'Proceedings of the 41st Annual Meeting on Association for Computational Linguistics'.