Semantic Role Labeling using Maximum Entropy Model
Joon-Ho Lim, NLP Lab., Korea Univ.


2 Contents
- Previous work: review of the previously studied ML approaches for SRL
- Semantic Role Labeling using ME: a probabilistic model for the semantic role sequence, and an incremental approach
- Feature Sets for Semantic Role Labeling
- Experiments
- Conclusion

3 Previous work
Machine learning approaches for SRL:
- [Gildea 2002]: a probabilistic discriminative model.
  - It needs complex interpolation for smoothing because of the data-sparseness problem.
- [Pradhan 2003]: applied an SVM to SRL.
  - It has high computational complexity because of the polynomial kernel function.
  - Because the SVM is a binary classifier, a one-vs-rest or pairwise method is required for multi-class labeling.
- [Thompson 2003]: a probabilistic generative model.
  - This model assumes that a constituent is generated by a semantic role, so it is called a generative model.
  - Because each constituent depends only on the role that generated it, and the constituents are independent of one another, this model cannot exploit rich features.

4 Semantic Role Labeling using ME (1/4)
An overview of Maximum Entropy modeling. It has the following advantages:
- It can exploit rich features.
- It can alleviate the data-sparseness problem.
- It is well suited to multi-class classification problems.
- It has relatively low computational complexity.

The conditional probability of an outcome y given a history x is

    p(y | x) = exp( Σ_{i=1..k} λ_i f_i(x, y) ) / Z(x)

where
- f_i(x, y): feature function
- λ_i: the weighting parameter of f_i(x, y)
- k: the number of features
- Z(x): the normalization factor, Z(x) = Σ_y exp( Σ_{i=1..k} λ_i f_i(x, y) )
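This conditional form is simply a softmax over weighted feature sums, so it can be sketched in a few lines of generic Python. The snippet below is a minimal illustration with NumPy, not the toolkit used in the talk:

```python
import numpy as np

def maxent_prob(feature_values, lambdas):
    """Conditional probability p(y|x) of a MaxEnt model.

    feature_values: array of shape (num_outcomes, num_features),
        where row j holds f_i(x, y_j) for every feature i.
    lambdas: array of shape (num_features,) with the learned weights.
    """
    scores = feature_values @ lambdas   # sum_i lambda_i * f_i(x, y) per outcome
    scores -= scores.max()              # stabilize the exponentials
    expd = np.exp(scores)
    return expd / expd.sum()            # divide by the normalizer Z(x)
```

For example, with two candidate labels and three binary features, `maxent_prob(np.array([[1., 0., 1.], [0., 1., 0.]]), np.array([0.5, -0.2, 1.0]))` returns the distribution over the two labels.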

5 Semantic Role Labeling using ME (2/4)
Probabilistic model for the semantic role sequence:
- R is the sequence of semantic roles.
- c_1n is the sequence of constituents, such as clauses or chunks.
- r_i is the i-th semantic role, represented using a BIO notation.
  - O occurs far more frequently than the other labels, so it receives a somewhat higher probability than the others.
  - Therefore, we split it and use B-A*, I-A*, O- (non-argument before the predicate), O0 (the predicate), and O+ (non-argument after the predicate).
- This probability is estimated using a ME model.
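As a small illustration of the modified notation, the label inventory might be enumerated as follows; the slide only names the O- / O0 / O+ split, so the exact set of argument types shown here is an assumption of this sketch:

```python
# Hypothetical label inventory under the modified BIO scheme described above,
# shown for core arguments A0-A2 only (a full set would also cover A3-A5 and AM-* roles).
CORE_ARGS = ["A0", "A1", "A2"]
LABELS = (
    [f"B-{a}" for a in CORE_ARGS]    # first constituent of an argument
    + [f"I-{a}" for a in CORE_ARGS]  # continuation of an argument
    + ["O-",                         # non-argument constituent before the predicate
       "O0",                         # the predicate itself
       "O+"]                         # non-argument constituent after the predicate
)
```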

6 Semantic Role Labeling using ME (3/4)
An incremental approach. The motivation is as follows:
- It is much easier and more reliable to identify the arguments in the immediate clause, i.e. the clause that contains the target predicate.
  - Most semantic roles are located in the immediate clause.
  - A few semantic roles are located in the upper clauses that include the immediate clause.
- The approach is composed of two steps (sketched below):
  1) In the first step, we label the constituents in the immediate clause.
  2) In the second step, we label the constituents in the upper clauses.
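A minimal sketch of the two-step procedure, assuming the constituents have already been split into the immediate clause and the upper clauses, and that `predict` stands in for the trained ME classifier; both are assumptions of this sketch, not details given on the slide:

```python
from typing import Callable, List, Sequence

def label_semantic_roles(
    immediate: Sequence[dict],
    upper: Sequence[dict],
    predict: Callable[[dict, List[str], str], str],
) -> List[str]:
    """Two-step (incremental) labeling sketch.

    `immediate` holds the constituents of the clause containing the predicate,
    `upper` holds the constituents of the enclosing clauses, and `predict`
    stands in for the ME classifier over the BIO-style labels described above.
    """
    labels: List[str] = []
    # Step 1: label constituents in the immediate clause (feature set Phi_1).
    for c in immediate:
        labels.append(predict(c, labels, "phi1"))
    # Step 2: label constituents in the upper clauses (feature set Phi_2).
    for c in upper:
        labels.append(predict(c, labels, "phi2"))
    return labels
```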

7 Semantic Role Labeling using ME (4/4)
The probabilistic model is rewritten to reflect the incremental approach:
- m: the number of constituents in the immediate clause
- Φ_1: the feature set for the immediate clause
- Φ_2: the feature set for the upper clauses
(The first m constituents belong to the immediate clause; the remaining constituents belong to the upper clauses.)
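The factored equation itself appears only as an image on the original slide. A plausible reconstruction from the definitions above would be the following; this is an assumption, not the slide's exact formula:

P(r_{1n} \mid c_{1n}) \approx \prod_{i=1}^{m} p\big(r_i \mid \Phi_1(c_{1n}, r_1 \ldots r_{i-1})\big) \; \prod_{i=m+1}^{n} p\big(r_i \mid \Phi_2(c_{1n}, r_1 \ldots r_{i-1})\big)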

8 Feature Sets for SRL (1/2)
For accurate SRL, the following features are important:
- First, as features of the predicate, we use pred-POS, pred-lex, pred-type, and voice.
  - pred-type captures information such as the to-infinitive form (TO).
  - voice indicates whether the predicate is active or passive.
- Second, as features of the constituent, we use the constituent tag, head word, and content word.
  - For example, for a PP chunk, the head word is the preposition and the content word is the noun.
- Third, as features of the relation between them, we use path and position.
  - path is the sequence of constituent tags between the current constituent and the predicate.
  - position is the position of the constituent with respect to the predicate.

9 Feature Sets for SRL (2/2)
- Fourth, as features of the context of the constituent, we use previous-label, previous-tag, previous-head-word, next-tag, and next-head-word.
- Using these features, we construct the conjoined feature sets (Φ_1, Φ_2).
  - Because the ME model assumes the independence of features, we should conjoin coherent features (a sketch of such feature extraction follows below).
  - More detailed descriptions of the features and the feature sets are given in the proceedings.
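A hedged sketch of what such feature extraction with conjoined features might look like; the dictionary keys and the particular conjunctions are illustrative assumptions, not the exact feature templates of the paper:

```python
def extract_features(constituent: dict, predicate: dict, prev_label: str) -> list:
    """Illustrative feature extraction for one constituent.

    Each feature is emitted as a "name=value" string, the usual input format
    of ME toolkits. The keys used below are assumed names for this sketch.
    """
    return [
        # predicate features
        f"pred-lex={predicate['lex']}",
        f"pred-POS={predicate['pos']}",
        f"voice={predicate['voice']}",
        # constituent features
        f"tag={constituent['tag']}",
        f"head={constituent['head']}",
        # relation features
        f"path={constituent['path']}",
        f"position={constituent['position']}",
        # context features
        f"prev-label={prev_label}",
        # conjoined features: combine coherent atomic features, since the
        # ME model does not capture their interaction on its own
        f"pred-lex+path={predicate['lex']}|{constituent['path']}",
        f"voice+position={predicate['voice']}|{constituent['position']}",
    ]
```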

10 Experiments (1/2)
- For our experiments, we use Zhang Le's MaxEnt toolkit.
  - The L-BFGS parameter-estimation algorithm is used with Gaussian prior smoothing.
- Experimental results of the proposed method on the test set (the results table appears only as an image on the original slide).
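For reference, Gaussian prior smoothing corresponds to maximizing a penalized log-likelihood during L-BFGS training; this is the standard formulation, not a detail stated on the slide:

\mathcal{L}(\lambda) = \sum_{(x,y)} \log p_\lambda(y \mid x) - \sum_{i=1}^{k} \frac{\lambda_i^2}{2\sigma^2}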

11 Experiments (2/2)
From the experimental results, we observe the following:
- The proposed method performs relatively well on the labels related to A0 and A1, while it performs relatively poorly on the other labels.
- We think this may be caused by two reasons:
  - First, enough instances of A0 and A1 are provided for accurate semantic role labeling.
  - Second, the thematic roles of A0 and A1 are clearer than those of the other core semantic roles. For example, an agent is mainly labeled as A0, while a benefactive can be labeled as A2 or A3.
- Therefore, the ME model achieves good generalization for A0 and A1, but does not generalize well in the other cases.

12 Conclusion
A semantic role labeling method using a ME model.
- We use an incremental approach.
  - It is a kind of divide-and-conquer strategy: it is much easier and more reliable to identify the arguments in the immediate clause first, and then label the semantic roles in the upper clauses.
- As feature sets, we use features that capture the characteristics of the predicate, the constituent, the relationship between them, and the context.

13 Thank you.