Page 1 Generalized Inference with Multiple Semantic Role Labeling Systems Peter Koomen, Vasin Punyakanok, Dan Roth, (Scott) Wen-tau Yih Department of Computer Science University of Illinois at Urbana-Champaign

Page 2 Outline
System Architecture
- Pruning
- Argument Identification
- Argument Classification
- Inference [the main difference from other systems]
Inference with Multiple Systems
- The same approach the SRL system uses to ensure a coherent output is applied to the input produced by multiple systems.

Page 3 System Architecture
Identify argument candidates
- Pruning
- Argument Identifier: binary classification
Classify argument candidates
- Argument Classifier: multi-class classification
Inference
- Uses the probability distribution estimated by the argument classifier, together with expressive structural and linguistic constraints.
- Infers the optimal global output, modeled as a constrained optimization problem (a stub sketch of the whole pipeline follows).
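
A minimal end-to-end sketch of this pipeline in Python, with the learned stages stubbed out; the names and toy outputs are ours, not the authors' code, and the real final stage is the ILP inference described on Page 10:

```python
from dataclasses import dataclass, field

LABELS = ["NULL", "A0", "A1", "V"]        # abbreviated label set

@dataclass
class Candidate:
    start: int                             # first token index of the phrase
    end: int                               # last token index, inclusive
    scores: dict = field(default_factory=dict)

# Stubs standing in for the learned components: the pruning heuristic,
# the binary SNoW identifier, and the multi-class SNoW classifier.
def prune(tree, pred):  return [Candidate(0, 1), Candidate(3, 8)]
def identify(c, tree):  return True
def classify(c, tree):  return {t: 1.0 / len(LABELS) for t in LABELS}

def srl_pipeline(tree, pred):
    candidates = [c for c in prune(tree, pred) if identify(c, tree)]
    for c in candidates:
        c.scores = classify(c, tree)       # distribution over LABELS
    return infer(candidates)               # global, constraint-aware step

def infer(candidates):
    # Placeholder: per-candidate argmax.  The real system replaces this
    # with ILP inference over structural constraints (Page 10).
    return [(c.start, c.end, max(c.scores, key=c.scores.get))
            for c in candidates]

print(srl_pipeline(tree=None, pred=None))
```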

Page 4 Pruning [Xue & Palmer 2004]
Significant errors are due to PP attachment.
Modification: consider the PP as attached to both the NP and the VP.
[Table: development-set precision, recall, and F1 of pruning with gold vs. Charniak parse trees; values not recoverable from the transcript.]

Page 5 Modified Pruning
[Table: development-set precision, recall, and F1 for gold parses, Charniak parses, and Charniak parses with the modified heuristic; values not recoverable from the transcript.]
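
A sketch of the Xue & Palmer (2004) pruning heuristic referenced above; the minimal Node tree type is ours, and the PP-attachment modification is only noted in a comment since it requires both candidate parses:

```python
class Node:
    """Minimal constituency-tree node for illustration."""
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)
        self.parent = None
        for c in self.children:
            c.parent = self

def prune(predicate):
    """Walk up from the predicate node, collecting the sisters at each
    level as argument candidates; for a PP sister, also collect its
    children.  The modification discussed above additionally keeps
    candidates from both possible attachment sites of an ambiguous PP
    (not shown here)."""
    candidates = []
    node = predicate
    while node.parent is not None:
        for sister in node.parent.children:
            if sister is node:
                continue
            candidates.append(sister)
            if sister.label == "PP":       # descend into PP sisters
                candidates.extend(sister.children)
        node = node.parent
    return candidates

# toy fragment: (VP (V cool) (NP the panic) (PP in stocks))
v, np = Node("V"), Node("NP")
pp = Node("PP", [Node("IN"), Node("NP")])
vp = Node("VP", [v, np, pp])
print([n.label for n in prune(v)])         # ['NP', 'PP', 'IN', 'NP']
```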

Page 6 Argument Identification
The argument identifier is trained as a phrase-based binary classifier.
Learning algorithm: SNoW
- A sparse network of linear classifiers.
- Weight update: a regularized variation of the Winnow multiplicative update rule.
- When probability estimates are needed, we apply a softmax over the classifiers' activations (a sketch follows).
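
A minimal sketch of the softmax conversion, assuming one raw activation score per class; the function name and toy activations are ours, and the real SNoW implementation may differ in detail:

```python
import math

def softmax(activations):
    """Convert raw per-class activations into a probability distribution.
    Subtracting the max activation first keeps exp() from overflowing."""
    m = max(activations.values())
    exps = {label: math.exp(a - m) for label, a in activations.items()}
    z = sum(exps.values())
    return {label: e / z for label, e in exps.items()}

# e.g. activations from the binary argument identifier:
print(softmax({"ARG": 2.3, "NOT_ARG": 0.4}))
# ~{'ARG': 0.87, 'NOT_ARG': 0.13}
```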

Page 7 Argument Identification (Features)
- Parse tree structure from the Collins and Charniak parsers.
- Clauses, chunks, and POS tags from the UPC processors.

Page 8 Argument Classification
Similar to argument identification, but using SNoW as a multi-class classifier.
The classes also include NULL (no argument).

Page 9 Inference
Occasionally, the output of the argument classifier violates some constraints.
The inference procedure [Punyakanok et al., 2004]:
- Input: the probability estimates from the argument classifier, plus structural and linguistic constraints.
- Output: the best legitimate global prediction.
- Formulated as an optimization problem and solved via Integer Linear Programming (ILP).
- Allows incorporating expressive (non-sequential) constraints on the variables (the argument types).

Page 10 Integer Linear Programming Inference
For each argument candidate $a_i$ and each type $t$, set up a Boolean variable $a_{i,t}$ indicating whether $a_i$ is classified as $t$.
The goal is to maximize $\sum_i \sum_t \mathrm{score}(a_i = t)\, a_{i,t}$, subject to the (linear) constraints, including that each candidate takes exactly one label: $\sum_t a_{i,t} = 1$.
Any Boolean constraint can be encoded this way.
If $\mathrm{score}(a_i = t) = P(a_i = t)$, the objective is to find the assignment that maximizes the expected number of correct arguments while satisfying the constraints. (A solver sketch follows.)
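
A minimal sketch of this ILP using the PuLP modeling library in Python; the label set and per-candidate scores are toy values standing in for the argument classifier's output:

```python
import pulp

LABELS = ["NULL", "A0", "A1"]
scores = [{"NULL": 0.1, "A0": 0.7, "A1": 0.2},   # candidate a_0
          {"NULL": 0.3, "A0": 0.5, "A1": 0.2}]   # candidate a_1

prob = pulp.LpProblem("srl_inference", pulp.LpMaximize)

# Boolean variable x[i, t] == 1  <=>  candidate a_i gets label t
x = {(i, t): pulp.LpVariable(f"x_{i}_{t}", cat="Binary")
     for i in range(len(scores)) for t in LABELS}

# objective: sum_i sum_t score(a_i = t) * x[i, t]
prob += pulp.lpSum(scores[i][t] * x[i, t]
                   for i in range(len(scores)) for t in LABELS)

# each candidate takes exactly one label (possibly NULL)
for i in range(len(scores)):
    prob += pulp.lpSum(x[i, t] for t in LABELS) == 1

prob.solve(pulp.PULP_CBC_CMD(msg=0))
for i in range(len(scores)):
    print(i, [t for t in LABELS if x[i, t].value() == 1])
```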

Page 11 Constraints
No overlapping or embedding arguments: if $a_i$ and $a_j$ overlap or embed one another, then $a_{i,\mathrm{NULL}} + a_{j,\mathrm{NULL}} \ge 1$, i.e., at least one of the two must be labeled NULL.

Page 12 Constraints
- No overlapping or embedding arguments.
- No duplicate argument classes for A0-A5.
- Exactly one V argument per predicate.
- If there is a C-V, there must be a V-A1-C-V pattern.
- If there is an R-arg, the argument it references must appear somewhere.
- If there is a C-arg, the argument it continues must appear somewhere before it.
- Each predicate can take only core arguments that appear in its frame file; more specifically, we check only the minimum and maximum argument ids.
A sketch of encoding some of these constraints follows.
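
A sketch of how three of these constraints become linear (in)equalities in the same PuLP setup as before; the spans, scores, and abbreviated core-class list are toy values:

```python
import pulp

LABELS = ["NULL", "A0", "A1", "V"]
CORE = ["A0", "A1"]                        # core classes that may not repeat
spans = [(0, 2), (1, 4), (5, 6)]           # toy candidate spans (start, end)
scores = [{"NULL": 0.1, "A0": 0.6, "A1": 0.2, "V": 0.1},
          {"NULL": 0.2, "A0": 0.5, "A1": 0.2, "V": 0.1},
          {"NULL": 0.1, "A0": 0.1, "A1": 0.2, "V": 0.6}]

prob = pulp.LpProblem("srl_constraints", pulp.LpMaximize)
x = {(i, t): pulp.LpVariable(f"x_{i}_{t}", cat="Binary")
     for i in range(len(spans)) for t in LABELS}
prob += pulp.lpSum(scores[i][t] * x[i, t]
                   for i in range(len(spans)) for t in LABELS)
for i in range(len(spans)):
    prob += pulp.lpSum(x[i, t] for t in LABELS) == 1    # one label each

def overlaps(s, t):
    return not (s[1] < t[0] or t[1] < s[0])

# no overlapping or embedding arguments: in any clashing pair,
# at least one candidate must be labeled NULL
for i in range(len(spans)):
    for j in range(i + 1, len(spans)):
        if overlaps(spans[i], spans[j]):
            prob += x[i, "NULL"] + x[j, "NULL"] >= 1

# no duplicate core argument classes
for t in CORE:
    prob += pulp.lpSum(x[i, t] for i in range(len(spans))) <= 1

# exactly one V argument per predicate
prob += pulp.lpSum(x[i, "V"] for i in range(len(spans))) == 1

prob.solve(pulp.PULP_CBC_CMD(msg=0))
print([t for i in range(len(spans)) for t in LABELS
       if x[i, t].value() == 1])            # ['A0', 'NULL', 'V']
```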

Page 13 Results
[Table: precision, recall, and F1 on the Dev, WSJ, and Brown sets for systems built on Collins and Charniak parses; values not recoverable from the transcript.]

Page 14 Inference with Multiple Systems
The performance of SRL depends heavily on the very first stage, pruning [IJCAI 2005], which is derived directly from the full parse tree.
Joint inference allows improvement over individual semantic role labeling classifiers:
- Combine different SRL systems through joint inference.
- The systems are derived using different full parse trees.

Page 15 Inference with Multiple Systems
Multiple systems:
- Train and test with Collins' parse outputs.
- Train with Charniak's best parse outputs; test with Charniak's 5-best parse outputs.

Page 16 Naïve Joint Inference
[Figure: the sentence "..., traders say, unable to cool the selling panic in both stocks and futures." with candidate arguments a1...a4 from one system and b1...b3 from another (e.g., "traders", "the selling panic in both stocks and futures"), each with a score distribution over the labels Null, A0, A1, ...]

Page 17 Joint Inference – Phantom Candidates
[Figure: candidates a1...a4 and b1...b4 aligned across the two systems, with phantom candidates filling the spans a system did not propose; phantoms receive default priors over the labels Null, A0, A1, ...]
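
A sketch of the phantom-candidate idea, assuming each system's output is a map from spans to label distributions; the default prior values are arbitrary, and averaging the systems' distributions is one simple way to combine them before running the same ILP inference as before (not necessarily the paper's exact objective):

```python
# Each system's output: {(start, end): {label: probability}}
system_a = {(0, 1): {"NULL": 0.1, "A0": 0.9, "A1": 0.0},
            (3, 8): {"NULL": 0.2, "A0": 0.1, "A1": 0.7}}
system_b = {(0, 1): {"NULL": 0.2, "A0": 0.8, "A1": 0.0}}

def add_phantoms(outputs, default_prior):
    """Give every system a score for every span proposed by any system.
    Spans a system did not propose become phantom candidates carrying a
    default prior, so joint inference can compare like with like."""
    all_spans = set().union(*(out.keys() for out in outputs))
    for out in outputs:
        for span in all_spans - out.keys():
            out[span] = dict(default_prior)     # phantom candidate
    return outputs

# mostly-NULL prior: a phantom is probably not a real argument
prior = {"NULL": 0.55, "A0": 0.15, "A1": 0.3}
system_a, system_b = add_phantoms([system_a, system_b], prior)

# joint score per span: average the systems' distributions, then feed
# the averaged scores into the same ILP inference as on Page 10
joint = {span: {t: (system_a[span][t] + system_b[span][t]) / 2
                for t in prior}
         for span in system_a}
print(joint)
```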

Page 18 Results of Joint Inference

Page 19 Results of Joint Inference

Page 20 Results of Joint Inference

Page 21 Results of Different Combinations

Page 22 Conclusion The ILP inference can naturally be extended to reason over multiple SRL systems.

Page 23 Thank You