A Linear Programming Formulation for Global Inference in Natural Language Tasks
Dan Roth and Wen-tau Yih
Department of Computer Science, University of Illinois at Urbana-Champaign

Page 2 Natural Language Understanding: the Pipeline View of Solving NLP Problems
POS Tagging → Chunking → Parsing → Word Sense Disambiguation → Named Entity Recognition → Coreference Resolution → Semantic Role Labeling

Page 3 Weaknesses of the Pipeline Model
- Propagation of errors: upstream mistakes will not be corrected.
- Bi-directional interactions between stages are lost, and occasionally the later-stage problems are the easier ones.
Example: "Oswald fired a high-powered rifle at JFK from the sixth floor of the building where he worked." (the kill relation between Oswald and JFK)

Page 4 Global Inference with Classifiers
- Classifiers (for components) are trained or given in advance.
- There are constraints on the classifiers' labels, which may be known during training or only at test time.
- The inference procedure makes the best global assignment, given the local predictions and the constraints.
(Figure: entity recognition and relation extraction on "Oswald fired a high-powered rifle at JFK from the sixth floor of the building where he worked.", contrasting the local predictions with the final decision for the kill relation.)

Page 5 Ideal Inference
Example: "Dole's wife, Elizabeth, is a native of N.C."
Entity variables E1 (Dole), E2 (Elizabeth), E3 (N.C.); relation variables R12, R23.
(Figure: local classifier scores for each variable. Entities are scored over {other, per, loc}, e.g. one entity scores per 0.85, loc 0.10, other 0.05; relations are scored over {irrelevant, spouse_of, born_in}, e.g. one relation scores born_in 0.85, spouse_of 0.05, irrelevant 0.10, while another is nearly tied at born_in 0.50 versus spouse_of 0.45.)

Page 6 Inference Procedure
Inference with classifiers is not a new idea.
- On sequential constraint structure: HMM, PMM, CRF [Lafferty et al.], CSCL [Punyakanok & Roth]
- On general structure: heuristic search
The integer linear programming (ILP) formulation is:
- General: works on non-sequential constraint structure
- Flexible: can represent many types of constraints
- Optimal: finds the optimal solution
- Fast: commercial packages are able to solve it quickly

Page 7 Outline
Case study, using ILP to solve:
1. Non-overlapping constraints, usually handled with dynamic programming on a sequential constraint structure
2. Simultaneous entity/relation recognition, where ILP goes beyond sequential constraint structure
Discussion: pros and cons
Summary and future work

Page 8 Phrase Identification Problems
Several NLP problems attempt to detect and classify phrases in sentences, e.g. named entity recognition, shallow parsing, information extraction, and semantic role labeling.
Given a sentence, classifiers (OC- or phrase-based) predict phrase candidates, which often overlap.
Notation: NP = noun phrase; ∅ = null (not a phrase).
(Figure: four overlapping candidates P1 through P4; of the two consistent assignments shown, one has total cost 1.6 and the other 2.0.)

Page 9 LP Formulation: Linear Cost
Indicator variables: x_{P1=NP}, x_{P1=∅}, ..., x_{P4=NP}, x_{P4=∅} ∈ {0,1}
Total cost = c_{P1=NP}·x_{P1=NP} + c_{P1=∅}·x_{P1=∅} + ... + c_{P4=∅}·x_{P4=∅}
           = 0.6·x_{P1=NP} + ... + 0.1·x_{P4=∅}
(Figure: the candidates P1 through P4 with their per-label costs.)

Page 10 LP Formulation: Linear Constraints
Subject to:
Binary constraints: x_{P1=NP}, x_{P1=∅}, ..., x_{P4=NP}, x_{P4=∅} ∈ {0,1}; in general, x_{Pi=Cj} ∈ {0,1} for every phrase Pi and class Cj.
Unique-label constraints: x_{P1=NP} + x_{P1=∅} = 1; x_{P2=NP} + x_{P2=∅} = 1; x_{P3=NP} + x_{P3=∅} = 1; x_{P4=NP} + x_{P4=∅} = 1; in general, Σ_j x_{Pi=Cj} = 1 for every phrase Pi.
Non-overlapping constraints: x_{P1=∅} + x_{P3=∅} ≥ 1; x_{P2=∅} + x_{P3=∅} + x_{P4=∅} ≥ 2 (among k mutually overlapping candidates, at least k-1 must be null).

Page 11 LP Formulation
Generate one integer linear program per sentence.
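To make the formulation concrete, here is a minimal sketch of this per-sentence ILP in Python using the open-source PuLP package (the deck cites commercial solvers such as CPLEX; PuLP is my substitution). The candidate set, the overlap groups, and all costs except the 0.6 and 0.1 shown on Page 9 are illustrative assumptions.

```python
# Sketch of the per-sentence phrase-identification ILP (assumes PuLP is installed).
from pulp import LpProblem, LpVariable, LpMinimize, LpBinary, lpSum

phrases = ["P1", "P2", "P3", "P4"]
labels = ["NP", "null"]
cost = {("P1", "NP"): 0.6, ("P1", "null"): 0.4,   # 0.6 is from Page 9; 0.4 is assumed
        ("P2", "NP"): 0.3, ("P2", "null"): 0.7,   # assumed
        ("P3", "NP"): 0.5, ("P3", "null"): 0.5,   # assumed
        ("P4", "NP"): 0.9, ("P4", "null"): 0.1}   # 0.1 is from Page 9; 0.9 is assumed
overlap_groups = [["P1", "P3"], ["P2", "P3", "P4"]]  # sets of mutually overlapping candidates

prob = LpProblem("phrase_identification", LpMinimize)
x = {(p, l): LpVariable(f"x_{p}_{l}", cat=LpBinary) for p in phrases for l in labels}

# Objective: total assignment cost (Page 9).
prob += lpSum(cost[p, l] * x[p, l] for p in phrases for l in labels)

# Unique-label constraints: each candidate gets exactly one label (Page 10).
for p in phrases:
    prob += lpSum(x[p, l] for l in labels) == 1

# Non-overlapping constraints: among k mutually overlapping candidates,
# at least k-1 must be null (Page 10).
for group in overlap_groups:
    prob += lpSum(x[p, "null"] for p in group) >= len(group) - 1

prob.solve()
print([(p, l) for (p, l) in x if x[p, l].value() == 1])
```

With these assumed costs the solver picks the cheapest label per candidate that still satisfies the overlap constraints, which is exactly the trade-off the 1.6-versus-2.0 figure on Page 8 illustrates.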

Page 12 Entity/Relation Recognition
Example: "John was murdered at JFK after his assassin, Kevin..."
- Identify named entities (JFK: location; John, Kevin: person)
- Identify relations between entities, e.g. Kill(X, Y)
- Exploit mutual dependencies between named entities and relations to yield a coherent global prediction

Page 13 Problem Setting
(Figure: entity nodes E1 (John), E2 (JFK), E3 (Kevin), with relation variables R12, R21, R13, R31, R23, R32 between each ordered pair.)
1. Entity boundaries are given.
2. The relation between each pair of entities is represented by a relation variable; most of them are null.
3. The goal is to assign labels to these E and R variables.
Constraints:
(R12 = kill) ⇒ (E1 = person) ∧ (E2 = person)
(R12 = headquarter) ⇒ (E1 = organization) ∧ (E2 = location)
...

Page 14 LP Formulation: Indicator Variables
For each variable: x_{E1=per}, x_{E1=loc}, ..., x_{R12=kill}, x_{R12=born_in}, ..., x_{R12=∅}, ... ∈ {0,1}
For each pair of variables on an edge: x_{R12=kill, E1=per}, x_{R12=kill, E1=loc}, ..., x_{R12=∅, E1=per}, x_{R12=∅, E1=loc}, ..., x_{R32=∅, E2=per}, x_{R32=∅, E2=loc}, ... ∈ {0,1}
(Figure: the entity/relation graph from Page 13.)

Page 15 LP Formulation: Cost Function
Assignment cost: c_{E1=per}·x_{E1=per} + c_{E1=loc}·x_{E1=loc} + ... + c_{R12=kill}·x_{R12=kill} + ... + c_{R12=∅}·x_{R12=∅} + ...
Constraint cost: c_{R12=kill, E1=per}·x_{R12=kill, E1=per} + c_{R12=kill, E1=loc}·x_{R12=kill, E1=loc} + ... + c_{R12=∅, E1=loc}·x_{R12=∅, E1=loc} + ...
Costs are given by classifiers.
Total cost = assignment cost + constraint cost
(Figure: the entity/relation graph from Page 13.)

Page 16 LP Formulation: Linear Constraints
Subject to: binary constraints, unique-label constraints, and node-edge consistency constraints.
Node-edge consistency forces each node's label to agree with the labels of the edges incident on it, e.g. for edge R12 and node E1:
x_{R12=kill} = x_{R12=kill, E1=null} + x_{R12=kill, E1=person} + x_{R12=kill, E1=location} + x_{R12=kill, E1=organization}
(Figure: the entity/relation graph, highlighting edge R12 and node E1.)

Page 17 LP Formulation
Generate one integer linear program per sentence.
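Here is a sketch of how the node-edge consistency constraints look in code, in the same hypothetical PuLP style as the earlier block. The label sets and the three-entity graph mirror Pages 13 through 16; everything else is an illustrative assumption, and for brevity pairwise indicators are created for the first argument of each edge only (the second argument is symmetric).

```python
# Node-edge consistency constraints (Page 16), sketched in PuLP.
from itertools import permutations
from pulp import LpProblem, LpVariable, LpMinimize, LpBinary, lpSum

entities = ["E1", "E2", "E3"]                      # John, JFK, Kevin (Page 13)
e_labels = ["null", "person", "location", "organization"]
r_labels = ["kill", "born_in", "null"]
edges = list(permutations(entities, 2))            # R12, R21, R13, R31, R23, R32

prob = LpProblem("entity_relation", LpMinimize)
xe = {(e, l): LpVariable(f"x_{e}_{l}", cat=LpBinary)
      for e in entities for l in e_labels}
xr = {(a, b, r): LpVariable(f"x_R{a}{b}_{r}", cat=LpBinary)
      for (a, b) in edges for r in r_labels}
# Pairwise indicators x_{Rab=r, Ea=l} for an edge label and its first argument.
xp = {(a, b, r, l): LpVariable(f"x_R{a}{b}_{r}_{a}_{l}", cat=LpBinary)
      for (a, b) in edges for r in r_labels for l in e_labels}

# Objective (Page 15) would be the assignment cost plus constraint cost, with
# coefficients supplied by the local classifiers; omitted in this sketch.

# Unique-label constraints on nodes and edges.
for e in entities:
    prob += lpSum(xe[e, l] for l in e_labels) == 1
for (a, b) in edges:
    prob += lpSum(xr[a, b, r] for r in r_labels) == 1

# Node-edge consistency: each edge indicator equals the sum of its pairwise
# indicators, and each node indicator equals the matching pairwise sum, so
# node and edge assignments cannot disagree.
for (a, b) in edges:
    for r in r_labels:
        prob += xr[a, b, r] == lpSum(xp[a, b, r, l] for l in e_labels)
    for l in e_labels:
        prob += xe[a, l] == lpSum(xp[a, b, r, l] for r in r_labels)
```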

Page 18 Experiments: Data
Methodology: 1,437 sentences from TREC data; 5,336 entities; 19,048 pairs of potential relations.

Relation      Entity1  Entity2  Example
Located-in    Loc      Loc      (New York, US)
Work-for      Per      Org      (Bill Gates, Microsoft)
OrgBased-in   Org      Loc      (HP, Palo Alto)
Live-in       Per      Loc      (Bush, US)
Kill          Per      Per      (Oswald, JFK)

Page 19 Experimental Results (F1)
(Figure: F1 scores for entity predictions and relation predictions.)
- Improvement compared to the basic (without inference) and pipeline (entity → relation) models
- Quality of decisions is enhanced: no "stupid mistakes" that violate the global constraints

Page 20 Decision-time Constraints
Constraints may be known only at decision time.
Question answering example: "Who killed JFK?" First find the kill relation in candidate sentences, then find the arguments of the kill relation.
Suppose we know the given sentence contains the kill relation:
x_{R12=kill} + x_{R21=kill} + x_{R13=kill} + ... ≥ 1
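In the hypothetical PuLP sketch above, such a decision-time constraint is just one more inequality added before solving:

```python
# Decision-time constraint (Page 20): at least one edge in this sentence
# must carry the kill relation. Continues the entity/relation sketch above.
prob += lpSum(xr[a, b, "kill"] for (a, b) in edges) >= 1
```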

Page 21 Computational Issues
Exhaustive search won't work: even with a small number of variables and classes, the solution space is intractable, e.g. n = 20 variables with k = 5 classes gives 5^20 = 95,367,431,640,625 assignments.
Heuristic search algorithms (e.g., beam search)?
- Do not guarantee optimal solutions
- In practice, may not be faster than ILP
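The blow-up is easy to verify:

```python
# Size of the joint label space for n variables with k classes each (Page 21).
n, k = 20, 5
print(f"{k**n:,}")  # 95,367,431,640,625
```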

Page 22 Generality (1/2)
Linear constraints can represent any Boolean function, so more components can be put into this framework:
- Who killed whom? (determine the arguments of the kill relation)
- Entity1 = Entity3 (a coreference classifier)
- Subject-verb-object constraints
The formulation also handles non-sequential constraint structure, as the entity/relation case has demonstrated.
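As a concrete instance of this Boolean-to-linear encoding (my illustration, reusing the variables of the hypothetical sketch above): the Page 13 constraint (R12 = kill) ⇒ (E1 = person) ∧ (E2 = person) becomes two inequalities over the 0/1 indicators.

```python
# If x_{R12=kill} = 1, these force both argument indicators to 1; if it is 0,
# they impose nothing, exactly matching the Boolean implication.
prob += xr["E1", "E2", "kill"] <= xe["E1", "person"]
prob += xr["E1", "E2", "kill"] <= xe["E2", "person"]
```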

Page 23 Generality (2/2)
Integer linear programming is NP-hard in general. However, ILP problems at this scale can be solved very quickly by commercial packages such as CPLEX or Xpress-MP:
- CPLEX is able to solve a linear programming problem of 13 million variables within 5 minutes.
- Processing 20 sentences per second on a named entity recognition task (Pentium III, 800 MHz).

Page 24 Current/Future Work
- Handle stochastic (soft) constraints. Example: if the relation is kill, the first argument is a person with probability 0.95 and an organization with probability 0.05.
- Incorporate inference at learning time, along the lines of [Carreras & Marquez, NIPS-03].
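One standard way to realize such a soft constraint (a sketch of the general idea, not necessarily the authors' eventual formulation) is to replace the hard implication with a log-probability penalty in the objective, again reusing the hypothetical pairwise indicators from the earlier block:

```python
import math

# Soft version of the kill => person constraint (Page 24 example): pay a
# penalty of -log P(label) for the first argument's label when R12 = kill.
penalty = {"person": -math.log(0.95), "organization": -math.log(0.05)}
soft_cost = lpSum(penalty[l] * xp["E1", "E2", "kill", l]
                  for l in ("person", "organization"))
# soft_cost is then added to the ILP objective alongside the assignment costs.
```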

Page 25 Thank You
Another case study, on Semantic Role Labeling, will be given tomorrow!