CONSTRAINED CONDITIONAL MODELS TUTORIAL Jingyu Chen, Xiao Cheng.

INTRODUCTION

Main ideas:
Idea 1: Modeling. Separate modeling and problem formulation from algorithms, similar to the philosophy of probabilistic modeling.
Idea 2: Inference. Keep the model simple and make expressive decisions via constraints, unlike probabilistic modeling, where the models themselves become more expressive. Inject background knowledge.
Idea 3: Learning. Expressive structured decisions can be supported by simply learned models; global inference can be used to amplify the simple models (even with minimal supervision).

Task of interest: Structured Prediction

Pipeline?

Model formulation: a local-dependency model (e.g. HMM, CRF) plus a regularization term, in which each constraint contributes a penalty multiplied by a violation measure.
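The labels on this slide match the CCM objective as it is usually written in the CCM literature. A reconstruction, with symbol names chosen here for illustration:

```latex
y^{*} = \arg\max_{y}\; \mathbf{w}^{\top}\phi(x, y) \;-\; \sum_{k=1}^{K} \rho_k \, d_{C_k}(x, y)
```

The first term is the local-dependency model (e.g. HMM, CRF); the sum is the regularization term, where $\rho_k \ge 0$ is the penalty for constraint $C_k$ and $d_{C_k}(x, y)$ is the violation measure, i.e. how far $y$ is from satisfying $C_k$. Letting $\rho_k \to \infty$ turns $C_k$ into a hard constraint.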

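Constraint expressivity, multiclass problem: the one-vs-all approximation predicts $y^{*} = \arg\max_{y} \mathbf{w}_y^{\top} x$ with independently trained per-class classifiers, whereas the ideal classification condition, $\mathbf{w}_y^{\top} x > \mathbf{w}_{y'}^{\top} x$ for all $y' \neq y$, can be expressed through constraints.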

Implementations:
Modeling: objective function
Constrained optimization solver: Integer Linear Programming
Inference: exact ILP, heuristic search, relaxation, dynamic programming
Learning

How do we use CCM to learn?

EXAMPLE 1: JOINT INFERENCE-BASED LEARNING
Constrained HMM in information extraction

Typical workflow:
1. Define basic classifiers.
2. Define constraints as linear inequalities.
3. Combine the two into an objective function.

HMM CCM Example

An HMM's predicted fields for one citation, shown in brackets: [AUTHOR Lars Ole Andersen. Program analysis and] [TITLE specialization for the] [EDITOR C] [BOOKTITLE Programming language] [TECH-REPORT . PhD thesis.] [INSTITUTION DIKU, University of Copenhagen, May] [DATE 1994.] This prediction violates many natural constraints.

HMM CCM example: natural constraints on citation fields
1. Each field must be a consecutive list of words and can appear at most once in a citation.
2. State transitions must occur on punctuation marks.
3. The citation can only start with AUTHOR or EDITOR.
4. The words "pp." and "pages" correspond to PAGE.
5. Four-digit numbers starting with 19 or 20 (19xx, 20xx) are DATE.
6. Quotations can appear only in TITLE.
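A minimal sketch of these constraints as violation counters, assuming whitespace tokenization with punctuation left attached to the preceding word. The function names, the token split and the corrected labeling used for comparison are illustrative choices, not taken from the original system.

```python
import re

PUNCTUATION = ".,;:?!"


def runs(labels):
    """Return (label, start, end) for each maximal run of identical labels."""
    out = []
    for i, lab in enumerate(labels):
        if out and out[-1][0] == lab:
            out[-1] = (lab, out[-1][1], i)  # extend the current run
        else:
            out.append((lab, i, i))
    return out


def count_violations(tokens, labels):
    """Count how often a labeling breaks each citation constraint."""
    v = {}
    # 1. Each field is one consecutive block, so each label forms at most one run.
    counts = {}
    for lab, _, _ in runs(labels):
        counts[lab] = counts.get(lab, 0) + 1
    v["field_once"] = sum(n - 1 for n in counts.values())
    # 2. A label change is only allowed right after a punctuation mark.
    v["punct_transition"] = sum(
        1 for i in range(1, len(labels))
        if labels[i] != labels[i - 1] and tokens[i - 1][-1] not in PUNCTUATION)
    # 3. The citation must start with AUTHOR or EDITOR.
    v["start"] = 0 if labels[0] in ("AUTHOR", "EDITOR") else 1
    # 4. "pp." and "pages" must be labeled PAGE.
    v["page_words"] = sum(
        1 for t, lab in zip(tokens, labels)
        if t.lower().rstrip(PUNCTUATION) in ("pp", "pages") and lab != "PAGE")
    # 5. Four-digit numbers 19xx or 20xx must be labeled DATE.
    v["year"] = sum(
        1 for t, lab in zip(tokens, labels)
        if re.fullmatch(r"(19|20)\d\d[.,;:]?", t) and lab != "DATE")
    # 6. Quotation marks may appear only inside TITLE.
    v["quotes"] = sum(
        1 for t, lab in zip(tokens, labels) if '"' in t and lab != "TITLE")
    return v


tokens = ("Lars Ole Andersen. Program analysis and specialization for the C "
          "programming language. PhD thesis. DIKU, University of Copenhagen, "
          "May 1994.").split()
hmm_labels = (["AUTHOR"] * 6 + ["TITLE"] * 3 + ["EDITOR"] + ["BOOKTITLE"] * 2
              + ["TECH-REPORT"] * 2 + ["INSTITUTION"] * 5 + ["DATE"])
fixed_labels = (["AUTHOR"] * 3 + ["TITLE"] * 9 + ["TECH-REPORT"] * 2
                + ["INSTITUTION"] * 4 + ["DATE"] * 2)

print(count_violations(tokens, hmm_labels)["punct_transition"])  # 4
print(sum(count_violations(tokens, fixed_labels).values()))      # 0
```

The HMM labeling changes fields after "and", "the", "C" and "May", none of which end in punctuation; a labeling that respects the field boundaries scores zero. Some of the HMM's errors (for example, title words inside AUTHOR) are not caught by these six checks alone.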


New objective function involving constraints: penalize the probability of a sequence if it violates a constraint, with a penalty applied each time the constraint is violated.

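HMM CCM example: transform the HMM into a linear model. The HMM log-probability of a sequence is a sum of log start, transition and emission probabilities, so it can be written as $\mathbf{w}^{\top}\phi(x, y)$, where $\phi$ counts the starts, transitions and emissions used by $y$ and $\mathbf{w}$ holds their log-probabilities.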
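To make the transformation to a linear model concrete, here is a minimal sketch assuming a tiny, made-up two-state HMM. Each HMM parameter becomes one feature whose weight is its log-probability, so w·φ(x, y) equals log P(x, y); one soft constraint (the citation must start with AUTHOR or EDITOR) is then subtracted with penalty ρ. Exhaustive search over all label sequences stands in for a real constrained decoder and only works for toy inputs.

```python
import itertools
import math
from collections import Counter

# Hypothetical two-state HMM; every number here is made up for illustration.
STATES = ("AUTHOR", "TITLE")
START = {"AUTHOR": 0.4, "TITLE": 0.6}
TRANS = {("AUTHOR", "AUTHOR"): 0.7, ("AUTHOR", "TITLE"): 0.3,
         ("TITLE", "AUTHOR"): 0.2, ("TITLE", "TITLE"): 0.8}
EMIT = {("AUTHOR", "Smith,"): 0.6, ("AUTHOR", "Deep"): 0.2, ("AUTHOR", "Nets"): 0.2,
        ("TITLE", "Smith,"): 0.2, ("TITLE", "Deep"): 0.4, ("TITLE", "Nets"): 0.4}

# Linear-model view: one weight per HMM parameter, equal to its log-probability.
WEIGHTS = {("start", s): math.log(p) for s, p in START.items()}
WEIGHTS.update({("trans",) + k: math.log(p) for k, p in TRANS.items()})
WEIGHTS.update({("emit",) + k: math.log(p) for k, p in EMIT.items()})


def features(x, y):
    """phi(x, y): counts of the start state, transitions and emissions used by y."""
    phi = Counter({("start", y[0]): 1})
    phi.update(("trans", a, b) for a, b in zip(y, y[1:]))
    phi.update(("emit", s, w) for s, w in zip(y, x))
    return phi


def linear_score(x, y):
    """w . phi(x, y), which equals log P(x, y) under the HMM."""
    return sum(WEIGHTS[f] * c for f, c in features(x, y).items())


def violations(y):
    """d(y): 1 if the citation does not start with AUTHOR or EDITOR, else 0."""
    return 0 if y[0] in ("AUTHOR", "EDITOR") else 1


def ccm_decode(x, rho):
    """Exhaustive argmax of w . phi(x, y) - rho * d(y) over all label sequences."""
    return max(itertools.product(STATES, repeat=len(x)),
               key=lambda y: linear_score(x, y) - rho * violations(y))


x = ["Smith,", "Deep", "Nets"]
print(ccm_decode(x, rho=0.0))  # ('TITLE', 'TITLE', 'TITLE')
print(ccm_decode(x, rho=1.0))  # ('AUTHOR', 'TITLE', 'TITLE')
```

With ρ = 0 the HMM prefers starting in TITLE; the log-probability gap to the best AUTHOR-initial sequence is log(4/3) ≈ 0.29, so a penalty of 1.0 is enough to switch the decision.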


This amounts to simply counting the probability of each constraint being violated.


Are there other ways to learn? Can this paradigm be generalized?

TRAINING PARADIGMS

Training paradigms: decompose the problem into learning and inference.

Prior knowledge: features vs. constraints (values given as feature / constraint):
Data dependent: yes / no (if not learnt)
Learnable: yes
Size: large / small
Improvement approach: higher-order model / post-processing for L+I
Domain
Penalty type: soft / hard and soft
Common usage: local / global
Formulation

Comparison with Markov Logic Networks (MLN)

Training paradigms

Which paradigm is better?

Algorithmic view of the differences: IBT (inference-based training) vs. L+I (learning + inference)

L+I vs. IBT trade-offs as a function of the number of features. In some cases, problems are hard due to a lack of training data, which motivates semi-supervised learning.

Choice of paradigm

PARADIGM 2: LEARNING + INFERENCE
An example with entity-relation extraction

Entity-Relation Extraction [RothYi07]: "Dole's wife, Elizabeth, is a native of N.C." The entities E1 (Dole), E2 (Elizabeth) and E3 (N.C.) and the relations between them (e.g. R12, R23) are decided by inference at decision time.

Entity-Relation Extraction [RothYi07] Formulation 1: a joint global model. It is intractable to learn, so the problem needs to be decomposed.

Entity-Relation Extraction [RothYi07] Formulation 2: Local learning + global inference

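Entity-Relation Extraction [RothYi07] cost function, over the entity variables E1 (Dole), E2 (Elizabeth), E3 (N.C.) and the relation variables R12, R21, R23, R32, R13, R31:
$$c_{\{E_1=\text{per}\}} \cdot x_{\{E_1=\text{per}\}} + c_{\{E_1=\text{loc}\}} \cdot x_{\{E_1=\text{loc}\}} + \dots + c_{\{R_{12}=\text{spouse\_of}\}} \cdot x_{\{R_{12}=\text{spouse\_of}\}} + \dots + c_{\{R_{12}=\bot\}} \cdot x_{\{R_{12}=\bot\}} + \dots$$
where $\bot$ denotes the null label (no entity type or no relation).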

Entity-Relation Extraction [RothYi07] constraints: exactly one label for each relation and entity; relation and entity type constraints; integrality constraints, so that the variables are in effect Boolean.

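Entity-Relation Extraction [RothYi07]. Each entity is either a person, organization or location, or takes the null label $\bot$:
$$x_{\{E_1=\text{per}\}} + x_{\{E_1=\text{loc}\}} + x_{\{E_1=\text{org}\}} + x_{\{E_1=\bot\}} = 1$$
The type constraint $(R_{12} = \text{spouse\_of}) \Rightarrow (E_1 = \text{person}) \wedge (E_2 = \text{person})$ becomes
$$x_{\{R_{12}=\text{spouse\_of}\}} \le x_{\{E_1=\text{per}\}}, \qquad x_{\{R_{12}=\text{spouse\_of}\}} \le x_{\{E_2=\text{per}\}}$$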
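A minimal sketch of "local learning + global inference" on this sentence, assuming made-up classifier probabilities and a hypothetical born_in relation alongside spouse_of. An exhaustive search over joint labelings replaces the ILP solver, which is only feasible because there are five variables; maximizing the summed log-probabilities is the same as minimizing the cost function if the costs are taken to be c = -log p (an assumption of this sketch).

```python
import itertools
import math

# Hypothetical local classifier probabilities for
# "Dole's wife, Elizabeth, is a native of N.C."; all numbers are made up.
ENTITY_PROBS = {
    "E1": {"per": 0.7, "loc": 0.1, "org": 0.1, "none": 0.1},    # Dole
    "E2": {"per": 0.6, "loc": 0.2, "org": 0.1, "none": 0.1},    # Elizabeth
    "E3": {"per": 0.05, "loc": 0.4, "org": 0.5, "none": 0.05},  # N.C.
}
RELATION_PROBS = {
    ("E1", "E2"): {"spouse_of": 0.8, "born_in": 0.1, "none": 0.1},
    ("E2", "E3"): {"spouse_of": 0.05, "born_in": 0.85, "none": 0.1},
}
# Relation type constraints: relation -> (type of first arg, type of second arg).
SIGNATURES = {"spouse_of": ("per", "per"), "born_in": ("per", "loc")}


def consistent(ents, rels):
    """x_{R=r} <= x_{arg1=t1} and x_{R=r} <= x_{arg2=t2} for every typed relation."""
    for (a, b), r in rels.items():
        if r in SIGNATURES and (ents[a], ents[b]) != SIGNATURES[r]:
            return False
    return True


def score(ents, rels):
    """Total log-probability, i.e. minus the total cost when c = -log p."""
    s = sum(math.log(ENTITY_PROBS[e][t]) for e, t in ents.items())
    return s + sum(math.log(RELATION_PROBS[p][r]) for p, r in rels.items())


def local_argmax():
    """Every variable takes its own best label; constraints are ignored."""
    ents = {e: max(d, key=d.get) for e, d in ENTITY_PROBS.items()}
    rels = {p: max(d, key=d.get) for p, d in RELATION_PROBS.items()}
    return ents, rels


def global_inference():
    """Exhaustive search over joint labelings that satisfy the type constraints.
    Exactly one label per variable holds by construction."""
    e_names, r_pairs = list(ENTITY_PROBS), list(RELATION_PROBS)
    best, best_score = None, -math.inf
    for e_labels in itertools.product(*(ENTITY_PROBS[e] for e in e_names)):
        ents = dict(zip(e_names, e_labels))
        for r_labels in itertools.product(*(RELATION_PROBS[p] for p in r_pairs)):
            rels = dict(zip(r_pairs, r_labels))
            if consistent(ents, rels):
                s = score(ents, rels)
                if s > best_score:
                    best, best_score = (ents, rels), s
    return best


ents, rels = local_argmax()
print(ents["E3"], rels[("E2", "E3")], consistent(ents, rels))  # org born_in False
ents, rels = global_inference()
print(ents["E3"], rels[("E2", "E3")])  # loc born_in
```

The local classifiers label N.C. as an organization while also predicting born_in, which the type constraint forbids; global inference resolves the conflict in favour of the more confident relation decision.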

Entity-Relation Extraction [RothYi07] Entity classification results

Entity-Relation Extraction [RothYi07] Relation identification results


INNER WORKINGS OF INFERENCE

Constraint encoding
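A common recipe for encoding Boolean constraints over 0/1 indicator variables as linear inequalities, with a brute-force check that each inequality admits exactly the assignments satisfying the logical formula. The table is a general illustration rather than a list taken from the slides; the "A implies (B and C)" row has the same shape as the spouse_of type constraint in the entity-relation example.

```python
import itertools

# name -> (logical formula, linear encoding), both over 0/1 indicators a, b, c
ENCODINGS = {
    "A implies B": (lambda a, b, c: (not a) or b,
                    lambda a, b, c: a <= b),
    "A implies (B and C)": (lambda a, b, c: (not a) or (b and c),
                            lambda a, b, c: a <= b and a <= c),
    "(A and B) implies C": (lambda a, b, c: not (a and b) or c,
                            lambda a, b, c: a + b - 1 <= c),
    "A or B": (lambda a, b, c: a or b,
               lambda a, b, c: a + b >= 1),
    "at most one of A, B": (lambda a, b, c: not (a and b),
                            lambda a, b, c: a + b <= 1),
    "A if and only if B": (lambda a, b, c: a == b,
                           lambda a, b, c: a - b == 0),
}


def encoding_is_exact(logic, linear):
    """True if the inequality accepts exactly the 0/1 assignments satisfying the formula."""
    return all(bool(logic(*v)) == bool(linear(*v))
               for v in itertools.product((0, 1), repeat=3))


for name, (logic, linear) in ENCODINGS.items():
    print(f"{name}: {encoding_is_exact(logic, linear)}")  # True for every row
```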

Integer Linear Programming (ILP): a powerful and very general tool. It is NP-hard even in the binary case, but in practice it is efficient for most NLP problems. If ILP cannot solve a problem efficiently, we can fall back to approximate solutions using heuristic search.
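One possible heuristic-search fallback is beam search with constraint pruning. The sketch below assumes per-token label log-scores from some local model (the numbers are made up) and enforces two of the citation constraints from the HMM example as hard constraints: the sequence must start with AUTHOR or EDITOR, and a field may not reappear once it has ended. Like any beam search it is approximate and can miss the constrained optimum.

```python
def beam_search(token_scores, labels, beam_width, start_labels=("AUTHOR", "EDITOR")):
    """Approximate constrained decoding with beam search.

    token_scores: one dict per token mapping label -> local log-score.
    Partial sequences that break a hard constraint are dropped as soon as the
    violation is visible; only the `beam_width` best survivors are kept.
    """
    beam = [((), 0.0)]  # (partial label sequence, accumulated score)
    for scores in token_scores:
        candidates = []
        for seq, total in beam:
            for lab in labels:
                if not seq and lab not in start_labels:
                    continue  # the citation must start with AUTHOR or EDITOR
                if seq and lab != seq[-1] and lab in seq:
                    continue  # a field may not reappear once it has ended
                candidates.append((seq + (lab,), total + scores[lab]))
        beam = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
        if not beam:
            return None  # every hypothesis was pruned
    return beam[0]


LABELS = ("AUTHOR", "TITLE", "DATE")
SCORES = [  # hypothetical local log-scores for a four-token citation
    {"AUTHOR": -0.5, "TITLE": -0.9, "DATE": -3.0},
    {"AUTHOR": -1.2, "TITLE": -0.4, "DATE": -3.0},
    {"AUTHOR": -0.3, "TITLE": -1.0, "DATE": -2.5},
    {"AUTHOR": -2.0, "TITLE": -2.0, "DATE": -0.2},
]
best_seq, best_score = beam_search(SCORES, LABELS, beam_width=2)
print(best_seq)  # ('AUTHOR', 'TITLE', 'TITLE', 'DATE')
```

Taking each token's best local label would give AUTHOR, TITLE, AUTHOR, DATE, which breaks the at-most-once constraint; the pruned beam returns a consistent sequence instead. Pruning early keeps the search cheap, at the price of no optimality guarantee.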

Integer Linear Programming (ILP)

SENTENCE COMPRESSION

Sentence compression example: "Modelling Compression with Discourse Constraints", James Clarke and Mirella Lapata, EMNLP-CoNLL 2007. What is sentence compression? Sentence compression is commonly expressed as a word-deletion problem: given an input sentence of words $W = w_1, w_2, \dots, w_n$, the aim is to produce a compression by removing any subset of these words (Knight and Marcu 2002).

A trigram language model: maximize a scoring function by ILP, using the variables
$p_i$: word $w_i$ starts the compression
$q_{i,j}$: the sequence $w_i, w_j$ ends the compression
$x_{i,j,k}$: the trigram $w_i, w_j, w_k$ is in the compression
$y_i$: word $w_i$ is in the compression
Each of $p$, $q$, $x$, $y$ is either 0 or 1.
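To make the variable definitions concrete, a small sketch (the helper names are hypothetical) that reads off which p, q, x and y variables equal 1 for a given choice of kept words. An ILP solver searches over these variables directly and needs extra linking constraints to keep them consistent with one another; the three scoring functions below are placeholders for the trigram language-model log-probabilities and are assumptions of this sketch.

```python
def compression_indicators(n, kept):
    """Return the indicator variables equal to 1 when the words at indices
    `kept` (0-based) are retained from an n-word sentence."""
    kept = sorted(kept)
    if any(i < 0 or i >= n for i in kept):
        raise ValueError("word index out of range")
    return {
        "y": set(kept),                                             # y_i: word i kept
        "p": {kept[0]} if kept else set(),                          # p_i: starts it
        "q": {(kept[-2], kept[-1])} if len(kept) >= 2 else set(),   # q_ij: ends it
        "x": {tuple(kept[t:t + 3]) for t in range(len(kept) - 2)},  # x_ijk: trigrams
    }


def objective(ind, start_score, trigram_score, end_score):
    """Weighted sum over the active indicators; the score functions stand in
    for language-model log-probabilities."""
    return (sum(start_score(i) for i in ind["p"])
            + sum(trigram_score(i, j, k) for i, j, k in ind["x"])
            + sum(end_score(i, j) for i, j in ind["q"]))


words = "The old man slowly walked home".split()
ind = compression_indicators(len(words), {0, 2, 4})  # "The man walked"
print(ind["x"], ind["q"])  # {(0, 2, 4)} {(2, 4)}
```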

Sentential constraints:
1. Disallow the inclusion of modifiers without their head words.
2. Ensure the presence of modifiers when the head is retained in the compression.
3. If a verb is present in the compression, then so are its arguments.

Modifier Constraint Example

Sentential constraints:
4. Preserve personal pronouns in the compressed output.
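Written over the $y_i$ variables, the four sentential constraints take simple linear forms: $y_m \le y_h$ for a modifier $m$ of head $h$, $y_h \le y_m$ for modifiers that must stay with their head, $y_v \le y_a$ for a verb $v$ and its argument $a$, and $y_i = 1$ for a personal pronoun. A minimal checker, assuming a hand-annotated toy dependency analysis; the sentence, the arcs and the choice of negation as a "required" modifier are illustrative, not from the original slides.

```python
def sentential_violations(kept, modifiers, required, verb_args, pronouns):
    """Return the sentential constraints broken by keeping word indices `kept`.

    modifiers: (head, modifier) pairs; required: pairs whose modifier must stay
    whenever the head stays; verb_args: verb index -> argument indices.
    """
    def y(i):
        return 1 if i in kept else 0  # the 0/1 variable y_i

    found = []
    for h, m in modifiers:          # 1. y_m <= y_h: no modifier without its head
        if y(m) > y(h):
            found.append(("modifier_without_head", m))
    for h, m in required:           # 2. y_h <= y_m: a kept head keeps this modifier
        if y(h) > y(m):
            found.append(("head_without_modifier", m))
    for v, args in verb_args.items():
        for a in args:              # 3. y_v <= y_a: a kept verb keeps its arguments
            if y(v) > y(a):
                found.append(("verb_without_argument", a))
    for i in pronouns:              # 4. y_i = 1: personal pronouns are kept
        if y(i) == 0:
            found.append(("dropped_pronoun", i))
    return found


# "He did not see the big dog": 0 He, 1 did, 2 not, 3 see, 4 the, 5 big, 6 dog
MODS = [(6, 4), (6, 5), (3, 2)]
REQUIRED = [(3, 2)]  # keep the negation whenever "see" is kept
VERB_ARGS = {3: [0, 6]}
PRONOUNS = [0]
print(sentential_violations({0, 1, 2, 3, 6}, MODS, REQUIRED, VERB_ARGS, PRONOUNS))  # []
```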

Discourse constraints: 1. The center of a sentence is retained in the compression, and the entity realised as the center in the following sentence is also retained. The center of a sentence is its highest-ranked entity. Entities may be ranked by many features, e.g. grammatical role (subjects > objects > others).
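A minimal sketch of the centering constraint, assuming entities are ranked only by grammatical role as in the example above (subjects > objects > others) and that ties go to the earliest mention; the entity names and the `center` and `must_keep` helpers are hypothetical. In the ILP, each entity returned by `must_keep` would become constraints forcing $y_i = 1$ for the words that realise it.

```python
ROLE_RANK = {"subject": 2, "object": 1}  # every other role ranks 0


def center(mentions):
    """Highest-ranked entity among (entity, role) mentions in sentence order.
    Ties go to the earliest mention (an assumption of this sketch)."""
    best_key, best_entity = None, None
    for pos, (entity, role) in enumerate(mentions):
        key = (ROLE_RANK.get(role, 0), -pos)
        if best_key is None or key > best_key:
            best_key, best_entity = key, entity
    return best_entity


def must_keep(mentions, next_center):
    """Entities the compression of this sentence must retain: its own center,
    plus the next sentence's center if this sentence mentions it."""
    keep = set()
    own = center(mentions)
    if own is not None:
        keep.add(own)
    if next_center is not None and any(e == next_center for e, _ in mentions):
        keep.add(next_center)
    return keep


s1 = [("Smith", "subject"), ("company", "object"), ("Monday", "other")]
s2 = [("company", "subject"), ("shares", "object")]
print(sorted(must_keep(s1, center(s2))))  # ['Smith', 'company']
```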

Discourse constraints: 2. Lexical chain constraints. A lexical chain is a sequence of semantically related words; often the longest lexical chain is the most important one.

SEMANTIC ROLE LABELING

Semantic role labeling example. What is SRL? SRL identifies all constituents that fill a semantic role and determines their roles.

General information: both models (the argument identifier and the argument classifier) are trained with SNoW. Idea: maximize the scoring function subject to the constraints.

SRL: argument identification uses a learning scheme with two classifiers, one predicting the beginnings of possible arguments and the other the ends. The predictions are combined to form argument candidates. Why? When only shallow parsing is available, the system does not have constituents to begin with, so conceptually it has to consider all possible subsequences.
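The combination step can be sketched as follows, assuming each classifier outputs a per-token confidence and that a fixed threshold decides which starts and ends count as predicted (the threshold, the optional length cap and the numbers are illustrative). Every predicted start is paired with every predicted end at or after it, which is how arbitrary subsequences become candidates without a full parse.

```python
def argument_candidates(begin_probs, end_probs, threshold=0.5, max_len=None):
    """Pair each predicted argument start with each predicted end at or after it.

    begin_probs[i] / end_probs[i]: confidence that token i begins / ends an
    argument. Returns inclusive (start, end) spans.
    """
    starts = [i for i, p in enumerate(begin_probs) if p >= threshold]
    ends = [j for j, p in enumerate(end_probs) if p >= threshold]
    return [(i, j) for i in starts for j in ends
            if j >= i and (max_len is None or j - i + 1 <= max_len)]


# "I left my pearls to my daughter": tokens 0..6 (made-up classifier outputs)
BEGIN = [0.9, 0.1, 0.8, 0.2, 0.7, 0.1, 0.1]
END = [0.9, 0.2, 0.1, 0.8, 0.1, 0.2, 0.9]
print(argument_candidates(BEGIN, END))
# [(0, 0), (0, 3), (0, 6), (2, 3), (2, 6), (4, 6)]
```

Candidates such as (0, 3) cover the predicate "left" at index 1; filtering those out is left to the constraints and the global inference step.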

SRL features: POS tags, length, verb class, head word and POS tag of the head word, position, path, chunk pattern, clause relative position, clause coverage, NEG, MOD.

SRL: constraints
1. Arguments cannot overlap with the predicate.
2. Arguments cannot exclusively overlap with the clauses.
3. If a predicate is outside a clause, its arguments cannot be embedded in that clause.
4. No overlapping or embedding arguments.
5. No duplicate argument classes for core arguments. Conjunctions are an exception: [A0 I] [V left] [A1 my pearls] [A2 to my daughter] and [A1 my gold] [A2 to my son].

SRL: constraints (continued)
6. If an argument is a reference to some other argument arg, then this referenced argument must exist in the sentence.
7. If there is a C-arg argument, then there has to be an arg argument; in addition, the C-arg argument must occur after arg. The label C-arg is used to specify the continuity of the arguments.
8. Given a specific verb, some argument types should never occur.
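A minimal checker for the constraints that depend only on the labeled spans (1, 4, 5, 6 and 7), assuming PropBank-style labels (A0 to A5 as core arguments, R-X for references and C-X for continuations) and inclusive token spans. Constraints 2, 3 and 8 need clause boundaries and verb-specific frames and are left out, and the conjunction exception to constraint 5 is not handled. In the system, constraints like these restrict which label assignments the maximization of the scoring function may choose.

```python
CORE = {"A0", "A1", "A2", "A3", "A4", "A5"}  # PropBank-style core labels (assumed)


def srl_violations(args, predicate):
    """Names of violated constraints for (label, start, end) spans, inclusive,
    given the predicate's token index."""
    found = []
    spans = sorted(args, key=lambda a: (a[1], a[2]))
    labels = {lab for lab, _, _ in args}
    # 1. Arguments cannot overlap with the predicate.
    if any(s <= predicate <= e for _, s, e in spans):
        found.append("overlaps_predicate")
    # 4. No overlapping or embedding arguments (after sorting by start,
    #    checking neighbours is enough to find any overlapping pair).
    if any(nxt[1] <= cur[2] for cur, nxt in zip(spans, spans[1:])):
        found.append("overlapping_arguments")
    # 5. No duplicate core labels (the conjunction exception is not handled).
    core = [lab for lab, _, _ in args if lab in CORE]
    if len(core) != len(set(core)):
        found.append("duplicate_core")
    # 6. An R-X argument requires an X argument in the sentence.
    if any(lab.startswith("R-") and lab[2:] not in labels for lab in labels):
        found.append("missing_referent")
    # 7. A C-X argument requires an X argument that starts before it.
    for lab, s, _ in args:
        if lab.startswith("C-"):
            base_starts = [bs for bl, bs, _ in args if bl == lab[2:]]
            if not base_starts or min(base_starts) >= s:
                found.append("continuation_without_base")
                break
    return found


# [A0 I] [V left] [A1 my pearls] [A2 to my daughter]; tokens 0..6, predicate at 1
good = [("A0", 0, 0), ("A1", 2, 3), ("A2", 4, 6)]
print(srl_violations(good, predicate=1))  # []
```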

SRL results

QA Questions?