Published by Gonzalo Hoult. Modified over 2 years ago.

1
CONSTRAINED CONDITIONAL MODELS TUTORIAL Jingyu Chen, Xiao Cheng

2
INTRODUCTION

3
Main ideas:
Idea 1: Modeling. Separate modeling and problem formulation from algorithms. Similar to the philosophy of probabilistic modeling.
Idea 2: Inference. Keep the model simple, make expressive decisions (via constraints). Unlike probabilistic modeling, where models become more expressive. Inject background knowledge.
Idea 3: Learning. Expressive structured decisions can be supported by simply learned models. Global inference can be used to amplify the simple models (and even minimal supervision).

4
Task of interest: Structured Prediction

5
Pipeline?

6
Model Formulation
Annotated terms: Penalty; Violation measure; Regularization; Local dependency (e.g. HMM, CRF)

7
Constraint expressivity
Multiclass problem:
One vs. all approximation:
Ideal classification: can be expressed through constraints
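The formulas on this slide did not survive extraction. As a rough guide only, the textbook forms of the three notions can be written as follows, assuming one weight vector w_y per class and an input feature vector x (both symbols are assumptions, not taken from the slide):

```latex
% Multiclass prediction with one weight vector per class
\hat{y} = \arg\max_{y \in \{1,\dots,k\}} w_y^{\top} x

% One-vs-all approximation: each class is separated from the rest independently
w_y^{\top} x > 0 \iff y \text{ is the correct label}

% Ideal classification, written as constraints on the weight vectors
w_y^{\top} x > w_{y'}^{\top} x \quad \forall\, y' \neq y
```

The last line is a set of linear inequalities, which is what makes it expressible as constraints.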

8
Implementations
Modeling: objective function
Constrained optimization solver: Integer Linear Programming
Inference: exact ILP, heuristic search, relaxation, dynamic programming
Learning

9
How do we use CCM to learn?

10
EXAMPLE 1: JOINT INFERENCE-BASED LEARNING
Constrained HMM in Information Extraction

11
Typical workflow
1. Define basic classifiers
2. Define constraints as linear inequalities
3. Combine the two into an objective function

12
HMM CCM Example

13
[AUTHOR] Lars Ole Andersen. Program analysis and
[TITLE] specialization for the
[EDITOR] C
[BOOKTITLE] Programming language
[TECH-REPORT] . PhD thesis.
[INSTITUTION] DIKU, University of Copenhagen, May
[DATE] 1994.
Violates a lot of natural constraints

14
HMM CCM Example
Each field must be a consecutive list of words and can appear at most once in a citation.
State transitions must occur on punctuation marks.
The citation can only start with AUTHOR or EDITOR.
The words "pp." and "pages" correspond to PAGE.
Four-digit numbers of the form 19xx or 20xx are DATE.
Quotations can appear only in TITLE.
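Three of the constraints above can be checked mechanically on a candidate labeling. The following is a minimal sketch, not the tutorial's implementation: the function name `count_violations` is hypothetical, and it assumes punctuation stays attached to the preceding token and that every token is non-empty.

```python
from typing import Dict, List

PUNCTUATION = ".,;:"

def count_violations(tokens: List[str], labels: List[str]) -> Dict[str, int]:
    """Count violations of three citation constraints for one labeling."""
    violations = {"start": 0, "punctuation": 0, "once": 0}

    # The citation can only start with AUTHOR or EDITOR.
    if labels and labels[0] not in ("AUTHOR", "EDITOR"):
        violations["start"] += 1

    closed = set()  # fields whose consecutive block has already ended
    for i in range(1, len(labels)):
        if labels[i] == labels[i - 1]:
            continue
        # State transitions must occur on punctuation marks.
        if tokens[i - 1][-1] not in PUNCTUATION:
            violations["punctuation"] += 1
        # A field that was closed earlier must not be reopened.
        closed.add(labels[i - 1])
        if labels[i] in closed:
            violations["once"] += 1
    return violations

tokens = ["Lars", "Ole", "Andersen.", "Program", "analysis"]
good = ["AUTHOR", "AUTHOR", "AUTHOR", "TITLE", "TITLE"]
bad = ["AUTHOR", "AUTHOR", "AUTHOR", "AUTHOR", "TITLE"]
print(count_violations(tokens, good))
print(count_violations(tokens, bad))
```

Returning counts rather than a yes/no answer matters for soft constraints, where each violation is penalized separately.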

15
HMM CCM Example

16
New objective function involving constraints
Penalize the probability of a sequence if it violates a constraint.
A penalty is applied each time the constraint is violated.
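One way to read this objective, as a sketch under stated assumptions: take the model's log-probability of a label sequence and subtract a penalty rho for every violation of each constraint. The names `ccm_score` and `bad_start`, the value of rho and the candidate scores are all made up for illustration.

```python
from typing import Callable, List, Sequence, Tuple

# A soft constraint: (penalty rho, function counting violations in a labeling).
Constraint = Tuple[float, Callable[[Sequence[str]], int]]

def ccm_score(model_log_prob: float, labels: Sequence[str],
              constraints: List[Constraint]) -> float:
    """Model score minus rho for every time a constraint is violated."""
    penalty = sum(rho * count(labels) for rho, count in constraints)
    return model_log_prob - penalty

def bad_start(labels: Sequence[str]) -> int:
    # 1 if the citation does not start with AUTHOR or EDITOR, else 0.
    return int(labels[0] not in ("AUTHOR", "EDITOR"))

constraints = [(2.0, bad_start)]  # rho = 2.0 is a made-up value
candidates = {                    # made-up model log-probabilities
    ("TITLE", "TITLE"): -1.0,
    ("AUTHOR", "TITLE"): -1.5,
}
best = max(candidates, key=lambda y: ccm_score(candidates[y], y, constraints))
print(best)
```

In this toy case the model alone prefers the sequence starting with TITLE, but the penalty makes the constraint-satisfying sequence win.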

17
HMM CCM Example
Transform to linear model

18
HMM CCM Example

19
The estimate is obtained simply by counting how often the constraints are violated.

20
HMM CCM Example

21
Are there other ways to learn? Can this paradigm be generalized?

22
TRAINING PARADIGMS

23
Training paradigms
Decompose; Learn; Inference

24
Prior knowledge: Features vs. Constraints
Property: Feature / Constraint
Data dependent: Yes / No (if not learnt)
Learnable: Yes
Size: Large / Small
Improvement approach: Higher order model / Post-processing for L+I
Domain:
Penalty type: Soft / Hard & Soft
Common usage: Local / Global
Formulation:

25
Comparison with MLN

26
Training paradigms

27
Which paradigm is better?

28
Algorithmic view of the differences
IBT; L+I

29
L+I vs. IBT tradeoffs
Figure axis: # of features
In some cases problems are hard due to lack of training data.
Semi-supervised learning

30
Choice of paradigm

31
PARADIGM 2: LEARNING + INFERENCE
An example with Entity-Relation Extraction

32
Entity-Relation Extraction [RothYi07]
Dole’s wife, Elizabeth, is a native of N.C.
Entities: E1, E2, E3. Relations: R12, R23.
Decision time inference

33
Entity-Relation Extraction [RothYi07]
Formulation 1: Joint global model
Intractable to learn
Need to decompose

34
Entity-Relation Extraction [RothYi07]
Formulation 2: Local learning + global inference

35
Entity-Relation Extraction [RothYi07]
Cost function:
c{E1 = per} · x{E1 = per} + c{E1 = loc} · x{E1 = loc} + … + c{R12 = spouse_of} · x{R12 = spouse_of} + … + c{R12 = } · x{R12 = } + …
Figure: entities E1 (Dole), E2 (Elizabeth), E3 (N.C.) with candidate relations R12, R21, R23, R32, R13, R31.

36
Entity-Relation Extraction [RothYi07]
Exactly one label for each relation and entity
Relation and entity type constraints
Integral constraints, in effect boolean

37
Entity-Relation Extraction [RothYi07]
Each entity is either a person, organization or location:
x{E1 = per} + x{E1 = loc} + x{E1 = org} + x{E1 = } = 1
(R12 = spouse_of) → (E1 = person) ∧ (E2 = person):
x{R12 = spouse_of} ≤ x{E1 = per}
x{R12 = spouse_of} ≤ x{E2 = per}
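To illustrate how these constraints change the decision, here is a small sketch that replaces the ILP solver with brute-force enumeration over two entities and one relation. The costs, the "none" label and the function names are hypothetical and not taken from [RothYi07]; lower cost is better.

```python
from itertools import product

ENTITY_LABELS = ["per", "loc", "org", "none"]
RELATION_LABELS = ["spouse_of", "none"]

# Hypothetical local classifier costs (lower is better); not from the paper.
cost_e1 = {"per": 0.2, "loc": 0.9, "org": 0.8, "none": 1.0}
cost_e2 = {"per": 0.7, "loc": 0.4, "org": 0.9, "none": 1.0}
cost_r12 = {"spouse_of": 0.1, "none": 0.8}

def consistent(e1: str, e2: str, r12: str) -> bool:
    # (R12 = spouse_of) implies (E1 = per) and (E2 = per).
    return r12 != "spouse_of" or (e1 == "per" and e2 == "per")

def joint_inference():
    """Return (total cost, e1, e2, r12) of the cheapest consistent assignment."""
    best = None
    # Choosing one label per variable enforces the "exactly one label" constraint.
    for e1, e2, r12 in product(ENTITY_LABELS, ENTITY_LABELS, RELATION_LABELS):
        if not consistent(e1, e2, r12):
            continue
        total = cost_e1[e1] + cost_e2[e2] + cost_r12[r12]
        if best is None or total < best[0]:
            best = (total, e1, e2, r12)
    return best

best = joint_inference()
print(best[1:])
```

Taken independently, the local classifiers would label E2 as a location; the confident spouse_of prediction together with the type constraint flips it to a person.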

38
Entity-Relation Extraction [RothYi07] Entity classification results

39
Entity-Relation Extraction [RothYi07] Relation identification results

40
Entity-Relation Extraction [RothYi07] Relation identification results

41
INNER WORKINGS OF INFERENCE

42
Constraints Encoding

43
Integer Linear Programming (ILP)
Powerful tool, very general.
NP-hard even in the binary case, but efficient for most NLP problems.
If ILP cannot solve the problem efficiently, we can fall back to approximate solutions using heuristic search.

44
Integer Linear Programming (ILP)

46
SENTENCE COMPRESSION

47
Sentence Compression Example
Modelling Compression with Discourse Constraints, James Clarke and Mirella Lapata, COLING/SCL 2006
1. What is sentence compression?
Sentence compression is commonly expressed as a word deletion problem: given an input sentence of words W = w1, w2, ..., wn, the aim is to produce a compression by removing any subset of these words (Knight and Marcu 2002).

48
A trigram language model: maximize a scoring function by ILP.
p_i: word w_i starts the compression
q_{i,j}: sequence w_i, w_j ends the compression
x_{i,j,k}: trigram w_i, w_j, w_k is in the compression
y_i: word w_i is in the compression
Each p, q, x, y is either 0 or 1.

49
Sentential constraints:
1. Disallow the inclusion of modifiers without their head words:
2. Force the presence of modifiers when the head is retained in the compression:
3. Require that if a verb is present in the compression, then so are its arguments:
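The inequalities for these constraints were lost in extraction. As a sketch of constraint 1 only, assume it takes the form keep[head] - keep[modifier] >= 0 over the 0/1 inclusion variables; that form, the function name and the toy parse below are assumptions.

```python
from typing import Dict, List, Tuple

def modifier_violations(keep: List[int],
                        head_of: Dict[int, int]) -> List[Tuple[int, int]]:
    """Return (modifier, head) pairs where the modifier is kept but its head is dropped.

    keep[i] is 1 if word i is in the compression, else 0.
    The constraint checked is keep[head] - keep[modifier] >= 0.
    """
    return [(m, h) for m, h in sorted(head_of.items()) if keep[h] - keep[m] < 0]

words = ["the", "big", "dog", "barked"]
head_of = {0: 2, 1: 2}  # hypothetical parse: "the" and "big" modify "dog"

print(modifier_violations([0, 1, 0, 1], head_of))  # keeps "big" without "dog"
print(modifier_violations([1, 0, 1, 1], head_of))  # keeps "the dog barked"
```

Constraint 3 has the same shape with the verb in the role of the modifier and each argument in the role of the head.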

50
Modifier Constraint Example

52
Sentential constraints:
4. Preserve personal pronouns in the compressed output:

53
Discourse constraints:
1. The center of a sentence is retained in the compression, and the entity realised as the center in the following sentence is also retained.
The center of a sentence is the entity with the highest rank. Entities may be ranked by many features, e.g. grammatical role (subjects > objects > others).

54
Discourse constraints:
2. Lexical chain constraints: A lexical chain is a sequence of semantically related words. Often the longest lexical chain is the most important chain.

55
SEMANTIC ROLE LABELING

56
Semantic Role Labeling Example: What is SRL?
SRL identifies all constituents that fill a semantic role, and determines their roles.

57
General information:
Both models (argument identifier and argument classifier) are trained by SNoW.
Idea: maximize the scoring function

58
SRL: Argument Identification
Uses a learning scheme with two classifiers: one predicts the beginnings of possible arguments, the other the ends. The predictions are combined to form argument candidates.
Why: When only shallow parsing is available, the system does not have constituents to begin with. Therefore, conceptually, the system has to consider all possible subsequences.
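The combination step can be sketched as pairing every predicted start with every predicted end at or after it. The function name, the score lists and the threshold of 0.5 are assumptions for illustration, not details of the SNoW-based system.

```python
from typing import List, Optional, Tuple

def argument_candidates(start_scores: List[float], end_scores: List[float],
                        threshold: float = 0.5,
                        max_len: Optional[int] = None) -> List[Tuple[int, int]]:
    """Pair predicted argument starts and ends into (start, end) token spans."""
    starts = [i for i, s in enumerate(start_scores) if s >= threshold]
    ends = [j for j, s in enumerate(end_scores) if s >= threshold]
    return [
        (i, j)
        for i in starts
        for j in ends
        # A span must end at or after its start, and may be length-limited.
        if j >= i and (max_len is None or j - i + 1 <= max_len)
    ]

# Made-up per-token scores from the two classifiers.
start_scores = [0.9, 0.1, 0.8, 0.2]
end_scores = [0.1, 0.7, 0.2, 0.9]
print(argument_candidates(start_scores, end_scores))
```

Filtering by the two classifiers keeps the candidate set much smaller than the set of all possible subsequences.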

59
SRL: List of features
POS tags, Length, Verb class, Head word and POS tag of the head word, Position, Path, Chunk pattern, Clause relative position, Clause coverage, NEG, MOD

60
SRL: Constraints
1. Arguments cannot overlap with the predicate.
2. Arguments cannot exclusively overlap with the clauses.
3. If a predicate is outside a clause, its arguments cannot be embedded in that clause.
4. No overlapping or embedding arguments.
5. No duplicate argument classes for core arguments.
Note: conjunction is an exception. [A0 I] [V left] [A1 my pearls] [A2 to my daughter] and [A1 my gold] [A2 to my son].

61
SRL: Constraints
6. If an argument is a reference to some other argument arg, then this referenced argument must exist in the sentence.
7. If there is a C-arg argument, then there has to be an arg argument; in addition, the C-arg argument must occur after arg. The label C-arg is then used to specify the continuity of the arguments.
8. Given a specific verb, some argument types should never occur.
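Constraints 4 to 7 can be checked on a labeled set of spans. This is a minimal sketch: the function name, the R-/C- label prefixes and the set of core labels are assumptions, and the conjunction exception noted under constraint 5 is deliberately not handled.

```python
from collections import Counter
from itertools import combinations
from typing import List, Set, Tuple

CORE = {"A0", "A1", "A2", "A3", "A4", "A5"}  # assumed core argument labels

def srl_violations(args: List[Tuple[int, int, str]]) -> Set[str]:
    """args holds (start, end, label) with inclusive token spans."""
    found = set()
    # 4. No overlapping or embedding arguments.
    for (s1, e1, _), (s2, e2, _) in combinations(args, 2):
        if s1 <= e2 and s2 <= e1:
            found.add("overlap")
    labels = [label for _, _, label in args]
    counts = Counter(labels)
    # 5. No duplicate core argument classes (conjunction exception ignored).
    if any(counts[label] > 1 for label in CORE):
        found.add("duplicate_core")
    for start, _, label in args:
        # 6. A reference R-arg needs the referenced arg in the sentence.
        if label.startswith("R-") and label[2:] not in labels:
            found.add("missing_referent")
        # 7. A continuation C-arg needs an arg that ends before it starts.
        if label.startswith("C-"):
            base = label[2:]
            if not any(l == base and e < start for _, e, l in args):
                found.add("bad_continuation")
    return found

print(srl_violations([(0, 0, "A0"), (2, 3, "A1"), (4, 6, "A2")]))
print(srl_violations([(0, 2, "A0"), (2, 3, "A1")]))
```

In an ILP formulation each of these checks would become a linear inequality over label indicator variables rather than a check applied after the fact.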

63
SRL Results:

64
Q&A: Questions?
