Inference and Learning via Integer Linear Programming


1 Inference and Learning via Integer Linear Programming
Vasin, Dan, Scott, Dav

2 Outline
Problem Definition
Integer Linear Programming (ILP) and its generality
Learning and Inference via ILP
Experiments
Extension to hierarchical learning
Future Direction: hidden variables
Notes: We pose the inference task (finding an assignment to a set of variables) as an ILP: the cost function is defined by a set of learned classifiers, and the constraints maintain the structure of the solution. Doing this (1) allows many constraints to be used in inference and (2) sets up a framework in which different learning methods can be compared; we compare two natural algorithms, independent vs. global training. Experiments: independent training is sometimes better for easy problems, while global training is better for difficult problems. Extension: dependent tasks (one feeding the next), where classification is done in levels. Future direction: learning with hidden variables.

3 Problem Definition
X = (X1,...,Xk) ∈ 𝒳1 × ... × 𝒳k = 𝒳
Y = (Y1,...,Yl) ∈ 𝒴1 × ... × 𝒴l = 𝒴
Given X = x, find Y = y.
Notation: capital letters denote variables, lower-case letters denote values; bold indicates vectors or matrices; 𝒳 and 𝒴 denote sets.

4 Example (Text Chunking)
x = The guy presenting now is so tired
y = NP  ADJP  VP  ADVP  VP

5 Classifiers
A classifier: h: 𝒳 × 𝒴^(l-1) × 𝒴 × {1,..,l} → ℝ
Example:
score(x, y-3, NP, 3) = 0.3
score(x, y-3, VP, 3) = 0.5
score(x, y-3, ADVP, 3) = 0.2
score(x, y-3, ADJP, 3) = 1.2
score(x, y-3, NULL, 3) = 0.1

6 Inference
Goal: x → y.
Given: an input x; score(x, y-t, y, t) for all (y-t, y) ∈ 𝒴^l, t ∈ {1,..,l}; and a set of constraints C over 𝒴.
Find: the y that maximizes the global function score(x, y) = Σt score(x, y-t, yt, t) while satisfying the constraints C.
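To make the objective concrete before introducing ILP, here is a brute-force sketch of this inference problem (exhaustive search; the `score` and `check_constraints` callables are assumptions standing in for the learned classifiers and the constraint set C):

```python
from itertools import product

def brute_force_inference(x, labels, l, score, check_constraints):
    """Return argmax_y sum_t score(x, y, y[t], t) over label sequences y of
    length l that satisfy the constraints C (checked by check_constraints)."""
    best, best_val = None, float("-inf")
    for y in product(labels, repeat=l):
        if not check_constraints(y):
            continue
        val = sum(score(x, y, y[t], t) for t in range(l))
        if val > best_val:
            best, best_val = y, val
    return best
```

This search is exponential in l, which is exactly what the ILP formulation on the following slides replaces.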

7 Integer Linear Programming
Boolean variables: U = (U1,...,Ud) ∈ {0,1}^d
Cost vector: p = (p1,...,pd) ∈ ℝ^d
Cost function: p·U
Constraint matrix: c ∈ ℝ^(e×d)
Maximize p·U subject to cU ≥ 0 (constraints such as cU = 0 or cU ≥ 3 are also possible).

8 ILP (Example)
U = (U1, U2, U3)
p = (0.3, 0.5, 0.8)
c = [  1  2  3
      -1 -2  2
       0 -3  2 ]
Maximize p·U subject to cU ≥ 0.
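As a minimal sketch of solving this example with an off-the-shelf solver (assuming the PuLP library; the numbers are taken from the slide):

```python
import pulp

p = [0.3, 0.5, 0.8]                       # cost vector from the slide
c = [[1, 2, 3], [-1, -2, 2], [0, -3, 2]]  # constraint matrix from the slide

prob = pulp.LpProblem("ilp_example", pulp.LpMaximize)
U = [pulp.LpVariable(f"U{i+1}", cat=pulp.LpBinary) for i in range(3)]

# Maximize p . U
prob += pulp.lpSum(p[i] * U[i] for i in range(3))

# Subject to cU >= 0, one inequality per row of c
for row in c:
    prob += pulp.lpSum(row[i] * U[i] for i in range(3)) >= 0

prob.solve(pulp.PULP_CBC_CMD(msg=0))
print([int(u.value()) for u in U], pulp.value(prob.objective))
# Enumerating the 8 assignments by hand gives the optimum U = (1, 0, 1) with value 1.1.
```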

9 Boolean Functions as Linear Constraints
Conjunction U1U2U3  U1=1, U2=1, U3=1 Disjunction U1U2U3  U1 + U2 + U3  1 CNF (U1U2)(U3U4)  U1+U2  1, U3+U4  1

10 Text Chunking
Indicator variables: U1,NP, U1,NULL, U2,VP, ... correspond to y1 = NP, y1 = NULL, y2 = VP, ...; for example, U1,NP indicates that phrase 1 is labeled NP.
Cost vector: p1,NP = score(x, NP, 1), p1,NULL = score(x, NULL, 1), p2,VP = score(x, VP, 2), ...
Then p·U = score(x, y) = Σt score(x, yt, t), subject to the constraints.

11 Structural Constraints
Coherency: each yt can take only one value: Σ(y ∈ {NP,..,NULL}) Ut,y = 1
Non-overlapping: if y1 and y2 overlap, then U1,NULL + U2,NULL ≥ 1

12 Linguistic Constraints
Every sentence must have at least one VP: Σt Ut,VP ≥ 1
Every sentence must have at least one NP: Σt Ut,NP ≥ 1
...
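Pulling slides 10–12 together, a minimal sketch of the chunking ILP (assuming PuLP; the label set, phrase count, scores, and the overlapping pair are illustrative placeholders):

```python
import pulp

labels = ["NP", "VP", "ADVP", "ADJP", "NULL"]
phrases = range(5)                                       # candidate phrases, illustrative
score = {(t, y): 0.0 for t in phrases for y in labels}   # score(x, y, t) from the classifiers

prob = pulp.LpProblem("chunking", pulp.LpMaximize)
U = {(t, y): pulp.LpVariable(f"U_{t}_{y}", cat=pulp.LpBinary)
     for t in phrases for y in labels}

# Cost function (slide 10): p . U = sum_t score(x, y_t, t)
prob += pulp.lpSum(score[t, y] * U[t, y] for t in phrases for y in labels)

# Coherency (slide 11): each phrase takes exactly one label
for t in phrases:
    prob += pulp.lpSum(U[t, y] for y in labels) == 1

# Non-overlapping (slide 11): if phrases 1 and 2 overlap, at least one must be NULL
prob += U[1, "NULL"] + U[2, "NULL"] >= 1

# Linguistic (slide 12): at least one VP and at least one NP per sentence
prob += pulp.lpSum(U[t, "VP"] for t in phrases) >= 1
prob += pulp.lpSum(U[t, "NP"] for t in phrases) >= 1
```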

13 Interacting Classifiers
The classifier for an output yt uses the other outputs y-t as inputs: score(x, y-t, y, t). We need to ensure that the final output from the ILP is computed from a consistent y, so we introduce additional variables and additional coherency constraints.

14 Interacting Classifiers
Additional variables: UY,y for each possible assignment (y-t, y), with Y = y ⇔ UY,y = 1.
Additional coherency constraints: UY,y = 1 iff Ut,yt = 1 for all yt in y:
Σ(yt in y) Ut,yt − UY,y ≤ l − 1
Σ(yt in y) Ut,yt − l·UY,y ≥ 0
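A minimal sketch of the two linking constraints for a single full assignment y (assuming PuLP; `parts` stands for the per-position indicators Ut,yt and `U_joint` for UY,y):

```python
import pulp

l = 3  # length of the assignment in this illustration
prob = pulp.LpProblem("linking", pulp.LpMaximize)
parts = [pulp.LpVariable(f"U_{t}", cat=pulp.LpBinary) for t in range(l)]  # U_{t,y_t}
U_joint = pulp.LpVariable("U_Y_y", cat=pulp.LpBinary)                     # U_{Y,y}

# sum_t U_{t,y_t} - U_{Y,y} <= l - 1 : if all parts are 1, U_joint is forced to 1
prob += pulp.lpSum(parts) - U_joint <= l - 1
# sum_t U_{t,y_t} - l*U_{Y,y} >= 0  : if any part is 0, U_joint is forced to 0
prob += pulp.lpSum(parts) - l * U_joint >= 0
```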

15 Learning Classifiers
score(x, y-t, y, t) = wy · Φ(x, y-t, t)
Learn wy for all y ∈ 𝒴: multi-class learning.
Each example (x, y) yields the training instances {(Φ(x, y-t, t), yt)}, t = 1..l.
Learn each classifier independently.

16 Learn with Inference Feedback
Learn by observing global behavior. For each example (x, y):
Make a prediction with the current classifiers and ILP: y' = argmax_y Σt score(x, y-t, yt, t)
For each t, update: if y't ≠ yt, promote score(x, y-t, yt, t) and demote score(x, y'-t, y't, t).
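A minimal sketch of this training loop with one linear (perceptron-style) classifier per label (assuming the `inference` and `phi` callables, which stand for the ILP step and the feature map, are provided elsewhere):

```python
import numpy as np

def train_with_inference_feedback(examples, phi, inference, labels, dim, epochs=5):
    # One weight vector per label y, so score(x, y_-t, y, t) = w[y] . phi(x, y_-t, t)
    w = {y: np.zeros(dim) for y in labels}
    for _ in range(epochs):
        for x, y_gold in examples:
            y_pred = inference(x, w)          # argmax over assignments via ILP
            for t, (gold_t, pred_t) in enumerate(zip(y_gold, y_pred)):
                if pred_t != gold_t:
                    w[gold_t] += phi(x, y_gold, t)   # promote score(x, y_-t, y_t, t)
                    w[pred_t] -= phi(x, y_pred, t)   # demote score(x, y'_-t, y'_t, t)
    return w
```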

17 Experiments: Semantic Role Labeling
Assume correct argument boundaries are given. Only sentences with more than 5 arguments are included.

18 Experimental Results
(Results are shown for Winnow and Perceptron learners.)
For the difficult task: inference feedback during training improves performance.
For the easy task: learning without inference feedback is better.

19 Conservative Updating
Update only if necessary. Example: constraint U1 + U2 = 1; prediction (U1, U2) = (1, 0); correct answer (U1, U2) = (0, 1). Feedback: demote class 1, promote class 2. But U1 = 0 ⇒ U2 = 1, so it suffices to demote class 1 only.
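A hand-coded sketch of this two-variable special case (illustrative only; the slides do not spell out a general minset algorithm):

```python
def conservative_update_two_choice(predicted, correct):
    """predicted, correct: (U1, U2) tuples under the constraint U1 + U2 = 1.
    Returns the single demotion that already fixes the global prediction."""
    updates = []
    if predicted[0] == 1 and correct[0] == 0:
        updates.append(("demote", 1))   # U1 -> 0 forces U2 -> 1, so class 2 needs no update
    elif predicted[1] == 1 and correct[1] == 0:
        updates.append(("demote", 2))   # symmetric case
    return updates

print(conservative_update_two_choice((1, 0), (0, 1)))  # [('demote', 1)]
```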

20 Conservative Updating
S = minset(constraints): the set of functions that, if changed, would make the global prediction correct. Promote (demote) only the functions in the minset S.

21 Hierarchical Learning
Given x, compute hierarchically:
z1 = h1(x)
z2 = h2(x, z1)
...
y = hs+1(x, z1,...,zs)
Assume all z are known in training.

22 Hierarchical Learning
Assume each hj can be computed via an ILP with (pj, Uj, cj). Then
y = argmax_y max_{z1,...,zs} Σj λj pj·Uj
subject to c1U1 ≥ 0, c2U2 ≥ 0, ..., cs+1Us+1 ≥ 0,
where λj is a large enough constant to preserve the hierarchy.
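A minimal sketch of stacking two levels into a single ILP (assuming PuLP; the scores, the per-level constraints, and the value of the weighting constant are illustrative):

```python
import pulp

prob = pulp.LpProblem("hierarchical", pulp.LpMaximize)
Uz = [pulp.LpVariable(f"z_{i}", cat=pulp.LpBinary) for i in range(3)]  # level-1 outputs z
Uy = [pulp.LpVariable(f"y_{i}", cat=pulp.LpBinary) for i in range(3)]  # level-2 outputs y
pz, py = [0.2, 0.7, 0.1], [0.5, 0.3, 0.9]                              # illustrative scores

LAMBDA = 100.0  # the "large enough" constant lambda_j; the value here is arbitrary
prob += LAMBDA * pulp.lpSum(pz[i] * Uz[i] for i in range(3)) \
        + pulp.lpSum(py[i] * Uy[i] for i in range(3))

# Each level keeps its own constraints, e.g. exactly one label per level
prob += pulp.lpSum(Uz) == 1
prob += pulp.lpSum(Uy) == 1
```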

23 Hidden Variables
Given x, y = h(x, z), where z is not known in training.
y = argmax_y max_z Σt score(x, z, y-t, yt, t), subject to some constraints.

24 Learning with Hidden Variables
Truncated EM-style learning. For each example (x, y):
Compute z with the current classifiers and ILP: z = argmax_z Σt score(x, z, y-t, yt, t)
Make a prediction with the current classifiers and ILP: (y', z') = argmax_{y,z} Σt score(x, z, y-t, yt, t)
For each t, update: if y't ≠ yt, promote score(x, z, y-t, yt, t) and demote score(x, z', y'-t, y't, t).
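A minimal sketch of this truncated-EM-style loop (the `inference_fixed_y`, `inference_joint`, `promote`, and `demote` callables are assumptions standing in for the two ILP calls and the classifier updates):

```python
def train_with_hidden_variables(examples, inference_fixed_y, inference_joint,
                                promote, demote, epochs=5):
    for _ in range(epochs):
        for x, y_gold in examples:
            # E-like step: best hidden assignment consistent with the gold y
            z = inference_fixed_y(x, y_gold)
            # Prediction with the current classifiers and ILP
            y_pred, z_pred = inference_joint(x)
            for t, (gold_t, pred_t) in enumerate(zip(y_gold, y_pred)):
                if pred_t != gold_t:
                    promote(x, z, y_gold, gold_t, t)       # raise score(x, z, y_-t, y_t, t)
                    demote(x, z_pred, y_pred, pred_t, t)   # lower score(x, z', y'_-t, y'_t, t)
```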

25 Conclusion
ILP is powerful, general, learnable, useful, fast (or at least not too slow), and extendable.


28 Boolean Functions as Linear Constraints
Conjunction abc  Ua + Ub + Uc  3 Disjunction abc  Ua + Ub + Uc  1 DNF ab + cd  Iab + Icd  1 Introduce new variables Iab, Icd

29 Helper Variables
We must link Ia, Ib, and Iab, so that Iab ⇔ a ∧ b:
Ia ∧ Ib ⇒ Iab: Ia + Ib − Iab ≤ 1
Iab ⇒ Ia ∧ Ib: 2·Iab ≤ Ia + Ib
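As a sketch (assuming PuLP), the two inequalities that tie the helper variable Iab to Ia and Ib:

```python
import pulp

prob = pulp.LpProblem("helper_variables", pulp.LpMaximize)
Ia = pulp.LpVariable("Ia", cat=pulp.LpBinary)
Ib = pulp.LpVariable("Ib", cat=pulp.LpBinary)
Iab = pulp.LpVariable("Iab", cat=pulp.LpBinary)

# Ia AND Ib  =>  Iab : if both are 1, Iab is forced to 1
prob += Ia + Ib - Iab <= 1
# Iab  =>  Ia AND Ib : if either is 0, Iab is forced to 0
prob += 2 * Iab <= Ia + Ib
```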

30 Semantic Role Labeling
a,b,c... ph1=A0, ph1=A1,ph2=A0,.. Cost Vector pa = score(ph1=A0) pb = score(ph1=A1) ... Indicator Variables Ia indicates that phrase 1 is labeled A0 paIa = 0.3 if Ia and 0 ow


32 Learning
X = (X1,...,Xk) ∈ 𝒳1 × ... × 𝒳k = 𝒳
Y-t = (Y1,...,Yt-1, Yt+1,...,Yl) ∈ 𝒴1 × ... × 𝒴t-1 × 𝒴t+1 × ... × 𝒴l = 𝒴-t
Yt ∈ 𝒴t
Given X = x and Y-t = y-t, find Yt = yt, or a score for each possible yt:
𝒳 × 𝒴-t → 𝒴t   or   𝒳 × 𝒴-t × 𝒴t → ℝ


38 SRL via Generalized Inference

39 Outline
Find potential argument candidates
Classify arguments into types
Inference for the argument structure: integer linear programming (ILP), cost function, constraints
Features
Notes: We follow a now seemingly standard approach to SRL. Given a sentence, we first find a set of potential argument candidates by identifying which words are at the border of an argument. Then, once we have a set of potential arguments, we use a suite of classifiers to tell us how likely each argument is to be of each type. Finally, we use all of the information we have so far to find the assignment of types to arguments that gives us the "optimal" global assignment. Similar approaches (with similar results) use inference procedures tied to their representation. Instead, we use a general inference procedure by setting up the problem as a linear programming problem. This is where our technique allows us to apply powerful information that similar approaches cannot.

40 Find Potential Arguments
Example: I left my nice pearls to her
Every chunk can be an argument, so we restrict the set of potential arguments:
BEGIN(word): BEGIN(word) = 1 ⇔ "word begins an argument"
END(word): END(word) = 1 ⇔ "word ends an argument"
A phrase (wi,...,wj) is a potential argument iff BEGIN(wi) = 1 and END(wj) = 1.
This reduces the set of potential arguments.

41 Details...
Learn a function for BEGIN(word): B(word, context, structure) → {0,1}
Learn a function for END(word): E(word, context, structure) → {0,1}
POTARG = {arg | BEGIN(first(arg)) = 1 and END(last(arg)) = 1}
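A minimal sketch of the candidate-generation step (the `begin_clf` and `end_clf` callables stand in for the learned B(.) and E(.) functions; their argument signature is an assumption):

```python
def potential_arguments(words, begin_clf, end_clf):
    """POTARG: all spans (i, j) with BEGIN(w_i) = 1 and END(w_j) = 1."""
    begins = [i for i, w in enumerate(words) if begin_clf(w, words, i) == 1]
    ends = [j for j, w in enumerate(words) if end_clf(w, words, j) == 1]
    return [(i, j) for i in begins for j in ends if i <= j]
```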

42 Argument Type Likelihood
Assign a type likelihood to each candidate: how likely is it that argument a is of type t?
For all a ∈ POTARG, t ∈ T, estimate P(argument a = type t).
Example: I left my nice pearls to her, with candidate types such as A0, CA1, A1, and Ø.

43 Details...
Learn a classifier ARGTYPE(arg) ∈ {A0, A1, ..., CA0, ..., LOC, ...}, chosen as argmax over t ∈ {A0, A1, ..., CA0, ..., LOC, ...} of wt·Φ(arg).
Estimate probabilities: P(a = t) = wt·Φ(a) / Z.
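A sketch of the normalization step, dividing each raw (non-negative) score by their sum as on the slide; the score values and label names are placeholders:

```python
def type_probabilities(scores):
    """scores: dict mapping each type t to the classifier score w_t . phi(a) (assumed >= 0)."""
    Z = sum(scores.values())
    return {t: s / Z for t, s in scores.items()}

print(type_probabilities({"A0": 0.6, "A1": 1.0, "A2": 0.4}))  # {'A0': 0.3, 'A1': 0.5, 'A2': 0.2}
```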

44 What is a Good Assignment?
Likelihood of being correct: P(Arg a = Type t), when t is the correct type for argument a.
For a set of arguments a1, a2, ..., an, the expected number of correct arguments is Σi P(ai = ti).
We search for the assignment with the maximum expected number correct.

45 Inference (Example: I left my nice pearls to her)
Inference: maximize the expected number correct, T* = argmax_T Σi P(ai = ti), subject to constraints (structural and linguistic).
[Slide figure: the example sentence annotated with competing assignments and their costs (1.6, 1.8, and 1.4), contrasting the independent maximum with the non-overlapping solution.]

46 Everything is Linear
Cost function: Σ(a ∈ POTARG) P(a = ta) = Σ(a ∈ POTARG, t ∈ T) P(a = t)·Ia,t
Constraints:
Non-overlapping: a and a' overlap ⇒ Ia,Ø + Ia',Ø ≥ 1
Linguistic: ∃ CA0 ⇒ ∃ A0, e.g. Σa Ia,CA0 ≤ Σa Ia,A0
This is an integer linear programming problem.
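A minimal sketch of this inference problem (assuming PuLP; the candidate set, probabilities, overlap pairs, and the exact form of the linguistic constraint are illustrative reconstructions, and NULL stands for the Ø label):

```python
import pulp

types = ["A0", "A1", "CA0", "NULL"]
args = [0, 1, 2]                                   # candidate arguments from POTARG
P = {(a, t): 0.25 for a in args for t in types}    # placeholder values for P(a = t)
overlaps = [(0, 1)]                                # pairs of overlapping candidates

prob = pulp.LpProblem("srl_inference", pulp.LpMaximize)
I = {(a, t): pulp.LpVariable(f"I_{a}_{t}", cat=pulp.LpBinary)
     for a in args for t in types}

# Objective: expected number of correctly labeled arguments
prob += pulp.lpSum(P[a, t] * I[a, t] for a in args for t in types)

# Each candidate gets exactly one type (possibly NULL)
for a in args:
    prob += pulp.lpSum(I[a, t] for t in types) == 1

# Non-overlapping: if a and a' overlap, at least one of them must be NULL
for a, a2 in overlaps:
    prob += I[a, "NULL"] + I[a2, "NULL"] >= 1

# Linguistic (one reconstruction): there cannot be more CA0 arguments than A0 arguments
prob += pulp.lpSum(I[a, "CA0"] for a in args) <= pulp.lpSum(I[a, "A0"] for a in args)
```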

47 Features are Important
Here, a discussion of the features should go: which are most important, and how do they compare with those used by other systems?

48 Example: I left my nice pearls to her
[Slide figure: the example sentence with its candidate argument bracketings.]

