Max-Margin Markov Networks by Ben Taskar, Carlos Guestrin, and Daphne Koller Presented by Michael Cafarella CSE574 May 25, 2005.

1 Max-Margin Markov Networks by Ben Taskar, Carlos Guestrin, and Daphne Koller Presented by Michael Cafarella CSE574 May 25, 2005

2 Introduction Kernel methods (SVMs) and max-margin training are terrific for classification, but they offer no way to model structure or relations. Graphical models (Markov networks) can capture complex structure, but they are not trained for discrimination. Maximum Margin Markov (M3) Networks capture the advantages of both.

3 Standard classification We want to learn a classification function $h_w(x) = \arg\max_y w^\top f(x, y)$, where f(x, y) are the features (basis functions) and w are the weights. y is a multi-label classification: the number of possible assignments, |Y|, is exponential in the number of labels l. So we can't compute the argmax by enumeration, and we can't even represent all the features explicitly.
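To make the blow-up concrete, here is a minimal sketch of the brute-force argmax over all label assignments. The feature function `toy_features`, the weights, and the two-letter label set are hypothetical choices for illustration, not anything from the paper.

```python
from itertools import product


def toy_features(x, y):
    """Hypothetical features: [labels agreeing with x, adjacent equal labels]."""
    agree = sum(1 for xi, yi in zip(x, y) if xi == yi)
    smooth = sum(1 for a, b in zip(y, y[1:]) if a == b)
    return [agree, smooth]


def score(w, features):
    """Linear score w . f(x, y)."""
    return sum(wi * fi for wi, fi in zip(w, features))


def brute_force_argmax(w, feature_fn, x, label_set, num_labels):
    """Enumerate every assignment y; cost is len(label_set) ** num_labels."""
    best_y, best_score = None, float("-inf")
    for y in product(label_set, repeat=num_labels):
        s = score(w, feature_fn(x, y))
        if s > best_score:
            best_y, best_score = y, s
    return best_y, best_score


x = ("a", "b", "b")
best_y, best_score = brute_force_argmax([2.0, 1.0], toy_features, x, "ab", 3)
```

With 2 labels and 3 positions the loop visits 8 assignments; with 26 letters and 8 positions it would visit 26^8, roughly 2.1 x 10^11, which is why enumeration is not an option.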

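4 Probabilistic classification A graphical model defines P(Y|X); select the label $\arg\max_y P(y \mid x)$. Exploit sparseness in dependencies through model design (e.g., OCR characters are independent given their neighbors). We'll use a pairwise Markov network to model $P(y \mid x) \propto \prod_{(i,j) \in E} \psi_{ij}(x, y_i, y_j)$. Each log-potential is a weighted sum of basis functions: $\log \psi_{ij}(x, y_i, y_j) = w^\top f(x, y_i, y_j)$.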
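When the dependencies form a chain (as with neighboring OCR characters), the argmax can be found by dynamic programming instead of enumeration. The sketch below is a standard Viterbi pass over log-potentials; the split into `node_scores` and a single shared `edge_scores` table is an assumption made for brevity.

```python
def viterbi(node_scores, edge_scores):
    """Best labeling of a chain.

    node_scores[i][a]: log-potential of label a at position i.
    edge_scores[a][b]: log-potential of adjacent labels (a, b), shared by all edges.
    Returns (best label sequence, its total score).
    """
    k = len(node_scores[0])
    best = list(node_scores[0])  # best[b]: top score of a prefix ending in label b
    back = []                    # back[i-1][b]: best predecessor of label b at position i
    for i in range(1, len(node_scores)):
        new_best, ptr = [], []
        for b in range(k):
            cand = [best[a] + edge_scores[a][b] for a in range(k)]
            a_star = max(range(k), key=cand.__getitem__)
            new_best.append(cand[a_star] + node_scores[i][b])
            ptr.append(a_star)
        best = new_best
        back.append(ptr)
    last = max(range(k), key=best.__getitem__)
    path = [last]
    for ptr in reversed(back):   # follow back-pointers from the end
        path.append(ptr[path[-1]])
    path.reverse()
    return path, best[last]


# Hypothetical scores: 3 positions, 2 labels, reward 1 for equal neighbors.
path, total = viterbi([[2, 0], [0, 2], [0, 2]], [[1, 0], [0, 1]])
```

The running time is linear in the chain length and quadratic in the number of labels, rather than exponential in the chain length.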

5 M3N For regular Markov networks, we train w to maximize likelihood or conditional likelihood. For M3N, we'll train w to maximize the margin. The main contribution of this paper is how to choose w accordingly.

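6 Choosing w With SVMs, choose w to maximize the margin γ: $\max \gamma$ s.t. $\|w\| = 1$ and $w^\top \Delta f_x(y) \ge \gamma$ for all $x \in S$ and all $y \ne t(x)$, where $\Delta f_x(y) = f(x, t(x)) - f(x, y)$ and t(x) is the true label of x. The constraints ensure $\arg\max_y w^\top f(x, y) = t(x)$. Maximizing the margin magnifies the difference between the value of the true label and that of the best runner-up.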

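7 Multiple labels Structured problems have multiple labels, not a single classification. We extend the margin to scale with the number of mistaken labels, so we now have: $\max \gamma$ s.t. $\|w\| = 1$ and $w^\top \Delta f_x(y) \ge \gamma \, \Delta t_x(y)$ for all $x \in S$ and all y, where $\Delta t_x(y) = \sum_{i=1}^{l} I(y_i \ne (t(x))_i)$ counts the labels on which y disagrees with t(x).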
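As a rough illustration of a margin that scales with the number of mistaken labels, the sketch below checks, for one example, which assignments y fail to be separated from the true labeling by at least their Hamming distance (the γ-free form of the constraint). The feature function and the weight vectors are hypothetical.

```python
from itertools import product


def toy_features(x, y):
    """Hypothetical features: [labels agreeing with x, adjacent equal labels]."""
    agree = sum(1 for xi, yi in zip(x, y) if xi == yi)
    smooth = sum(1 for a, b in zip(y, y[1:]) if a == b)
    return [agree, smooth]


def hamming_loss(y_true, y):
    """Number of positions where y differs from the true labeling."""
    return sum(1 for t, p in zip(y_true, y) if t != p)


def margin_violations(w, feature_fn, x, y_true, label_set):
    """Assignments y whose score gap to y_true is below their Hamming loss."""
    f_true = feature_fn(x, y_true)
    violations = []
    for y in product(label_set, repeat=len(y_true)):
        f_y = feature_fn(x, y)
        gap = sum(wi * (a - b) for wi, a, b in zip(w, f_true, f_y))
        loss = hamming_loss(y_true, y)
        if gap < loss:
            violations.append((y, gap, loss))
    return violations


x = y_true = ("a", "b", "b")
ok = margin_violations([2.0, 1.0], toy_features, x, y_true, "ab")
bad = margin_violations([1.0, 1.0], toy_features, x, y_true, "ab")
```

An assignment with two wrong labels has to be beaten by a gap of at least 2, so near-misses are penalized less than badly wrong labelings.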

8 Convert to optimization problem We can eliminate the margin term γ to obtain a quadratic program: $\min \frac{1}{2}\|w\|^2$ s.t. $w^\top \Delta f_x(y) \ge \Delta t_x(y)$ for all x and y. We have to add slack variables because the data might not be separable. We can now reformulate the whole M3N learning problem as the following optimization task…

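9 Grand formulation The primal: $\min \frac{1}{2}\|w\|^2 + C \sum_x \xi_x$ s.t. $w^\top \Delta f_x(y) \ge \Delta t_x(y) - \xi_x$ for all x and y. The dual: $\max \sum_{x,y} \alpha_x(y)\, \Delta t_x(y) - \frac{1}{2} \big\| \sum_{x,y} \alpha_x(y)\, \Delta f_x(y) \big\|^2$ s.t. $\sum_y \alpha_x(y) = C$ for all x and $\alpha_x(y) \ge 0$ for all x and y. Note the extra dual variables $\alpha_x(t(x))$; they have no effect on the solution.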
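For a fixed w, the smallest feasible slack for an example is the worst violation of its margin constraints, so a soft-margin primal of the form "half the squared norm of w plus C times the total slack" can be evaluated directly on a toy problem. This sketch assumes that form of the objective and a Hamming-loss margin; the features, data, and C are hypothetical, and the enumeration over y is only feasible because the example is tiny.

```python
from itertools import product


def toy_features(x, y):
    """Hypothetical features: [labels agreeing with x, adjacent equal labels]."""
    agree = sum(1 for xi, yi in zip(x, y) if xi == yi)
    smooth = sum(1 for a, b in zip(y, y[1:]) if a == b)
    return [agree, smooth]


def hamming_loss(y_true, y):
    return sum(1 for t, p in zip(y_true, y) if t != p)


def slack(w, feature_fn, x, y_true, label_set):
    """Smallest xi >= 0 with gap(y) >= loss(y) - xi for every assignment y."""
    f_true = feature_fn(x, y_true)
    worst = 0.0  # y = y_true always gives loss 0 and gap 0
    for y in product(label_set, repeat=len(y_true)):
        f_y = feature_fn(x, y)
        gap = sum(wi * (a - b) for wi, a, b in zip(w, f_true, f_y))
        worst = max(worst, hamming_loss(y_true, y) - gap)
    return worst


def primal_objective(w, feature_fn, data, label_set, C):
    """0.5 * ||w||^2 + C * sum of per-example slacks."""
    reg = 0.5 * sum(wi * wi for wi in w)
    return reg + C * sum(slack(w, feature_fn, x, t, label_set) for x, t in data)


data = [(("a", "b", "b"), ("a", "b", "b"))]
obj_small_w = primal_objective([1.0, 1.0], toy_features, data, "ab", C=1.0)
obj_large_w = primal_objective([2.0, 1.0], toy_features, data, "ab", C=1.0)
```

The two weight vectors show the trade-off: the smaller w pays slack for violated constraints, while the larger w satisfies every constraint but pays more in the norm term.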

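10 Unfortunately, not enough! The number of constraints in the primal, and the number of variables in the dual, are exponential in the number of labels l. Let's interpret the variables in the dual as a density function over y, conditional on x. The dual objective is a function of expectations; we need just the node and edge marginals of the dual variables to compute them. Define the marginal dual variables as $\mu_x(y_i, y_j) = \sum_{y \sim [y_i, y_j]} \alpha_x(y)$ and $\mu_x(y_i) = \sum_{y \sim [y_i]} \alpha_x(y)$, where each sum runs over the full assignments y consistent with the given partial assignment.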
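The point that only node and edge marginals are needed can be checked numerically on a small case. The sketch below sums a hypothetical set of dual values over full assignments into node and edge marginals, then computes the expected Hamming loss both from the full table and from the node marginals alone. The dictionary layout and the values are assumptions for illustration.

```python
from collections import defaultdict


def node_edge_marginals(alpha, edges):
    """Sum dual values alpha[y] into node[(i, a)] and edge[(i, j, a, b)]."""
    node, edge = defaultdict(float), defaultdict(float)
    for y, a in alpha.items():
        for i, yi in enumerate(y):
            node[(i, yi)] += a
        for i, j in edges:
            edge[(i, j, y[i], y[j])] += a
    return dict(node), dict(edge)


def expected_loss_full(alpha, y_true):
    """Sum over full assignments of alpha(y) times Hamming loss."""
    return sum(a * sum(1 for t, p in zip(y_true, y) if t != p)
               for y, a in alpha.items())


def expected_loss_from_marginals(node, y_true):
    """Same quantity, using only the node marginals."""
    return sum(m for (i, yi), m in node.items() if yi != y_true[i])


# Hypothetical dual values for one example with 3 binary labels (they sum to C = 1).
alpha = {(0, 1, 1): 0.5, (1, 1, 1): 0.25, (0, 0, 1): 0.25}
y_true = (0, 1, 1)
node, edge = node_edge_marginals(alpha, edges=[(0, 1), (1, 2)])
full = expected_loss_full(alpha, y_true)
from_marginals = expected_loss_from_marginals(node, y_true)
```

The loss term decomposes over nodes because the Hamming loss is a sum of per-label terms; the quadratic term would similarly need edge marginals when the features are pairwise.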

11 Now reformulate the QP But first, a pause: I can't copy any more formulae. I'm sorry. It's making me crazy. I just can't. Please refer to the paper, section 4! OK, now back to work…

12 Now reformulate the QP (2) The dual variables must arise from a legal density; equivalently, they must lie in the marginal polytope (see equation 9!). That means we must enforce consistency between the pairwise and singleton marginal variables (see equation 10!). If the network is not a forest, those constraints aren't enough. We can triangulate and add new variables and constraints, or approximate with a relaxation of the polytope, as in belief propagation.
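A small example shows why local consistency is not enough on a network with a cycle. The sketch below builds hypothetical marginals on a triangle of binary labels in which every edge says "the two endpoints always disagree". These pass the pairwise and singleton consistency checks, yet no joint distribution can produce them, because every assignment of three binary labels has at least one agreeing pair. The checker and the data layout are assumptions for illustration, with the total mass set to 1.

```python
from itertools import product


def locally_consistent(node, edge, edges, labels, total=1.0, tol=1e-9):
    """Check nonnegativity, node normalization, and edge-to-node agreement."""
    if any(v < -tol for v in list(node.values()) + list(edge.values())):
        return False
    for i in {i for i, _ in node}:
        if abs(sum(node[(i, a)] for a in labels) - total) > tol:
            return False
    for i, j in edges:
        for a in labels:  # summing out y_j must give the marginal of y_i
            if abs(sum(edge[(i, j, a, b)] for b in labels) - node[(i, a)]) > tol:
                return False
        for b in labels:  # summing out y_i must give the marginal of y_j
            if abs(sum(edge[(i, j, a, b)] for a in labels) - node[(j, b)]) > tol:
                return False
    return True


def some_pair_agrees(y, edges):
    """True if at least one edge has equal labels at both ends."""
    return any(y[i] == y[j] for i, j in edges)


labels = (0, 1)
triangle = [(0, 1), (1, 2), (0, 2)]
node = {(i, a): 0.5 for i in range(3) for a in labels}
edge = {(i, j, a, b): (0.5 if a != b else 0.0)
        for i, j in triangle for a in labels for b in labels}

passes_local_checks = locally_consistent(node, edge, triangle, labels)
always_some_agreement = all(some_pair_agrees(y, triangle)
                            for y in product(labels, repeat=3))
```

Both flags come out true: the marginals are locally consistent, but since every joint assignment contains an agreeing pair, no density can put zero mass on agreement for all three edges at once.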

13 Experiment #1: Handwriting 6100 words, 8 chars long, 150 subjects. Each char is 16x8 pixels. Y is the classified word; each Y_i is one of the 26 letters. Logistic regression and CRFs are trained by maximizing the conditional likelihood of the labels given the features. SVMs and M3N are trained by margin maximization.

15 Experiment #2: Hypertext The usual collective classification task: four CS departments, where each page is one of course, faculty, student, project, or other. Each page has web and anchor text, represented as a binary feature vector, and also has hyperlinks to other examples. The RMN is trained to maximize the conditional probability of the labels given text and links. The SVM and M3N are trained with max-margin.

17 Conclusions M3N seem to work great for discriminative tasks. It is nice to borrow theoretical results from SVMs. Not much testing so far. Future work should use more complicated models and problems. Future presentations should be done in LaTeX, not PowerPoint.

