Structured SVM
Chen-Tse Tsai and Siddharth Gupta

Outline
- Introduction to SVM
- Large Margin Methods for Structured and Interdependent Output Variables (Tsochantaridis et al., 2005)
- Max-Margin Markov Networks (Taskar et al., 2003)
- Learning Structural SVMs with Latent Variables (Yu and Joachims, 2009)

SVM: the main idea

Maximum margin
- Find w and b such that the margin ρ = 2/||w|| is maximized, and for all (x_i, y_i), i = 1..n: y_i(w^T x_i + b) ≥ 1
- Equivalently: find w and b such that Φ(w) = ||w||² = w^T w is minimized, subject to y_i(w^T x_i + b) ≥ 1 for all (x_i, y_i), i = 1..n
- This is a quadratic optimization problem

Binary SVM
- Training examples: {(x_i, y_i)}, i = 1..n, with x_i ∈ R^d and y_i ∈ {−1, +1}
- Primal form: min_{w,b,ξ} (1/2)||w||² + C Σ_i ξ_i, subject to y_i(w^T x_i + b) ≥ 1 − ξ_i and ξ_i ≥ 0
- Dual form: max_α Σ_i α_i − (1/2) Σ_{i,j} α_i α_j y_i y_j x_i^T x_j, subject to 0 ≤ α_i ≤ C and Σ_i α_i y_i = 0
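
To make the primal concrete, here is a minimal sketch (not the slides' solver) that trains the soft-margin objective by subgradient descent on the hinge loss; the learning rate and epoch count are illustrative.

```python
import numpy as np

def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=200):
    """X: (n, d) inputs; y: (n,) labels in {-1, +1}."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        viol = margins < 1  # examples on the wrong side of the margin
        # Subgradient of (1/2)||w||^2 + C * sum_i max(0, 1 - y_i (w^T x_i + b))
        grad_w = w - C * (y[viol][:, None] * X[viol]).sum(axis=0)
        grad_b = -C * y[viol].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b
```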

Multiclass SVM

Structured Output
- Approach: view structured prediction as a multi-class classification task, where every complex output is one class
- Problems:
  - Exponentially many classes
  - How to predict efficiently?
  - How to learn efficiently?
  - Potentially huge model: is the number of features manageable?
[Figure: the sentence "The dog chased the cat" (x) with several candidate parse trees y_1, y_2, ..., y_k]

Multi-Class SVM (Crammer and Singer, 2001)
- Training examples: {(x_i, y_i)}, i = 1..n, with y_i ∈ {1, ..., K}
- Inference: ŷ = argmax_y w_y^T x
- Training: find w_1, ..., w_K that solve
  min (1/2) Σ_y ||w_y||² + C Σ_i ξ_i, subject to w_{y_i}^T x_i − w_y^T x_i ≥ 1 − ξ_i for all i and all y ≠ y_i
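
As a small illustration of these pieces (the function names are ours, not the paper's): inference is an argmax over per-class scores, and training repeatedly needs the class that most violates the margin for a given example.

```python
import numpy as np

def predict(W, x):
    """W: (K, d) stacked class weight vectors. Inference: argmax_y w_y^T x."""
    return int(np.argmax(W @ x))

def most_violating_class(W, x, y_true):
    """Find the y != y_true that most violates w_{y_i}^T x - w_y^T x >= 1
    (ignoring the slack)."""
    scores = W @ x
    scores[y_true] = -np.inf  # exclude the gold class from the argmax
    y_hat = int(np.argmax(scores))
    violation = 1.0 - (W[y_true] - W[y_hat]) @ x
    return y_hat, max(0.0, violation)
```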

Multi-Class SVM (Crammer and Singer, 2001)
[Figure: the multi-class view applied to parsing: the sentence "The dog chased the cat" (x), with each candidate parse tree (y_1, y_2, y_4, y_12, y_34, y_58, ...) treated as its own class]

Joint Feature Map
- Problem: an exponential number of parameters
- Solution: a feature vector Ψ(x, y) that describes the match between x and y, with a single learned weight vector w
- Inference: ŷ = argmax_{y ∈ Y} w^T Ψ(x, y)
[Figure: the same sentence and candidate parse trees, now scored through the joint feature map]

Joint Feature Map for Trees
- Weighted context-free grammar: each rule has a weight
- The score of a tree is the sum of its rule weights
- Find the highest-scoring tree using a CKY parser
[Figure: the sentence "The dog chased the cat" (x) and its parse tree (y)]
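
Because the tree score decomposes over rules, the best tree can be found with CKY. Below is a minimal sketch for a grammar in Chomsky normal form; the dictionary-based grammar encoding is our own illustrative choice.

```python
from collections import defaultdict

def cky(words, lex, rules):
    """lex: {(A, word): weight} for A -> word; rules: {(A, B, C): weight}
    for A -> B C. Returns {(i, j, A): best score of an A over words[i:j]}."""
    n = len(words)
    chart = defaultdict(lambda: float("-inf"))
    for i, w in enumerate(words):
        for (A, word), s in lex.items():
            if word == w:
                chart[i, i + 1, A] = max(chart[i, i + 1, A], s)
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # split point
                for (A, B, C), s in rules.items():
                    cand = chart[i, k, B] + chart[k, j, C] + s
                    if cand > chart[i, j, A]:
                        chart[i, j, A] = cand
    return chart
```

After running, chart[(0, n, 'S')] holds the score of the best full parse rooted at 'S'.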

Structured SVM
- Hard margin:
  min (1/2)||w||², subject to w^T Ψ(x_i, y_i) − w^T Ψ(x_i, y) ≥ 1 for all i and all y ≠ y_i

Structured SVM
- Soft margin, writing δΨ_i(y) = Ψ(x_i, y_i) − Ψ(x_i, y):
- SVM_1 (linear slack penalty): min (1/2)||w||² + (C/n) Σ_i ξ_i, subject to w^T δΨ_i(y) ≥ 1 − ξ_i and ξ_i ≥ 0
- SVM_2 (quadratic slack penalty): min (1/2)||w||² + (C/2n) Σ_i ξ_i², subject to w^T δΨ_i(y) ≥ 1 − ξ_i

General Loss Function
- Δ(y_i, y) measures the difference between a prediction y and the true value y_i; a y with high loss should be penalized more severely
- Slack re-scaling: w^T δΨ_i(y) ≥ 1 − ξ_i / Δ(y_i, y)
- Margin re-scaling: w^T δΨ_i(y) ≥ Δ(y_i, y) − ξ_i
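
A small helper (with illustrative names) makes the difference concrete: the two re-scalings agree when Δ ∈ {0, 1}, but they weight high-loss outputs differently.

```python
def violation(delta, margin, xi, rescaling="margin"):
    """delta: loss Δ(y_i, y); margin: w^T δΨ_i(y); xi: current slack.
    Returns how much y violates the chosen constraint (<= 0 means satisfied)."""
    if rescaling == "margin":
        # constraint: w^T δΨ_i(y) >= Δ(y_i, y) - ξ_i
        return delta - margin - xi
    # slack re-scaling; constraint: w^T δΨ_i(y) >= 1 - ξ_i / Δ(y_i, y),
    # equivalently Δ(y_i, y) * (1 - w^T δΨ_i(y)) <= ξ_i
    return delta * (1.0 - margin) - xi
```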

A Cutting Plane Algorithm
- Key result: only a polynomial number of constraints is needed to solve the problem to a given precision ε

A Cutting Plane Algorithm
- Repeat: for each training example, find its most violated constraint; if it is violated by more than ε, add it to the working set and re-solve the QP over the working set
- Stop when no constraint is violated by more than ε
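
A sketch of the loop, assuming two hypothetical components: `solve_qp`, which optimizes over the current working set (returning w and the slacks), and `most_violated`, the loss-augmented separation oracle.

```python
def cutting_plane(examples, most_violated, solve_qp, eps=1e-3, max_iter=100):
    working_sets = [[] for _ in examples]     # one constraint set per example
    w, xi = solve_qp(examples, working_sets)  # with no constraints this gives w = 0
    for _ in range(max_iter):
        added = False
        for i, (x, y) in enumerate(examples):
            y_hat, violation = most_violated(w, x, y)  # loss-augmented inference
            if violation > xi[i] + eps:                # violated by more than eps
                working_sets[i].append(y_hat)
                added = True
        if not added:
            break  # every constraint satisfied up to eps
        w, xi = solve_qp(examples, working_sets)       # re-optimize over working set
    return w
```

Tsochantaridis et al. (2005) show this loop terminates after adding only a polynomial number of constraints.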

Computational problem
- Prediction: ŷ = argmax_y w^T Ψ(x, y)
- Getting the most violated constraint (margin re-scaling): argmax_y [Δ(y_i, y) + w^T Ψ(x_i, y)]
- When exact inference is intractable, use approximate inference methods in MRFs
- Training Structural SVMs when Exact Inference is Intractable. T. Finley and T. Joachims, ICML 2008.
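
For chain-structured outputs with Hamming loss, the separation oracle under margin re-scaling is just Viterbi over loss-augmented scores, since the loss decomposes over positions. A sketch (the array-shape conventions are our own):

```python
import numpy as np

def loss_augmented_viterbi(node, edge, y_gold):
    """node: (T, K) per-position label scores; edge: (K, K) transition scores;
    y_gold: length-T gold labels. Returns argmax_y [score(y) + Hamming(y, y_gold)]."""
    T, K = node.shape
    aug = node + 1.0
    aug[np.arange(T), y_gold] -= 1.0  # no loss bonus for the gold label
    delta = np.zeros((T, K))
    back = np.zeros((T, K), dtype=int)
    delta[0] = aug[0]
    for t in range(1, T):
        cand = delta[t - 1][:, None] + edge + aug[t][None, :]
        back[t] = np.argmax(cand, axis=0)
        delta[t] = np.max(cand, axis=0)
    y = [int(np.argmax(delta[-1]))]
    for t in range(T - 1, 0, -1):  # follow back-pointers
        y.append(int(back[t][y[-1]]))
    return y[::-1]
```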

Outline
- Large Margin Methods for Structured and Interdependent Output Variables (Tsochantaridis et al., 2005)
- Max-Margin Markov Networks (Taskar et al., 2003)
- Learning Structural SVMs with Latent Variables (Yu and Joachims, 2009)

Max-Margin Markov Network
- The structured SVM entails a large number of constraints; so far this was handled by adding one constraint at a time
- The M³ network is a way to solve SVM_1 with margin re-scaling
- It uses a Markov network to encode dependencies and generate features
- It reduces the exponential number of constraints to a polynomial number

M³ Network
- A way to generate features: define features on the edges of the network
- The k-th feature of an instance: f_k(x, y) = Σ_{(i,j) ∈ edges} f_k(x, y_i, y_j)
- The loss function: Hamming loss, Δt_x(y) = the number of labels in y that differ from the gold labels

M³ Network
- A way to solve SVM_1 with margin re-scaling
- Primal: min (1/2)||w||² + C Σ_x ξ_x, subject to w^T Δf_x(y) ≥ Δt_x(y) − ξ_x for all x and y
- Dual: max Σ_{x,y} α_x(y) Δt_x(y) − (1/2)||Σ_{x,y} α_x(y) Δf_x(y)||², subject to Σ_y α_x(y) = C and α_x(y) ≥ 0
- Only node and edge marginal probabilities are needed to compute the expectations

Polynomial-Size Reformulation
- The key step: replace the exponentially many dual variables α_x(y) with node and edge marginals
[Table: a toy instance with labels y_0, y_1, y_2, listing Δt_x(y) and α_x(y) for all possible y against the gold y, and the resulting node marginals µ_x(0) and µ_x(1)]

Polynomial-Size Reformulation
- The key step: marginal dual variables
  µ_x(y_i) = Σ_{y' : y'_i = y_i} α_x(y'), and µ_x(y_i, y_j) = Σ_{y' : (y'_i, y'_j) = (y_i, y_j)} α_x(y') for each edge (i, j)
- New constraints: the marginals must be locally consistent, e.g. Σ_{y_i} µ_x(y_i, y_j) = µ_x(y_j) and µ_x ≥ 0
- For a tree structure, these local consistency constraints are sufficient
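
A brute-force illustration of the key step, feasible only for a toy chain where all assignments can be enumerated (the dict encoding of α is our own):

```python
from itertools import product

def marginals_from_alpha(alpha, T, K):
    """alpha: dict mapping a full assignment y (tuple of length T) to its
    dual weight alpha_x(y). Returns node and edge marginal dual variables."""
    mu_node = {}  # mu_node[(t, k)]      = sum of alpha(y) with y[t] == k
    mu_edge = {}  # mu_edge[(t, k, k2)]  = sum of alpha(y) with (y[t], y[t+1]) == (k, k2)
    for y in product(range(K), repeat=T):
        a = alpha.get(y, 0.0)
        for t in range(T):
            mu_node[t, y[t]] = mu_node.get((t, y[t]), 0.0) + a
        for t in range(T - 1):
            key = (t, y[t], y[t + 1])
            mu_edge[key] = mu_edge.get(key, 0.0) + a
    return mu_node, mu_edge
```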

Polynomial-Size Reformulation
- Factored dual QP: the number of variables and constraints drops from N·2^M down to N(M² + M), where N is the number of instances and M is the length of y
  (e.g., for M = 10 binary labels: 2^10 = 1024 joint assignments versus 10² + 10 = 110 marginal variables per instance)
- Problem: if the structure is not simple, we may need an exponential number of new consistency constraints
- Enforcing only local consistency of the marginals gives an approximate result

SMO
- Sequential minimal optimization
- In the binary SVM, we have a linear constraint: Σ_i α_i y_i = 0
- Working set selection: select the two variables to update at each step
- M³ net: an analogous pairwise update, moving weight between two assignments α_x(y) and α_x(y′)
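
For reference, this is the classic two-variable analytic update at the heart of SMO for the binary dual (a sketch; the caller must refresh the cached errors after each update):

```python
def smo_pair_update(alpha, i, j, y, K, errors, C):
    """alpha: dual variables; y: labels in {-1, +1}; K: kernel matrix;
    errors[i] = f(x_i) - y_i. Because sum_i alpha_i y_i = 0, any feasible
    step must change two variables; the 1-D optimum is analytic."""
    if y[i] != y[j]:
        L, H = max(0.0, alpha[j] - alpha[i]), min(C, C + alpha[j] - alpha[i])
    else:
        L, H = max(0.0, alpha[i] + alpha[j] - C), min(C, alpha[i] + alpha[j])
    eta = K[i][i] + K[j][j] - 2.0 * K[i][j]  # curvature along the update line
    if eta <= 0 or L >= H:
        return alpha                          # skip degenerate pairs
    a_j = alpha[j] + y[j] * (errors[i] - errors[j]) / eta
    a_j = min(H, max(L, a_j))                 # clip to the box [L, H]
    alpha[i] += y[i] * y[j] * (alpha[j] - a_j)  # keeps sum_i alpha_i y_i = 0
    alpha[j] = a_j
    return alpha
```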

Experimental Results
- Max-Margin Parsing (Taskar et al., 2004): applies the M³ net to parsing
- Discusses how to extract features from a grammar

Outline
- Large Margin Methods for Structured and Interdependent Output Variables (Tsochantaridis et al., 2005)
- Max-Margin Markov Networks (Taskar et al., 2003)
- Learning Structural SVMs with Latent Variables (Yu and Joachims, 2009)

Latent Variable Models
- Widely used in machine learning and statistics
- Unobserved quantities / missing data in experiments; dimensionality reduction
- Classical examples: mixture models, PCA, LDA
- This paper: latent variables in supervised prediction tasks

Latent Variables in S-SVMs
- How can we extend the structural SVM to handle latent variables?

Structured SVM
- Recap (margin re-scaling): min (1/2)||w||² + C Σ_i ξ_i, subject to w^T Ψ(x_i, y_i) − w^T Ψ(x_i, y) ≥ Δ(y_i, y) − ξ_i for all i and all y

Latent S-SVM Formulation
- Extend the joint feature map to Ψ(x, y, h), where h is a latent variable
- Objective: min (1/2)||w||² + C Σ_i [max_{(ŷ, ĥ)} (w^T Ψ(x_i, ŷ, ĥ) + Δ(y_i, ŷ, ĥ)) − max_h w^T Ψ(x_i, y_i, h)]
- This is a difference of two convex functions

CCCP Algorithm
- The concave-convex procedure optimizes a difference of convex functions by alternating two steps until convergence:
  1. Impute the latent variables with the current model: h_i* = argmax_h w^T Ψ(x_i, y_i, h)
  2. Solve the standard structural SVM on the completed data (the concave part is replaced by its linear upper bound at the current w)
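
A sketch of the outer loop, assuming two hypothetical helpers: `best_latent`, which computes h_i* = argmax_h w^T Ψ(x_i, y_i, h), and `solve_ssvm`, which solves the standard structural SVM on the completed data.

```python
def cccp(examples, best_latent, solve_ssvm, w0, n_rounds=20):
    """examples: list of (x, y) pairs; w0: initial weight vector."""
    w = w0
    for _ in range(n_rounds):
        # Step 1 (linearize the concave part): impute latent variables with current w
        imputed = [(x, y, best_latent(w, x, y)) for (x, y) in examples]
        # Step 2 (convex part): retrain a structural SVM on the completed data
        w = solve_ssvm(imputed)
    return w
```

Each round cannot increase the objective, so the procedure converges to a local optimum rather than a global one.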

aa

Noun Phrase Co-reference
- Task: cluster the noun phrases in a document that refer to the same entity
- In the latent S-SVM, the output y is the clustering and the latent variable h is a spanning forest linking the mentions within each cluster (Yu and Joachims, 2009)

Noun Phrase Co-reference Results
