# Semidefinite Programming Machines

## Presentation on theme: "Semidefinite Programming Machines"— Presentation transcript:

Semidefinite Programming Machines
Thore Graepel and Ralf Herbrich, Microsoft Research Cambridge. Microsoft Research Ltd.

Overview
- Invariant Pattern Recognition
- Semidefinite Programming (SDP)
- From Support Vector Machines (SVMs) to Semidefinite Programming Machines (SDPMs)
- Experimental Illustration
- Future Work

Typical Invariances for Images
Translation, shear, rotation.

Toy Features for Handwritten Digits
1 =0.48 2=0.58 3=0.37 Microsoft Research Ltd.

Warning: Highly Non-Linear
[Figure: plot with axes $\phi_1$ and $\phi_2$.]

Motivation: Classification Learning
Can we learn with infinitely many examples?
[Figure: plot with axes $\phi_1(x)$ and $\phi_2(x)$.]

Motivation: Version Spaces
[Figure: original patterns and transformed patterns.]

Semidefinite Programs (SDPs)
- Linear objective function
- Positive semidefinite (psd) constraints
- Infinitely many linear constraints
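For orientation, a semidefinite program can be written in the following generic form. This is a sketch: the symbols $c$, $A_j$ and $B$ and the sign convention are assumptions chosen here, not read off the slide. The last line spells out why a single psd constraint amounts to infinitely many linear constraints on $w$, one for each vector $v$.

```latex
\begin{aligned}
\min_{w \in \mathbb{R}^n} \quad & c^\top w \\
\text{subject to} \quad & A(w) := \sum_{j=1}^{n} w_j A_j - B \succeq 0, \\
A(w) \succeq 0 \quad \Longleftrightarrow \quad & v^\top A(w)\, v \ge 0 \quad \text{for all vectors } v .
\end{aligned}
```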

Given: a sample $((x_1,y_1),\dots,(x_m,y_m))$. SVMs find the weight vector $w$ that maximises the margin on the sample.

SVM as a Semidefinite Program (I)
A (block-)diagonal matrix is psd if and only if all its blocks are psd.
$A_j := \mathrm{diag}(g_{1,j}, \dots, g_{i,j}, \dots, g_{m,j})$, $B := \mathrm{diag}(1, \dots, 1)$
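A minimal sketch of the diagonal case, assuming $g_{i,j} = y_i x_{i,j}$, $B = I$ and a bias-free classifier (the slide does not preserve these definitions); the toy sample and the helper names are made up for illustration. A diagonal matrix is psd exactly when its diagonal entries are non-negative, so $\sum_j w_j A_j - B \succeq 0$ reduces to the $m$ margin constraints $y_i\, w^\top x_i \ge 1$.

```python
# Toy sample: m = 3 points in n = 2 dimensions (made-up data).
X = [[2.0, 0.0], [0.0, 3.0], [-1.0, -2.0]]
y = [1, 1, -1]


def lmi_diagonal(w, X, y):
    """Diagonal of sum_j w_j * A_j - B, assuming g_ij = y_i * x_ij and B = I."""
    m, n = len(X), len(w)
    # A_j = diag(g_1j, ..., g_mj); only the diagonals need to be stored.
    A = [[y[i] * X[i][j] for i in range(m)] for j in range(n)]
    return [sum(w[j] * A[j][i] for j in range(n)) - 1.0 for i in range(m)]


def is_psd_diagonal(diag):
    """A diagonal matrix is psd iff every diagonal entry is non-negative."""
    return all(d >= 0.0 for d in diag)


w = [1.0, 1.0]
diag = lmi_diagonal(w, X, y)
print(diag)                    # each entry is a margin y_i <w, x_i> minus one
print(is_psd_diagonal(diag))   # True when every margin is at least one
```

Shrinking `w` (for example to `[0.1, 0.1]`) makes some diagonal entries negative, which is the matrix form of a violated margin constraint.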

SVM as a Semidefinite Program (II)
- Transform quadratic into linear objective
- Use Schur's complement lemma
- Adds new $(n+1)\times(n+1)$ block to $A_j$ and $B$
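A sketch of this step, assuming the quadratic objective is $\|w\|^2$ (the usual SVM objective; the auxiliary variable $t$ is introduced here for illustration). Minimising $\|w\|^2$ is the same as minimising $t$ subject to $\|w\|^2 \le t$, and by the Schur complement lemma that inequality is a psd constraint on an $(n+1)\times(n+1)$ matrix that is linear in $(w, t)$.

```latex
\begin{aligned}
\min_{w,\,t} \quad & t \\
\text{subject to} \quad &
\begin{pmatrix} I_n & w \\ w^\top & t \end{pmatrix} \succeq 0
\quad \Longleftrightarrow \quad t - w^\top I_n^{-1} w \ge 0
\quad \Longleftrightarrow \quad \|w\|^2 \le t .
\end{aligned}
```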

Taylor Approximation of Invariance
Let $T(x,\theta)$ be an invariance transformation with parameter $\theta$ (e.g., angle of rotation). Taylor expansion about $\theta_0 = 0$ gives a polynomial approximation to the trajectory.
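As an illustration, here is a minimal sketch for the simplest case: rotation of a 2-D point, expanded to second order about zero. The point, the angles and the function names are made up for this example; the slides apply the idea to images rather than to 2-D points.

```python
import math


def rotate(x, theta):
    """Exact transformation T(x, theta): rotate a 2-D point by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * x[0] - s * x[1], s * x[0] + c * x[1])


def taylor_coefficients(x):
    """Coefficients of 1, theta, theta^2 in the second-order expansion at 0."""
    x0 = (x[0], x[1])                 # T(x, 0)
    x1 = (-x[1], x[0])                # dT/dtheta at theta = 0
    x2 = (-0.5 * x[0], -0.5 * x[1])   # (1/2) d^2T/dtheta^2 at theta = 0
    return [x0, x1, x2]


def trajectory(coeffs, theta):
    """Evaluate the polynomial trajectory sum_j theta^j * coeffs[j]."""
    return tuple(
        sum(c[k] * theta ** j for j, c in enumerate(coeffs)) for k in range(2)
    )


x = (1.0, 0.5)
coeffs = taylor_coefficients(x)
for deg in (5, 10, 20, 45):
    theta = math.radians(deg)
    err = math.dist(rotate(x, theta), trajectory(coeffs, theta))
    print(f"{deg:2d} deg: approximation error {err:.5f}")
```

The error grows roughly with the cube of the angle, so the polynomial trajectory is only a faithful stand-in for the true one over a limited range of the parameter.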

Extension to Polynomials
Consider a polynomial trajectory $x(\theta)$. Each training example (x(0),…,x(r),y) then yields an infinite number of constraints, one for every value of $\theta$.
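A sketch of what these constraints could look like, assuming the bias-free margin constraint $y\, w^\top x \ge 1$ and writing the trajectory coefficients as $x^{(0)}, \dots, x^{(r)}$ (both are notational assumptions). Requiring the margin for every transformation parameter turns one training example into the non-negativity of a polynomial in $\theta$ whose coefficients are linear in $w$.

```latex
\begin{aligned}
x(\theta) &= \sum_{j=0}^{r} \theta^{j}\, x^{(j)}, \\
p(\theta) &:= y\, w^\top x(\theta) - 1
           = \bigl(y\, w^\top x^{(0)} - 1\bigr) + \sum_{j=1}^{r} \theta^{j}\, y\, w^\top x^{(j)} \;\ge\; 0
\quad \text{for all } \theta \in \mathbb{R} .
\end{aligned}
```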

Non-Negative Polynomials (I)
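Theorem (Nesterov, 2000): If $r = 2l$, then
1. For every psd matrix $P$, the polynomial $p(\theta) = \boldsymbol{\theta}^\top P \boldsymbol{\theta}$ is non-negative everywhere.
2. For every non-negative polynomial $p$ of degree $r$, there exists a psd matrix $P$ such that $p(\theta) = \boldsymbol{\theta}^\top P \boldsymbol{\theta}$.

Here $\boldsymbol{\theta} := (1, \theta, \dots, \theta^{l})^\top$ is the vector of monomials.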
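A small numerical illustration of the first statement for $l = 1$, assuming the vector in the quadratic form is the vector of monomials $(1, \theta, \dots, \theta^l)$; the matrices and function names are made up for this sketch. The coefficient of $\theta^k$ is the sum of the entries $P_{ij}$ with $i + j = k$.

```python
def poly_from_matrix(P):
    """Coefficients c_0..c_2l of p(theta) = v^T P v, v = (1, theta, ..., theta^l)."""
    l = len(P) - 1
    coeffs = [0.0] * (2 * l + 1)
    for i in range(l + 1):
        for j in range(l + 1):
            coeffs[i + j] += P[i][j]   # entry (i, j) multiplies theta^(i + j)
    return coeffs


def evaluate(coeffs, theta):
    """Evaluate sum_k coeffs[k] * theta^k."""
    return sum(c * theta ** k for k, c in enumerate(coeffs))


def is_psd_2x2(P):
    """A symmetric 2x2 matrix is psd iff diagonal and determinant are >= 0."""
    det = P[0][0] * P[1][1] - P[0][1] * P[1][0]
    return P[0][0] >= 0 and P[1][1] >= 0 and det >= 0


grid = [k / 10.0 for k in range(-50, 51)]

P = [[1.0, -1.0], [-1.0, 2.0]]           # psd
coeffs = poly_from_matrix(P)             # p(theta) = 1 - 2 theta + 2 theta^2
print(is_psd_2x2(P), min(evaluate(coeffs, t) for t in grid))

Q = [[1.0, -2.0], [-2.0, 1.0]]           # not psd (negative determinant)
q_coeffs = poly_from_matrix(Q)           # p(theta) = 1 - 4 theta + theta^2
print(is_psd_2x2(Q), min(evaluate(q_coeffs, t) for t in grid))
```

For $r = 2$ the matrix is determined by the polynomial, so the second matrix failing to be psd goes together with its polynomial taking negative values.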

Non-Negative Polynomials (II)
- (1) follows directly from the psd definition.
- (2) follows from the sum-of-squares lemma.
- Note that (2) states the mere existence:
  - Polynomial of degree $r$: $r+1$ parameters
  - Coefficient matrix $P$: $(r+2)(r+4)/8$ parameters
  - For $r > 2$, we have to introduce another $r(r-2)/8$ auxiliary variables to find $P$.
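The counts can be checked directly, assuming $P$ is a symmetric $(l+1)\times(l+1)$ matrix with $r = 2l$ (one row per monomial $1, \theta, \dots, \theta^l$):

```latex
\begin{aligned}
\#\text{parameters of } P &= \frac{(l+1)(l+2)}{2}
  = \frac{\bigl(\tfrac{r}{2}+1\bigr)\bigl(\tfrac{r}{2}+2\bigr)}{2}
  = \frac{(r+2)(r+4)}{8}, \\
\#\text{auxiliary variables} &= \frac{(r+2)(r+4)}{8} - (r+1)
  = \frac{r^2 + 6r + 8 - 8r - 8}{8}
  = \frac{r(r-2)}{8} .
\end{aligned}
```

For $r = 2$ this gives 3 parameters and no auxiliary variables; for $r = 4$ it gives 6 parameters and 1 auxiliary variable.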

Semidefinite Programming Machines
Extension of SVMs as (non-trivial) SDP.
[Matrix display: $A_j$ is block-diagonal with blocks $G_{1,j}, \dots, G_{i,j}, \dots, G_{m,j}$ in place of the scalars $g_{1,j}, \dots, g_{i,j}, \dots, g_{m,j}$; $B$ has entries 1.]
Each trajectory (data point + transformation) is represented by an SDP constraint $G_i$.

Example: Second-Order SDPMs
- 2nd order Taylor expansion
- Resulting polynomial in $\theta$
- Set of constraint matrices
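A sketch of the three steps for a single training example $(x, y)$, assuming the bias-free margin constraint $y\, w^\top x \ge 1$ and writing $x'$ and $x''$ for the first and second derivatives of $T(x,\theta)$ at $\theta = 0$ (notation chosen here). For $r = 2$ there are no auxiliary variables, so matching the coefficients of $(1, \theta)\, G\, (1, \theta)^\top$ with those of the polynomial determines the matrix.

```latex
\begin{aligned}
T(x,\theta) &\approx x + \theta\, x' + \tfrac{1}{2}\theta^{2}\, x'', \\
p(\theta) &= \bigl(y\, w^\top x - 1\bigr) + \theta\, y\, w^\top x' + \tfrac{1}{2}\theta^{2}\, y\, w^\top x'', \\
G(w; x, y) &= \begin{pmatrix}
  y\, w^\top x - 1 & \tfrac{1}{2}\, y\, w^\top x' \\
  \tfrac{1}{2}\, y\, w^\top x' & \tfrac{1}{2}\, y\, w^\top x''
\end{pmatrix} \succeq 0 .
\end{aligned}
```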

Non-Negative on Segment
Given a polynomial $p$ of degree $2l$, consider the derived polynomial $q$. Note that $q$ is a polynomial of degree $4l$. If $q$ is positive everywhere, then $p$ is positive everywhere in $[-\tau, +\tau]$.
[Figure: plot of $f(\theta)$ against $\theta$.]
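One construction that matches the stated degree count is sketched below. It is an assumption, not taken from the slide: substitute $f(\theta) = 2\tau\theta/(1+\theta^2)$, which maps the real line onto $[-\tau, \tau]$, and clear denominators, giving $q(\theta) = (1+\theta^2)^{2l}\, p(f(\theta))$ of degree $4l$. The example polynomial and all names are made up for illustration.

```python
def f(theta, tau):
    """Map the real line onto [-tau, tau] (assumed substitution, see lead-in)."""
    return 2.0 * tau * theta / (1.0 + theta * theta)


def p(theta, tau):
    """Example polynomial of degree 2 (l = 1): non-negative only on [-tau, tau]."""
    return tau * tau - theta * theta


def q(theta, tau, l=1):
    """q(theta) = (1 + theta^2)^(2l) * p(f(theta)); a polynomial of degree 4l."""
    return (1.0 + theta * theta) ** (2 * l) * p(f(theta, tau), tau)


tau = 10.0
grid = [k / 10.0 for k in range(-100, 101)]

print(max(abs(f(t, tau)) for t in grid))   # never exceeds tau
print(p(12.0, tau))                        # negative: p fails outside the segment
print(min(q(t, tau) for t in grid))        # q stays non-negative on the whole grid
```

Because $f$ only ever takes values in $[-\tau, \tau]$, non-negativity of $q$ on the real line says nothing about $p$ outside the segment, which is the point of the construction.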

Truly Virtual Support Vectors
- Dual complementarity yields an expansion.
- The truly virtual support vectors are linear combinations of derivatives.
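A sketch of the form such an expansion could take, by analogy with the standard SVM expansion $w = \sum_i \alpha_i y_i x_i$. The dual variables $\alpha_i$, the optimal parameters $\theta_i^*$ and the coefficient notation $x_i^{(j)}$ are assumptions of this sketch, not taken from the slide.

```latex
\begin{aligned}
w &= \sum_{i=1}^{m} \alpha_i\, y_i\, x_i(\theta_i^{*}), \\
x_i(\theta_i^{*}) &= \sum_{j=0}^{r} (\theta_i^{*})^{j}\, x_i^{(j)} .
\end{aligned}
```

Under these assumptions each $x_i(\theta_i^*)$ is a point on the polynomial trajectory of $x_i$, built from the Taylor coefficients, that is, from the derivatives of the transformation.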

Truly Virtual Support Vectors
[Figure: plot with labels “1” and “9”.]

Visualisation: USPS “1” vs. “9”
[Figure: visualisation for $\tau = 20^\circ$.]

Results: Experimental Setup
- All 45 USPS classification tasks (1-v-1).
- 20 training images; 250 test images.
- Rotation is applied to all training images with $\tau = 10^\circ$.
- All results are averaged over 50 random training sets.
- Compared to SVM and virtual SVM.

Results: SDPM vs. SVM
[Figure: SDPM error plotted against SVM error.]
$m = 20$, $\tau = 10^\circ$; images artificially rotated before training; averaged over 50 random training sets; test set of size 250; all sets balanced class-wise; all 45 one-versus-one USPS tasks.

Results: SDPM vs. Virtual SVM
[Figure: SDPM error plotted against VSVM error.]
$m = 20$, $\tau = 10^\circ$; images artificially rotated before training; averaged over 50 random training sets; test set of size 250; all sets balanced class-wise; all 45 one-versus-one USPS tasks.

Results: Curse of Dimensionality
[Figure: panels for 1 parameter and 2 parameters.]

Extensions & Future Work
- Multiple parameters $\theta_1, \theta_2, \dots, \theta_D$.
- (Efficient) adaptation to kernel space.
- Semidefinite Perceptrons (NIPS poster with A. Kharechko and J. Shawe-Taylor).
- Sparsification by efficiently finding the example $x$ and transformation $\theta$ with maximal information (idea of Neil Lawrence).
- Expectation propagation for BPMs (idea of Tom Minka).

Conclusions & Future Work
- Learning from infinitely many examples.
- Truly virtual support vectors $x_i(\theta_i^*)$.
- Multiple parameters $\theta_1, \theta_2, \dots, \theta_D$.
- (Efficient) adaptation to kernel space.
- Semidefinite Perceptrons (NIPS poster with A. Kharechko and J. Shawe-Taylor).