1
**Semidefinite Programming Machines**

Thore Graepel and Ralf Herbrich, Microsoft Research Cambridge

Microsoft Research Ltd.

2
**Overview**

- Invariant Pattern Recognition
- Semidefinite Programming (SDP)
- From Support Vector Machines (SVMs) to Semidefinite Programming Machines (SDPMs)
- Experimental Illustration
- Future Work

3
**Typical Invariances for Images**

- Translation
- Shear
- Rotation

5
**Toy Features for Handwritten Digits**

$\phi_1 = 0.48$, $\phi_2 = 0.58$, $\phi_3 = 0.37$

6
**Warning: Highly Non-Linear**

[Figure: plot with axes $\phi_1$ and $\phi_2$.]

8
**Motivation: Classification Learning**

Can we learn with infinitely many examples?

[Figure: plot in the plane of the first two features, $\phi_1(x)$ (horizontal) and $\phi_2(x)$ (vertical).]

10
**Motivation: Version Spaces**

- Original patterns
- Transformed patterns

11
**Semidefinite Programs (SDPs)**

- Linear objective function
- Positive semidefinite (psd) constraints
- Infinitely many linear constraints
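
The link between the last two points can be sketched in Python: a symmetric matrix $A$ is psd exactly when $v^\top A v \ge 0$ for every vector $v$, and each fixed $v$ gives one constraint that is linear in the entries of $A$. The 2x2 matrices and all helper names below are hypothetical, chosen only for illustration.

```python
import math
import random

def quad_form(A, v):
    """Return v^T A v for a square matrix A given as nested lists."""
    n = len(v)
    return sum(v[i] * A[i][j] * v[j] for i in range(n) for j in range(n))

def is_psd_2x2(A, tol=1e-12):
    """Exact psd test for a symmetric 2x2 matrix: both diagonal
    entries and the determinant must be non-negative."""
    (a, b), (_, c) = A
    return a >= -tol and c >= -tol and a * c - b * b >= -tol

def violated_direction(A, trials=10000, seed=0):
    """Sample unit vectors v and return one with v^T A v < 0, or None.
    Each v corresponds to one linear constraint on the entries of A."""
    rng = random.Random(seed)
    for _ in range(trials):
        angle = rng.uniform(0.0, math.pi)
        v = (math.cos(angle), math.sin(angle))
        if quad_form(A, v) < -1e-12:
            return v
    return None

psd_matrix = [[2.0, 1.0], [1.0, 1.0]]   # determinant 1, so psd
indefinite = [[1.0, 2.0], [2.0, 1.0]]   # determinant -3, so not psd

print(is_psd_2x2(psd_matrix), violated_direction(psd_matrix))
print(is_psd_2x2(indefinite), violated_direction(indefinite) is not None)
```

Sampling can only find a violated constraint, never certify that none exists; that is why an SDP solver works with the psd condition directly.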

12
**SVM as a Quadratic Program**

Given: a sample $((x_1, y_1), \ldots, (x_m, y_m))$. SVMs find the weight vector $w$ that maximises the margin on the sample.
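
One common way to write this as a quadratic program is the hard-margin form below. It is a sketch under assumptions: no bias term, a linearly separable sample, and labels $y_i \in \{-1, +1\}$; the slide's own formula may differ.

$$
\min_{w \in \mathbb{R}^n} \; \tfrac{1}{2}\,\lVert w \rVert^2
\quad \text{subject to} \quad
y_i\, w^\top x_i \ge 1, \qquad i = 1, \ldots, m.
$$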

13
**SVM as a Semidefinite Program (I)**

A (block)-diagonal matrix is psd if and only if all its blocks are psd.

$A_j := \operatorname{diag}(g_{1,j}, \ldots, g_{i,j}, \ldots, g_{m,j})$, $B := \operatorname{diag}(1, \ldots, 1)$
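
A minimal Python sketch of this encoding, assuming $g_{i,j} = y_i x_{i,j}$ and $B$ equal to the identity (both are assumptions consistent with margin constraints $y_i w^\top x_i \ge 1$, not taken from the slide). The diagonal of $\sum_j w_j A_j - B$ then holds the margin slacks, and the matrix is psd exactly when every slack is non-negative. The data and function names are hypothetical.

```python
def margin_slacks(w, xs, ys):
    """Diagonal of sum_j w_j A_j - B under the assumption
    g_ij = y_i * x_ij and B = identity: entry i is y_i * <w, x_i> - 1."""
    return [y * sum(wj * xj for wj, xj in zip(w, x)) - 1.0
            for x, y in zip(xs, ys)]

def diagonal_is_psd(diag, tol=1e-12):
    """A diagonal matrix is psd iff all diagonal entries are >= 0."""
    return all(d >= -tol for d in diag)

# Hypothetical toy sample with labels in {-1, +1}.
xs = [(2.0, 0.0), (0.0, 2.0), (-2.0, 0.0)]
ys = [1, 1, -1]

print(margin_slacks((1.0, 1.0), xs, ys))      # all margins satisfied
print(margin_slacks((0.25, 0.25), xs, ys))    # margins violated
```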

15
**SVM as a Semidefinite Program (II)**

- Transform the quadratic objective into a linear one.
- Use Schur's complement lemma.
- This adds a new $(n+1) \times (n+1)$ block to $A_j$ and $B$.
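
A hedged sketch of the Schur complement step in Python: with an auxiliary scalar $t$ (a name assumed here), the condition $\lVert w \rVert^2 \le t$ is equivalent to the $(n+1) \times (n+1)$ matrix $\begin{pmatrix} I_n & w \\ w^\top & t \end{pmatrix}$ being psd, so minimising the linear objective $t$ replaces minimising the quadratic one. The code checks the worst-case quadratic form, which equals $t - \lVert w \rVert^2$.

```python
def schur_block(w, t):
    """Build the (n+1) x (n+1) matrix [[I_n, w], [w^T, t]] as nested lists."""
    n = len(w)
    rows = [[1.0 if i == j else 0.0 for j in range(n)] + [w[i]]
            for i in range(n)]
    rows.append(list(w) + [t])
    return rows

def quad_form(M, v):
    """Return v^T M v."""
    n = len(v)
    return sum(v[i] * M[i][j] * v[j] for i in range(n) for j in range(n))

def worst_case_value(w, t):
    """Evaluate the quadratic form at v = (-w, 1), the minimiser among
    vectors whose last entry is 1. The value equals t - ||w||^2."""
    v = [-wi for wi in w] + [1.0]
    return quad_form(schur_block(w, t), v)

w = [3.0, 4.0]                      # ||w||^2 = 25
print(worst_case_value(w, 30.0))    # positive: t is large enough
print(worst_case_value(w, 24.0))    # negative: the block is not psd
```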

16
**Taylor Approximation of Invariance**

Let $T(x, \theta)$ be an invariance transformation with parameter $\theta$ (e.g., the angle of rotation). A Taylor expansion about $\theta = 0$ gives a polynomial approximation to the trajectory.
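
As a hedged illustration, take $T(x, \theta)$ to be the rotation of a 2-D point about the origin (an assumption made for brevity; the experiments in the slides rotate images). The sketch below compares the exact trajectory with its second-order Taylor polynomial $x + \theta x' + \tfrac{1}{2}\theta^2 x''$. All function names are hypothetical.

```python
import math

def rotate(x, theta):
    """Exact transformation T(x, theta): rotate the 2-D point x by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * x[0] - s * x[1], s * x[0] + c * x[1])

def taylor_coefficients(x):
    """Coefficients (x0, x1, x2) with x(theta) ~ x0 + theta*x1 + theta^2*x2.
    For a rotation, the first derivative at 0 is (-x[1], x[0]) and the
    second derivative is -x, so x2 is -x / 2."""
    x0 = (x[0], x[1])
    x1 = (-x[1], x[0])
    x2 = (-0.5 * x[0], -0.5 * x[1])
    return x0, x1, x2

def taylor_rotate(x, theta):
    """Second-order polynomial approximation of the rotation trajectory."""
    x0, x1, x2 = taylor_coefficients(x)
    return tuple(a + theta * b + theta ** 2 * c for a, b, c in zip(x0, x1, x2))

def approximation_error(x, theta):
    """Euclidean distance between the exact and approximate points."""
    return math.dist(rotate(x, theta), taylor_rotate(x, theta))

point = (1.0, 0.0)
for degrees in (5, 10, 20, 45):
    print(degrees, approximation_error(point, math.radians(degrees)))
```

The error grows roughly like $\theta^3$, so the approximation is only trustworthy for small angles.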

17
**Extension to Polynomials**

Consider a polynomial trajectory $x(\theta)$. A single training example $(x^{(0)}, \ldots, x^{(r)}, y)$ yields an infinite number of constraints, one for each value of $\theta$.
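
A plausible way to write this down, assuming the $x^{(k)}$ are the coefficient vectors of the trajectory (with any factorial factors from a Taylor expansion absorbed into them) and a margin constraint without a bias term:

$$
x(\theta) = \sum_{k=0}^{r} \theta^k\, x^{(k)},
\qquad
y\, w^\top x(\theta) - 1 \;\ge\; 0 \quad \text{for all } \theta \in \mathbb{R}.
$$

For fixed $w$, the left-hand side is a polynomial of degree $r$ in $\theta$, so the infinitely many constraints amount to requiring that one polynomial be non-negative.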

18
**Non-Negative Polynomials (I)**

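Theorem (Nesterov, 2000): If $r = 2l$, then

1. For every psd matrix $P$, the polynomial $p(\theta) = \boldsymbol{\theta}^\top P \boldsymbol{\theta}$, with $\boldsymbol{\theta} := (1, \theta, \ldots, \theta^l)^\top$, is non-negative everywhere.
2. For every non-negative polynomial $p$ of degree $r$, there exists a psd matrix $P$ such that $p(\theta) = \boldsymbol{\theta}^\top P \boldsymbol{\theta}$.

Example: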
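
A small worked instance of statement (1), sketched in Python with a hypothetical matrix: for $l = 1$ and $P = \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}$, which is psd, the polynomial is $p(\theta) = 1 - 2\theta + \theta^2 = (1 - \theta)^2$. The coefficient of $\theta^k$ is the sum of the entries $P_{ij}$ with $i + j = k$.

```python
def poly_from_matrix(P):
    """Coefficients (lowest order first) of p(theta) = v^T P v with
    v = (1, theta, ..., theta^l): sum P[i][j] over each anti-diagonal."""
    size = len(P)
    coeffs = [0.0] * (2 * size - 1)
    for i in range(size):
        for j in range(size):
            coeffs[i + j] += P[i][j]
    return coeffs

def evaluate(coeffs, theta):
    """Evaluate a polynomial given by its coefficient list."""
    return sum(c * theta ** k for k, c in enumerate(coeffs))

P = [[1.0, -1.0], [-1.0, 1.0]]      # psd: diagonal >= 0, determinant 0
coeffs = poly_from_matrix(P)
print(coeffs)
print(min(evaluate(coeffs, k / 10.0) for k in range(-50, 51)))
```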

19
**Non-Negative Polynomials (II)**

- (1) follows directly from the definition of psd.
- (2) follows from the sum-of-squares lemma.
- Note that (2) states mere existence:
  - A polynomial of degree $r$ has $r + 1$ parameters.
  - The coefficient matrix $P$ has $(r+2)(r+4)/8$ parameters.
  - For $r > 2$, we have to introduce another $r(r-2)/8$ auxiliary variables to find $P$.
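
The counts above can be checked with a short Python sketch, assuming $P$ is a symmetric $(l+1) \times (l+1)$ matrix with $r = 2l$, so that its free parameters are the entries on and above the diagonal. The function names are hypothetical.

```python
def polynomial_parameters(r):
    """A polynomial of degree r has r + 1 coefficients."""
    return r + 1

def matrix_parameters(r):
    """Free entries of a symmetric (l+1) x (l+1) matrix, where r = 2l."""
    size = r // 2 + 1
    return size * (size + 1) // 2

def auxiliary_variables(r):
    """Extra degrees of freedom in P beyond the polynomial's coefficients."""
    return matrix_parameters(r) - polynomial_parameters(r)

for r in (2, 4, 6, 8):
    print(r, polynomial_parameters(r), matrix_parameters(r), auxiliary_variables(r))
```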

20
**Semidefinite Programming Machines**

Extension of SVMs as a (non-trivial) SDP: each scalar $g_{i,j}$ on the diagonal of $A_j$ becomes a block $G_{i,j}$, so that $A_j := \operatorname{diag}(G_{1,j}, \ldots, G_{i,j}, \ldots, G_{m,j})$, and $B$ is block-diagonal with matching block sizes.

Each trajectory (data point + transformation) is represented by an SDP constraint $G_i$.

22
**Example: Second-Order SDPMs**

- Second-order Taylor expansion
- Resulting polynomial in $\theta$
- Set of constraint matrices
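
The three steps can be sketched in Python under stated assumptions: the trajectory is $x(\theta) \approx x_0 + \theta x_1 + \theta^2 x_2$ (so $x_2$ is half the second derivative), the margin constraint has no bias term, and the trajectory coefficients and weight vectors below are hypothetical. The 2x2 matrix $G$ satisfies $(1, \theta)\, G\, (1, \theta)^\top = y\, w^\top x(\theta) - 1$, so $G$ being psd means the margin holds for every $\theta$.

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))

def constraint_matrix(w, x0, x1, x2, y):
    """2x2 matrix G with (1, theta) G (1, theta)^T equal to
    y * <w, x0 + theta*x1 + theta^2*x2> - 1."""
    a = y * dot(w, x0) - 1.0
    b = 0.5 * y * dot(w, x1)
    c = y * dot(w, x2)
    return [[a, b], [b, c]]

def is_psd_2x2(G, tol=1e-12):
    """psd test for a symmetric 2x2 matrix."""
    (a, b), (_, c) = G
    return a >= -tol and c >= -tol and a * c - b * b >= -tol

def margin_polynomial(w, x0, x1, x2, y, theta):
    """Value of y * <w, x(theta)> - 1 along the polynomial trajectory."""
    return y * (dot(w, x0) + theta * dot(w, x1) + theta ** 2 * dot(w, x2)) - 1.0

# Hypothetical trajectory coefficients and label.
x0, x1, x2, y = (1.0, 0.0), (0.0, 1.0), (1.0, 0.0), 1

good_w = (2.0, 1.0)   # polynomial 2*theta^2 + theta + 1, never negative
bad_w = (2.0, 4.0)    # polynomial 2*theta^2 + 4*theta + 1, negative at theta = -1

print(is_psd_2x2(constraint_matrix(good_w, x0, x1, x2, y)))
print(is_psd_2x2(constraint_matrix(bad_w, x0, x1, x2, y)))
```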

24
**Non-Negative on Segment**

Given a polynomial $p$ of degree $2l$, consider a transformed polynomial $q$. Note that $q$ is a polynomial of degree $4l$. If $q$ is positive everywhere, then $p$ is positive everywhere in $[-\tau, +\tau]$.

[Figure: plot against $\theta$.]
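
One substitution with the stated degree, given here as an assumption rather than as the slide's exact formula, is $q(\theta) := (1 + \theta^2)^{2l}\, p\!\left(\tau\, \frac{\theta^2 - 1}{\theta^2 + 1}\right)$. The inner map sends the real line onto $[-\tau, \tau)$, and the prefactor clears the denominators. A Python sketch with a hypothetical test polynomial:

```python
def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (lowest order first)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def poly_pow(a, n):
    """Raise a polynomial to a non-negative integer power."""
    out = [1.0]
    for _ in range(n):
        out = poly_mul(out, a)
    return out

def segment_to_line(p, tau):
    """Coefficients of q(theta) = (1 + theta^2)^(2l) * p(tau*(theta^2-1)/(theta^2+1))
    for a polynomial p of degree 2l (assumed substitution, see lead-in)."""
    degree = len(p) - 1                      # this is 2l
    q = [0.0] * (2 * degree + 1)             # room for degree 4l
    for k, ck in enumerate(p):
        term = poly_mul(poly_pow([-1.0, 0.0, 1.0], k),
                        poly_pow([1.0, 0.0, 1.0], degree - k))
        for i, t in enumerate(term):
            q[i] += ck * tau ** k * t
    return q

p = [1.0, 0.0, -1.0]               # p(s) = 1 - s^2, non-negative only on [-1, 1]
print(segment_to_line(p, 1.0))     # 4*theta^2: non-negative everywhere
print(segment_to_line(p, 2.0))     # negative at theta = 0, since p(-2) < 0
```

With $\tau = 1$ the segment matches the region where $p \ge 0$, and $q$ is non-negative on the whole line; with $\tau = 2$ the segment is too wide and $q$ takes negative values.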

26
**Truly Virtual Support Vectors**

- Dual complementarity yields an expansion.
- The truly virtual support vectors are linear combinations of derivatives.

27
**Truly Virtual Support Vectors**

[Figure: plot with the digit classes “1” and “9” labelled.]

28
**Visualisation: USPS “1” vs. “9”**

$\tau = 20^\circ$

[Figure: visualisation for USPS “1” vs. “9”.]

29
**Results: Experimental Setup**

- All 45 USPS classification tasks (1-v-1).
- 20 training images; 250 test images.
- Rotation is applied to all training images with $\tau = 10^\circ$.
- All results are averaged over 50 random training sets.
- Compared to SVM and virtual SVM.
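
The figure of 45 tasks is the number of unordered pairs among the ten digit classes. A minimal Python sketch of enumerating them (the variable names are hypothetical):

```python
from itertools import combinations

DIGITS = range(10)                        # USPS digit classes 0 to 9
tasks = list(combinations(DIGITS, 2))     # one task per unordered pair

print(len(tasks))
print(tasks[:3])
```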

30
**Results: SDPM vs. SVM**

[Figure: plot of SVM error against SDPM error.]

Settings: $m = 20$, $\tau = 10^\circ$, training images artificially rotated before training, results averaged over 50 random training sets, test sets of size 250; all sets are balanced class-wise. All 45 one-versus-one tasks of USPS.

31
**Results: SDPM vs. Virtual SVM**

[Figure: plot of virtual SVM (VSVM) error against SDPM error.]

Settings: $m = 20$, $\tau = 10^\circ$, training images artificially rotated before training, results averaged over 50 random training sets, test sets of size 250; all sets are balanced class-wise. All 45 one-versus-one tasks of USPS.

33
**Results: Curse of Dimensionality**

[Figure panels: 1 parameter; 2 parameters.]

34
**Extensions & Future Work**

- Multiple parameters $\theta_1, \theta_2, \ldots, \theta_D$.
- (Efficient) adaptation to kernel space.
- Semidefinite Perceptrons (NIPS poster with A. Kharechko and J. Shawe-Taylor).
- Sparsification by efficiently finding the example $x$ and transformation $\theta$ with maximal information (idea of Neil Lawrence).
- Expectation propagation for BPMs (idea of Tom Minka).

35
**Conclusions & Future Work**

- Learning from infinitely many examples.
- Truly virtual support vectors $x_i(\theta_i^*)$.
- Multiple parameters $\theta_1, \theta_2, \ldots, \theta_D$.
- (Efficient) adaptation to kernel space.
- Semidefinite Perceptrons (NIPS poster with A. Kharechko and J. Shawe-Taylor).
