1
**Semidefinite Programming Machines**

Thore Graepel and Ralf Herbrich Microsoft Research Cambridge Microsoft Research Ltd.

2
**Overview**

- Invariant Pattern Recognition
- Semidefinite Programming (SDP)
- From Support Vector Machines (SVMs) to Semidefinite Programming Machines (SDPMs)
- Experimental Illustration
- Future Work

3
**Typical Invariances for Images**

Translation, shear, rotation.

5
**Toy Features for Handwritten Digits**

$\phi_1 = 0.48$, $\phi_2 = 0.58$, $\phi_3 = 0.37$

6
**Warning: Highly Non-Linear**

[Figure: plot with axes $\phi_1$ and $\phi_2$.]

8
**Motivation: Classification Learning**

Can we learn with infinitely many examples?

[Figure: plot with axes $\phi_1(x)$ and $\phi_2(x)$.]

10
**Motivation: Version Spaces**

[Figure: two panels, "Original patterns" and "Transformed patterns".]

11
**Semidefinite Programs (SDPs)**

- Linear objective function
- Positive semidefinite (psd) constraints
- Infinitely many linear constraints
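The three properties above can be written out in one common textbook form of an SDP. This is a sketch for orientation: the symbols $c$, $w$ and $u$ are assumptions, and $A_j$, $B$ are chosen to match the names used on the later slides.

```latex
\begin{aligned}
\min_{w \in \mathbb{R}^n} \quad & c^\top w
  && \text{(linear objective)} \\
\text{subject to} \quad & A(w) := \sum_{j=1}^{n} w_j A_j - B \succeq 0
  && \text{(psd constraint)} \\[4pt]
A(w) \succeq 0 \quad \Longleftrightarrow \quad & u^\top A(w)\, u \ge 0 \ \ \text{for all } u
  && \text{(infinitely many linear constraints in } w\text{)}
\end{aligned}
```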

12
**SVM as a Quadratic Program**

Given: a sample $((x_1, y_1), \ldots, (x_m, y_m))$. SVMs find the weight vector $w$ that maximises the margin on the sample.
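For reference, the margin maximisation is usually stated as the following hard-margin quadratic program. The formula is a standard one and is assumed here (linear classifier, no bias term), since the slide's own equation is not preserved.

```latex
\begin{aligned}
\min_{w \in \mathbb{R}^n} \quad & \tfrac{1}{2}\, \lVert w \rVert^2 \\
\text{subject to} \quad & y_i\, w^\top x_i \ge 1, \qquad i = 1, \ldots, m .
\end{aligned}
```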

13
**SVM as a Semidefinite Program (I)**

A (block-)diagonal matrix is psd if and only if all its blocks are psd.

$A_j := \mathrm{diag}(g_{1,j}, \ldots, g_{i,j}, \ldots, g_{m,j})$, $\quad B := \mathrm{diag}(1, \ldots, 1)$
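A minimal sketch of how the linear SVM constraints fit this diagonal form. It assumes $g_{i,j} = y_i x_{i,j}$ and $B$ equal to the identity, which the slide does not state explicitly; the function names are hypothetical.

```python
def svm_constraint_diagonal(w, xs, ys):
    """Diagonal of A(w) = sum_j w_j A_j - B.

    Assumes g_ij = y_i * x_ij and B = identity, so entry i is y_i * <w, x_i> - 1.
    """
    return [y * sum(wj * xj for wj, xj in zip(w, x)) - 1.0
            for x, y in zip(xs, ys)]


def is_psd_diagonal(diagonal, tol=1e-12):
    """A diagonal matrix is psd iff every diagonal entry (1x1 block) is >= 0."""
    return all(d >= -tol for d in diagonal)


# Two toy points on opposite sides of the origin.
xs = [[2.0, 0.0], [-2.0, 0.0]]
ys = [1, -1]
feasible = is_psd_diagonal(svm_constraint_diagonal([1.0, 0.0], xs, ys))
infeasible = is_psd_diagonal(svm_constraint_diagonal([0.25, 0.0], xs, ys))
```

Here `feasible` is true because both margins are at least 1, while the shorter weight vector violates both constraints.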

15
**SVM as a Semidefinite Program (II)**

- Transform the quadratic objective into a linear one.
- Use Schur's complement lemma.
- This adds a new $(n+1) \times (n+1)$ block to $A_j$ and $B$.
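The Schur complement step can be checked numerically. The sketch below assumes the new block has the form $\begin{pmatrix} I_n & w \\ w^\top & t \end{pmatrix}$, which is psd exactly when $t \ge w^\top w$, so minimising the linear objective $t$ replaces minimising $\lVert w \rVert^2$. The block layout and the helper names are assumptions, not taken from the slide.

```python
def schur_block(w, t):
    """Build the (n+1) x (n+1) matrix [[I_n, w], [w^T, t]] as nested lists."""
    n = len(w)
    rows = [[1.0 if i == j else 0.0 for j in range(n)] + [w[i]]
            for i in range(n)]
    rows.append(list(w) + [t])
    return rows


def is_psd(matrix, tol=1e-9):
    """Check a symmetric matrix for positive semidefiniteness.

    Uses symmetric elimination: with a positive pivot the remaining
    Schur complement must be psd; with a zero pivot the rest of that
    row must vanish.
    """
    a = [row[:] for row in matrix]
    n = len(a)
    for k in range(n):
        pivot = a[k][k]
        if pivot < -tol:
            return False
        if pivot <= tol:
            if any(abs(a[k][j]) > tol for j in range(k + 1, n)):
                return False
            continue
        for i in range(k + 1, n):
            factor = a[i][k] / pivot
            for j in range(k + 1, n):
                a[i][j] -= factor * a[k][j]
    return True


w = [3.0, 4.0]  # squared norm is 25
on_boundary = is_psd(schur_block(w, 25.0))
below_bound = is_psd(schur_block(w, 24.9))
```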

16
**Taylor Approximation of Invariance**

Let $T(x, \theta)$ be an invariance transformation with parameter $\theta$ (e.g., angle of rotation). A Taylor expansion about $\theta = 0$ gives a polynomial approximation to the trajectory.
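As an illustration, the sketch below compares an exact rotation with its second-order Taylor polynomial. It uses a 2-D point instead of an image, which is an assumption made to keep the derivatives in closed form; for images the derivatives would have to be estimated numerically.

```python
import math


def rotate(x, theta):
    """Exact transformation T(x, theta): rotate the 2-D point x by theta."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * x[0] - s * x[1], s * x[0] + c * x[1])


def rotate_taylor2(x, theta):
    """Second-order Taylor polynomial of T(x, theta) about theta = 0."""
    d1 = (-x[1], x[0])   # first derivative at 0: 90-degree rotation of x
    d2 = (-x[0], -x[1])  # second derivative at 0: -x
    return tuple(x[k] + theta * d1[k] + 0.5 * theta ** 2 * d2[k]
                 for k in range(2))


def approximation_error(x, theta):
    """Euclidean distance between the exact and the approximated point."""
    exact = rotate(x, theta)
    approx = rotate_taylor2(x, theta)
    return math.hypot(exact[0] - approx[0], exact[1] - approx[1])


small_angle_error = approximation_error((1.0, 0.0), 0.1)
large_angle_error = approximation_error((1.0, 0.0), 1.0)
```

The error grows roughly with the cube of the angle, so the polynomial trajectory is only trustworthy for small $\theta$.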

17
**Extension to Polynomials**

Consider a polynomial trajectory $x(\theta)$. Each training example (x(0),…, x(r),y) gives rise to an infinite number of constraints.
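A sketch of what these constraints look like, assuming that x(j) denotes the $j$-th coefficient vector of the trajectory (written $x_{(j)}$ below) and that the usual margin constraint is imposed at every parameter value:

```latex
\begin{aligned}
x(\theta) &= \sum_{j=0}^{r} \theta^{j}\, x_{(j)} , \\
y\, w^\top x(\theta) &\ge 1 \qquad \text{for all } \theta \in \mathbb{R} , \\
\text{i.e.} \quad p(\theta) &:= \sum_{j=0}^{r} \big( y\, w^\top x_{(j)} \big)\, \theta^{j} - 1 \;\ge\; 0
  \qquad \text{for all } \theta \in \mathbb{R} .
\end{aligned}
```

Under these assumptions the coefficients of $p$ are linear in $w$, so the question becomes when a polynomial is non-negative everywhere.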

18
**Non-Negative Polynomials (I)**

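Theorem (Nesterov, 2000): If $r = 2l$ then, with $\boldsymbol{\theta} := (1, \theta, \ldots, \theta^{l})^\top$,

1. For every psd matrix $P$, the polynomial $p(\theta) = \boldsymbol{\theta}^\top P \boldsymbol{\theta}$ is non-negative everywhere.
2. For every non-negative polynomial $p$ there exists a psd matrix $P$ such that $p(\theta) = \boldsymbol{\theta}^\top P \boldsymbol{\theta}$.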
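A small numerical illustration of the first statement, assuming the monomial vector $(1, \theta, \ldots, \theta^{l})$: the coefficient of $\theta^{k}$ is the sum of the entries $P_{ij}$ with $i + j = k$. The matrix used here is a hypothetical example, not the one from the slide.

```python
def poly_coeffs_from_matrix(P):
    """Coefficients of p(theta) = v^T P v with v = (1, theta, ..., theta^l).

    The entry P[i][j] multiplies theta^(i + j), so coefficient k is the
    sum over the k-th anti-diagonal of P.
    """
    l = len(P) - 1
    coeffs = [0.0] * (2 * l + 1)
    for i, row in enumerate(P):
        for j, value in enumerate(row):
            coeffs[i + j] += value
    return coeffs


def evaluate(coeffs, theta):
    """Evaluate the polynomial with the given coefficients (lowest first)."""
    return sum(c * theta ** k for k, c in enumerate(coeffs))


# P = v v^T with v = (1, -1) is psd by construction.
P = [[1.0, -1.0], [-1.0, 1.0]]
coeffs = poly_coeffs_from_matrix(P)  # p(theta) = 1 - 2 theta + theta^2
grid = [t / 10.0 for t in range(-50, 51)]
nonnegative_on_grid = all(evaluate(coeffs, t) >= -1e-9 for t in grid)
```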

19
**Non-Negative Polynomials (II)**

(1) follows directly from the psd definition. (2) follows from the sum-of-squares lemma. Note that (2) states the mere existence:

- Polynomial of degree $r$: $r+1$ parameters.
- Coefficient matrix $P$: $(r+2)(r+4)/8$ parameters.
- For $r > 2$, we have to introduce another $r(r-2)/8$ auxiliary variables to find $P$.
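The counts above follow from $P$ being a symmetric $(l+1) \times (l+1)$ matrix with $l = r/2$. A short worked check, with $r = 4$ as an illustrative case:

```latex
\begin{aligned}
\#\,\text{parameters of } P
  &= \frac{(l+1)(l+2)}{2}
   = \frac{\left(\tfrac{r}{2}+1\right)\left(\tfrac{r}{2}+2\right)}{2}
   = \frac{(r+2)(r+4)}{8} , \\
\frac{(r+2)(r+4)}{8} - (r+1)
  &= \frac{r^2 + 6r + 8 - 8r - 8}{8}
   = \frac{r(r-2)}{8} , \\
r = 4: \quad \frac{6 \cdot 8}{8} &= 6 \ \text{entries in } P, \quad 5 \ \text{polynomial coefficients}, \quad
  \frac{4 \cdot 2}{8} = 1 \ \text{auxiliary variable.}
\end{aligned}
```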

20
**Semidefinite Programming Machines**

Extension of SVMs as a (non-trivial) SDP: the scalar entries $g_{i,j}$ of the SVM formulation are replaced by matrix blocks $G_{i,j}$, so that $A_j := \mathrm{diag}(G_{1,j}, \ldots, G_{i,j}, \ldots, G_{m,j})$, with $B$ built block-wise in the same way.

Each trajectory (data point + transformation) is represented by an SDP constraint $G_i$.

22
**Example: Second-Order SDPMs**

- Second-order Taylor expansion
- Resulting polynomial in $\theta$
- Set of constraint matrices
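The three steps can be sketched for a single training example. The layout of the $2 \times 2$ constraint matrix below (constant term, half the linear coefficient off the diagonal, quadratic coefficient) is an assumption consistent with $p(\theta) = (1, \theta)\, G\, (1, \theta)^\top$, not a transcription of the slide; all names are hypothetical.

```python
def dot(a, b):
    """Inner product of two equally long sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))


def second_order_constraint(w, x0, dx, ddx, y):
    """2x2 matrix G with (1, theta) G (1, theta)^T = y * w.x(theta) - 1,

    where x(theta) = x0 + theta * dx + 0.5 * theta^2 * ddx.
    """
    a = y * dot(w, x0) - 1.0   # constant coefficient
    b = y * dot(w, dx)         # linear coefficient
    c = 0.5 * y * dot(w, ddx)  # quadratic coefficient
    return [[a, 0.5 * b], [0.5 * b, c]]


def is_psd_2x2(G, tol=1e-12):
    """A symmetric 2x2 matrix is psd iff both diagonal entries and det are >= 0."""
    (a, b), (_, c) = G
    return a >= -tol and c >= -tol and a * c - b * b >= -tol


def margin_polynomial(G, theta):
    """Evaluate (1, theta) G (1, theta)^T."""
    return G[0][0] + 2.0 * G[0][1] * theta + G[1][1] * theta ** 2


w = [1.0, 1.0]
good = second_order_constraint(w, [2.0, 0.0], [1.0, -1.0], [1.0, 1.0], 1)
bad = second_order_constraint(w, [2.0, 0.0], [1.0, -1.0], [-2.0, 0.0], 1)
```

In this toy case `good` is the identity matrix, so the margin polynomial $1 + \theta^2$ is non-negative everywhere, whereas `bad` yields $1 - \theta^2$, which fails for $|\theta| > 1$.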

24
**Non-Negative on Segment**

Given a polynomial $p$ of degree $2l$, consider the polynomial $q$ obtained from $p$ by a change of variable that maps the real line onto the segment $[-\tau, +\tau]$. Note that $q$ is a polynomial of degree $4l$. If $q$ is positive everywhere, then $p$ is positive everywhere in $[-\tau, +\tau]$.
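One substitution with the stated properties is $q(\beta) = (1 + \beta^2)^{2l}\, p\big(\tau (2\beta^2 (1 + \beta^2)^{-1} - 1)\big)$. This exact form is an assumption, since the slide's formula is not preserved, but it does give a polynomial of degree $4l$. A minimal numerical sketch:

```python
def segment_map(beta, tau):
    """Map any real beta to a point theta in [-tau, +tau)."""
    return tau * (2.0 * beta ** 2 / (1.0 + beta ** 2) - 1.0)


def q(beta, coeffs, tau):
    """q(beta) = (1 + beta^2)^(2l) * p(segment_map(beta, tau)).

    coeffs holds the coefficients of p, lowest order first, so
    len(coeffs) - 1 is the degree 2l of p.
    """
    degree = len(coeffs) - 1
    theta = segment_map(beta, tau)
    p_value = sum(c * theta ** k for k, c in enumerate(coeffs))
    return (1.0 + beta ** 2) ** degree * p_value


# p(theta) = 1 - theta^2 is non-negative on [-1, 1] but negative outside.
p_coeffs = [1.0, 0.0, -1.0]
betas = [b / 4.0 for b in range(-40, 41)]
inside_segment = all(abs(segment_map(b, 1.0)) <= 1.0 for b in betas)
q_nonnegative = all(q(b, p_coeffs, 1.0) >= -1e-9 for b in betas)
```

With $\tau = 1$ the segment coincides with the region where $p \ge 0$, so $q$ is non-negative for every $\beta$; with $\tau = 2$ the segment reaches points where $p < 0$ and $q$ becomes negative, for example at $\beta = 0$.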

26
**Truly Virtual Support Vectors**

- Dual complementarity yields an expansion of the solution.
- The truly virtual support vectors $x_i(\theta_i^*)$ are linear combinations of derivatives.

27
**Truly Virtual Support Vectors**

[Figure: truly virtual support vectors for the digit classes “1” and “9”.]

28
**Visualisation: USPS “1” vs. “9”**

[Figure: visualisation for $\tau = 20^\circ$.]

29
**Results: Experimental Setup**

- All 45 USPS classification tasks (1-v-1).
- 20 training images; 250 test images.
- Rotation is applied to all training images with $\tau = 10^\circ$.
- All results are averaged over 50 random training sets.
- Compared to SVM and virtual SVM.

30
**Results: SDPM vs. SVM**

[Figure: SVM error vs. SDPM error.]

$m = 20$, $\tau = 10^\circ$; training images artificially rotated before training; results averaged over 50 random training sets; test sets of 250 images; all sets are balanced class-wise. All 45 one-versus-one tasks of USPS.

31
**Results: SDPM vs. Virtual SVM**

[Figure: VSVM error vs. SDPM error.]

$m = 20$, $\tau = 10^\circ$; training images artificially rotated before training; results averaged over 50 random training sets; test sets of 250 images; all sets are balanced class-wise. All 45 one-versus-one tasks of USPS.

33
**Results: Curse of Dimensionality**

[Figure: two panels, "1 parameter" and "2 parameters".]

34
**Extensions & Future Work**

- Multiple parameters $\theta_1, \theta_2, \ldots, \theta_D$.
- (Efficient) adaptation to kernel space.
- Semidefinite Perceptrons (NIPS poster with A. Kharechko and J. Shawe-Taylor).
- Sparsification by efficiently finding the example $x$ and transformation $\theta$ with maximal information (idea of Neil Lawrence).
- Expectation propagation for BPMs (idea of Tom Minka).

35
**Conclusions & Future Work**

- Learning from infinitely many examples.
- Truly virtual support vectors $x_i(\theta_i^*)$.
- Multiple parameters $\theta_1, \theta_2, \ldots, \theta_D$.
- (Efficient) adaptation to kernel space.
- Semidefinite Perceptrons (NIPS poster with A. Kharechko and J. Shawe-Taylor).
