
1 Online Learning of Maximum Margin Classifiers (p-Norm with Bias). Kohei HATANO, Kyushu University (joint work with K. Ishibashi and M. Takeda). COLT 2008

2 Plan of this talk
1. Introduction
2. Preliminaries: ROMMA
3. Our result
   - Our new algorithm PUMMA
   - Our implicit reduction
4. Experiments

3 Maximum Margin Classification
- SVMs [Boser et al. 92]: 2-norm margin
- Boosting [Freund & Schapire 97]: ∞-norm margin (approximately)
Why maximum (or large) margin?
- Good generalization [Schapire et al. 98] [Shawe-Taylor et al. 98]
- Formulated as convex optimization problems (QP, LP)

4 Scaling up Max. Margin Classification
1. Decomposition methods (for SVMs)
   - Break the original QP into smaller QPs
   - SMO [Platt 99], SVMlight [Joachims 99], LIBSVM [Chang & Lin 01]
   - State-of-the-art implementations
2. Online learning (our approach)

5 Online Learning
Advantages of online learning:
- Simple & easy to implement
- Uses less memory
- Adaptive to changing concepts

Online learning algorithm:
For t = 1 to T:
1. Receive an instance x_t in R^n
2. Guess a label ŷ_t = sign(w_t · x_t + b_t)
3. Receive the label y_t in {−1, +1}
4. Update: (w_{t+1}, b_{t+1}) = UPDATE_RULE(w_t, b_t, x_t, y_t)

[Figure: the hyperplane (w_t, b_t) is updated to (w_{t+1}, b_{t+1}) after seeing x_t.]
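
As a concrete sketch of this protocol (not code from the talk), here is a minimal Python version; `update_rule` stands in for whatever algorithm-specific rule, e.g. Perceptron, ROMMA, or PUMMA, is plugged in:

```python
import numpy as np

def online_learn(stream, update_rule, n):
    """Generic online learning loop: predict, observe the label, update.

    stream:      iterable of (x_t, y_t) with x_t in R^n, y_t in {-1, +1}
    update_rule: callable (w, b, x, y) -> (w_new, b_new)
    """
    w, b = np.zeros(n), 0.0
    mistakes = 0
    for x, y in stream:
        y_hat = np.sign(w @ x + b) or 1.0  # guess a label; break the tie 0 as +1
        if y_hat != y:
            mistakes += 1
        w, b = update_rule(w, b, x, y)     # the UPDATE_RULE of the slide
    return w, b, mistakes
```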

6 Online Learning Algorithms for Maximum Margin Classification
- Max Margin Perceptron [Kowalczyk 00]
- ROMMA [Li & Long 02]
- ALMA [Gentile 01]
- LASVM [Bordes et al. 05]
- MICRA [Tsampouka & Shawe-Taylor 07]
- Pegasos [Shalev-Shwartz et al. 07]
- etc.
Most of these online algorithms cannot learn a hyperplane with bias!
[Figure: a hyperplane with bias vs. a hyperplane through the origin (bias 0).]

7 Typical Reduction to deal with bias [cf. Cristianini & Shawe-Taylor 00]
Add an extra dimension corresponding to the bias:
- instance: x ↔ x̃ = (x, R), where R bounds the norm of the instances
- hyperplane: (u, b) ↔ ũ = (u, b/R), so that ũ · x̃ = u · x + b (i.e., ũ is equivalent to (u, b))
- margin (over normalized instances): the margin of ũ can be smaller than that of (u, b)
This reduction weakens the guarantee on the margin: it might cause a significant difference in generalization!
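
A minimal sketch of this reduction in Python, assuming the convention where the extra coordinate is set to R, a bound on the instance norms (other texts use 1):

```python
import numpy as np

def augment(X, R=1.0):
    """Map each instance x to x~ = (x, R).  A hyperplane (w, b) in the
    original space becomes the unbiased w~ = (w, b/R) here, since
    w~ . x~ = w . x + b.  The catch: the *normalized* margin of w~
    can be smaller than that of (w, b), as the slide warns."""
    return np.hstack([X, np.full((X.shape[0], 1), R)])
```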

8 Our New Online Learning Algorithm: PUMMA (P-norm Utilizing Maximum Margin Algorithm)
PUMMA can learn maximum margin classifiers with bias directly (without using the typical reduction!).
The margin is defined by the p-norm (p ≥ 2):
- For p = 2, similar to the Perceptron.
- For p = O(ln n) [Gentile '03], similar to Winnow [Littlestone '89]: fast when the target is sparse.
Extended to the linearly inseparable case (omitted): soft margin with 2-norm slack variables.

9 Problem of finding the p-norm maximum margin hyperplane [cf. Mangasarian 99]
Given: a (linearly separable) sample S = ((x_1, y_1), …, (x_T, y_T)).
Goal: find an approximate solution (w*, b*) of the q-norm minimization problem below, where q is the dual norm of p (1/p + 1/q = 1); e.g., p = 2 ⇒ q = 2, and p = ∞ ⇒ q = 1.
We want an online algorithm that solves this problem with a small number of updates.
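
Reconstructed in LaTeX (the standard q-norm formulation of p-norm margin maximization, which is presumably what the slide displayed):

```latex
(w^*, b^*) \;=\; \operatorname*{argmin}_{w,\,b} \ \frac{1}{2}\,\lVert w \rVert_q^2
\qquad \text{s.t.} \quad y_t\,(w \cdot x_t + b) \,\ge\, 1
\quad (t = 1, \dots, T)
```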

10 ROMMA (Relaxed Online Maximum Margin Algorithm) [Li & Long '02]
Given: S = ((x_1, y_1), …, (x_{t-1}, y_{t-1})) and x_t:
1. Predict ŷ_t = sign(w_t · x_t), and receive y_t.
2. If y_t(w_t · x_t) < 1 − δ (margin is "insufficient"), update: w_{t+1} = argmin_w ‖w‖_2² subject to only 2 constraints:
   - y_t(w · x_t) ≥ 1 (constraint over the last example, which causes an update)
   - w · w_t ≥ ‖w_t‖_2² (constraint over the last hyperplane)
3. Otherwise, w_{t+1} = w_t.
NOTE: the bias is fixed to 0.

11 ROMMA [Li & Long '02]
[Figure: in weight space, the iterates w_1, w_2, w_3, … approach w_SVM, which lies in the feasible region of SVM.]

12 Solution of ROMMA
The solution of ROMMA is an additive update: w_{t+1} = a_t w_t + c_t y_t x_t, where the coefficients a_t, c_t are determined in closed form by the two constraints.
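
A sketch of one ROMMA step in Python for p = 2. The coefficients below come from making both constraints tight and solving the resulting 2×2 linear system (my own derivation from the program on the previous slide, so treat it as illustrative rather than as the paper's code):

```python
import numpy as np

def romma_update(w, x, y, delta=0.0):
    """One ROMMA step: keep w if the margin suffices; otherwise return the
    minimum-norm w' with y(w'.x) >= 1 and w'.w >= ||w||^2 (both tight)."""
    if not np.any(w):                 # first update: minimum-norm w with y(w.x) = 1
        return y * x / (x @ x)
    C = y * (w @ x)
    if C >= 1.0 - delta:              # margin is sufficient: no update
        return w
    A, B = w @ w, x @ x
    denom = A * B - C * C             # > 0 unless x is parallel to w
    a = (A * B - C) / denom           # weight on the old hyperplane
    c = A * (1.0 - C) / denom         # weight on the new example
    return a * w + c * y * x          # additive update
```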

13 PUMMA (the bias is optimized)
Given: S = ((x_1, y_1), …, (x_{t-1}, y_{t-1})) and x_t:
1. Predict ŷ_t = sign(w_t · x_t + b_t), and receive y_t.
2. If y_t(w_t · x_t + b_t) < 1 − δ (margin is insufficient), update: (w_{t+1}, b_{t+1}) minimizes (1/2)‖w‖_q² (q-norm, 1/p + 1/q = 1) subject to margin-≥1 constraints over x_t^pos and x_t^neg, the last positive and negative examples which incurred updates; the solution is expressed via the link function [Grove et al. 97].
3. Otherwise, w_{t+1} = w_t.
[Figure: ROMMA constrains only the last example (○, margin ≥ 1); PUMMA constrains both the last positive (○) and the last negative (●) example.]
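
For reference, the link function of Grove et al. is usually taken to be f(w) = ∇(½‖w‖_q²), which maps between the q-norm and p-norm dual spaces; a sketch under that assumption:

```python
import numpy as np

def link(w, q):
    """f(w) = grad(||w||_q^2 / 2), i.e. f_i(w) = sign(w_i)|w_i|^(q-1) ||w||_q^(2-q).
    Applying the same map with the dual exponent p (1/p + 1/q = 1) inverts it."""
    norm = np.linalg.norm(w, ord=q)
    if norm == 0.0:
        return np.zeros_like(w)
    return np.sign(w) * np.abs(w) ** (q - 1) * norm ** (2 - q)
```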

14 Solution of PUMMA
The solution of PUMMA is found numerically (x_t^pos, x_t^neg: the last positive and negative examples which incurred updates).
Observation: for p = 2, the solution is the same as that of ROMMA run on z_t = x_t^pos − x_t^neg.

15 Our (implicit) reduction, which preserves the margin
For p = 2, the margin of a hyperplane with bias over the original instances equals (up to normalization) the margin of a hyperplane without bias over pairs of positive and negative instances.
PUMMA implicitly runs ROMMA over pairs of positive and negative instances, in an efficient way!
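
A hypothetical sketch of that implicit reduction for p = 2, reusing the `romma_update` from slide 12 and assuming both pair constraints are tight at the solution (so w·x_pos + b = 1 and w·x_neg + b = −1):

```python
import numpy as np

def pumma2_update(w, x_pos, x_neg):
    """Hypothetical p=2 PUMMA step built on a ROMMA-style update: feed the
    halved difference z = (x_pos - x_neg)/2 to the unbiased update (so that
    w.z >= 1 is the pair constraints made tight), then recover the bias
    from w.x_pos + b = 1."""
    z = (x_pos - x_neg) / 2.0
    w_new = romma_update(w, z, +1.0)   # unbiased update over the pair
    b_new = 1.0 - w_new @ x_pos        # bias from the tight positive constraint
    return w_new, b_new
```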

16 Main Result
Thm. Suppose that, given S = ((x_1, y_1), …, (x_T, y_T)), there exists a linear classifier (u, b) such that y_t(u · x_t + b) ≥ 1 for t = 1, …, T. Then
(# of updates of PUMMA_p(δ)) ≤ (p − 1)‖u‖_q² R² / δ²,
which is similar to the bounds of previous algorithms. After (p − 1)‖u‖_q² R² / δ² updates, PUMMA_p(δ) outputs a hypothesis with p-norm margin ≥ (1 − δ)γ (γ: the margin of (u, b)).

17 Experiment over artificial data
Example (x, y):
- x: an n(=100)-dimensional {−1, +1}-valued vector
- y = f(x), where f is a linear threshold function with bias b
Generate 1000 examples randomly; 3 datasets (b = 1 (small), 9 (medium), 15 (large)).
Compare with ROMMA (p = 2) and ALMA (p = 2 ln n).

18 Results over Artificial Data
NOTE 1: the margin is defined over the original space (w/o the reduction).
NOTE 2: we omit the results for b = 9 for clarity.
[Figure: margin vs. # of updates; left: PUMMA vs. ROMMA (p = 2), right: PUMMA vs. ALMA (p = 2 ln n).]

19 Computation Time
- For p = 2, PUMMA is faster than ROMMA.
- For p = 2 ln n, PUMMA is faster than ALMA, even though PUMMA uses Newton's method.
[Figure: computation time (sec.) as the bias ranges from large to small; left: PUMMA vs. ROMMA (p = 2), right: PUMMA vs. ALMA (p = 2 ln n).]

20 Results over UCI Adult data (# of data: 32561)
- Fix p = 2.
- 2-norm soft margin formulation for linearly inseparable data.
- Run ROMMA and PUMMA until they achieve 99% of the maximum margin.

algorithm     sec.     margin rate (%)
SVMlight      5893     100
ROMMA (99%)   71296    99.03
PUMMA (99%)   44480    99.14

21 Results over MNIST data
- Fix p = 2. Use polynomial kernels.
- 2-norm soft margin formulation for linearly inseparable data.
- Run ROMMA and PUMMA until they achieve 99% of the maximum margin.

algorithm     sec.      margin rate (%)
SVMlight      401.36    100
ROMMA (99%)   1715.57   93.5
PUMMA (99%)   1971.30   99.2

22 Summary
PUMMA can learn p-norm maximum margin classifiers with bias directly.
- The # of updates is similar to those of previous algorithms.
- It achieves (1 − δ) times the maximum p-norm margin.
PUMMA outperforms other online algorithms when the underlying hyperplane has a large bias.

23 Future work
- Maximizing the ∞-norm margin directly.
- Tighter bounds on the # of updates:
  - In our experiments, PUMMA is faster especially when the bias is large (like Winnow).
  - Our current bound does not reflect this fact.

