Presentation on theme: "Taiji Suzuki Ryota Tomioka The University of Tokyo"— Presentation transcript:
1 SpicyMKL: Efficient multiple kernel learning method using dual augmented Lagrangian. Taiji Suzuki, Ryota Tomioka. The University of Tokyo, Graduate School of Information Science and Technology, Department of Mathematical Informatics. 23rd Oct.
2 Multiple Kernel Learning Fast Algorithm SpicyMKL
3 Kernel Learning: regression, classification. Kernel function.
4 Thousands of kernels → Multiple Kernel Learning. Kernels: Gaussian, polynomial, chi-square, ... Parameters: Gaussian width, polynomial degree. Features (computer vision): color, gradient, sift (sift, hsvsift, huesift, scaling of sift), Geometric Blur, image regions, ... This yields thousands of kernels. Multiple Kernel Learning (Lanckriet et al. 2004): select important kernels and combine them.
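Not from the original slides — a minimal sketch of the idea that MKL learns a weighted combination of base kernels (here Gaussian kernels of several widths, with hypothetical hand-set weights standing in for learned ones):

```python
import numpy as np

def gaussian_kernel(X, Y, width):
    """Gaussian (RBF) kernel matrix between the rows of X and Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * width ** 2))

def combined_kernel(X, Y, widths, weights):
    """Weighted sum of base kernels. MKL learns the weights and
    drives most of them to exactly zero (kernel selection)."""
    K = np.zeros((X.shape[0], Y.shape[0]))
    for w, beta in zip(widths, weights):
        K += beta * gaussian_kernel(X, Y, w)
    return K

X = np.random.randn(5, 3)
weights = np.array([0.0, 0.7, 0.3, 0.0])   # sparse: 2 of 4 kernels active
K = combined_kernel(X, X, widths=[0.5, 1.0, 2.0, 4.0], weights=weights)
```

With thousands of candidate kernels, a sparse weight vector means only a handful of kernel matrices ever need to be evaluated.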
19 DAL (Tomioka &amp; Sugiyama 09): Dual Augmented Lagrangian, for Lasso, group Lasso, and trace norm regularization. SpicyMKL = DAL + MKL: a kernelization of DAL.
20 Proximal Minimization. Difficulty of MKL (Lasso): the regularizer is non-differentiable at 0. Relax this by proximal minimization (Rockafellar, 1976).
21 Our algorithm converges super-linearly! SpicyMKL: proximal minimization (Rockafellar, 1976). 1. Set some initial solution and choose an increasing sequence of proximity parameters. 2. Iterate the following update rule until convergence: minimize the objective plus a regularization term toward the last update. It can be shown that the iterates converge super-linearly. Taking the dual, each update step can be solved efficiently.
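Not in the slides — a toy sketch of the proximal point iteration on the simplest non-differentiable objective, f(x) = |x|, where each subproblem has the closed-form soft-threshold solution (the actual SpicyMKL update solves a richer dual problem):

```python
import numpy as np

def soft_threshold(x, t):
    """Closed-form prox of t*|.|: shrink each coordinate toward 0 by t."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def proximal_point_l1(x0, gammas):
    """Proximal point method for f(x) = |x| (non-differentiable at 0):
        x_{k+1} = argmin_x |x| + ||x - x_k||^2 / (2 * gamma_k)
                = soft_threshold(x_k, gamma_k)
    Each subproblem regularizes toward the last update, and the
    proximity parameters gamma_k form an increasing sequence."""
    x = np.asarray(x0, dtype=float)
    for g in gammas:
        x = soft_threshold(x, g)
    return x

x = proximal_point_l1([3.0, -0.5, 1.2], gammas=[1.0, 2.0, 4.0])
# every coordinate reaches the minimizer 0 exactly, not just approximately
```

The exact zeros illustrate why the approach suits sparse regularization: inactive coordinates (here, inactive kernels) are identified in finitely many steps.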
28 What we have observed. We arrived at the dual of the update rule in proximal minimization: it is differentiable, and only the active kernels need to be computed (next slide).
29 Skipping inactive kernels. Soft threshold vs. hard threshold: if a kernel is inactive (its dual quantity falls below the threshold), its contribution to the objective vanishes, and its gradient also vanishes.
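Not from the slides — a hypothetical sketch of the screening idea: a kernel whose dual quantity falls below the threshold contributes nothing to the objective or gradient, so it can be skipped (the quantity `sqrt(alpha' K alpha)` and the threshold value are illustrative stand-ins for the paper's exact expressions):

```python
import numpy as np

def active_kernel_mask(alpha, Ks, thresh):
    """Mark a kernel as active only when the norm of its dual quantity
    exceeds the soft-threshold level. Inactive kernels contribute
    neither to the objective nor to its gradient, so all work on them
    is skipped."""
    norms = np.array([np.sqrt(alpha @ K @ alpha) for K in Ks])
    return norms > thresh

# hypothetical dual variable and three random PSD kernel matrices
rng = np.random.default_rng(0)
alpha = rng.standard_normal(6)
Ks = []
for _ in range(3):
    A = rng.standard_normal((6, 6))
    Ks.append(A @ A.T)

mask = active_kernel_mask(alpha, Ks, thresh=5.0)
```

In an optimizer loop, only kernels with `mask[m] == True` would enter the gradient and Hessian sums on the next slide.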
30 Derivatives. We can apply the Newton method to the dual optimization: the objective function and its derivatives involve sums in which only the active kernels contribute. → Efficient for a large number of kernels.
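Not from the slides — a generic Newton iteration as used for the smooth dual objective; the quadratic test problem below is purely illustrative (in SpicyMKL the gradient and Hessian are sums over active kernels only, which is what makes each step cheap):

```python
import numpy as np

def newton_minimize(grad, hess, x0, tol=1e-8, max_iter=50):
    """Plain Newton's method for a smooth convex objective:
        x <- x - H(x)^{-1} g(x)
    grad and hess return the gradient vector and Hessian matrix."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        x = x - np.linalg.solve(hess(x), g)
    return x

# toy smooth convex objective f(x) = 0.5 x'Ax - b'x, minimizer A^{-1} b
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
x_star = newton_minimize(lambda x: A @ x - b, lambda x: A, np.zeros(2))
```

On a quadratic, one Newton step lands on the exact minimizer; on the dual objective the method converges in a few steps, each costing only the active kernels.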
31 Convergence result. Thm. For any proper convex loss and regularizer, and any non-decreasing sequence of proximity parameters, the method converges. Thm. If the loss is strongly convex and the regularizer is a sparse regularizer, then for any increasing sequence of proximity parameters the convergence rate is super-linear.
32 Non-differentiable loss. If the loss is non-differentiable, almost the same algorithm is available by utilizing the Moreau envelope of the dual loss (primal → dual → Moreau envelope of the dual).
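Not from the slides — a brute-force numerical sketch of the Moreau envelope, the smoothing device mentioned above. For the absolute value, the envelope is exactly the Huber function, which the grid search below recovers (the grid-based minimization is illustrative only):

```python
import numpy as np

def moreau_envelope(f, gamma, x, grid):
    """Moreau envelope of f at x: inf_y f(y) + (x - y)^2 / (2*gamma).
    The infimum is approximated by a dense grid search (sketch only);
    the envelope is differentiable even when f is not."""
    return np.min(f(grid) + (x - grid) ** 2 / (2.0 * gamma))

def huber(x, g=1.0):
    """Closed form of the Moreau envelope of |.| with parameter g."""
    return x * x / (2.0 * g) if abs(x) <= g else abs(x) - g / 2.0

grid = np.linspace(-5.0, 5.0, 100001)
vals = [moreau_envelope(np.abs, 1.0, x, grid) for x in (-2.0, 0.3, 1.5)]
# matches huber(-2.0), huber(0.3), huber(1.5) up to grid resolution
```

The same construction smooths a non-differentiable loss in the dual, so the Newton machinery of the previous slides still applies.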
33 Dual of elastic net. The dual is smooth, so naïve optimization in the dual is applicable. But we applied the proximal minimization method for small values of the smoothing parameter, to avoid numerical instability.
34 Why ‘Spicy’? SpicyMKL = DAL + MKL. “Dal” is also a major Indian dish → hot, spicy → SpicyMKL. (“Dal”, from Wikipedia.)
35 Outline. Introduction. Details of Multiple Kernel Learning. Our method: proximal minimization, skipping inactive kernels. Numerical experiments: benchmark datasets; sparse or dense.
41 Sparse or Dense? Elastic net interpolates between a sparse solution (pure ℓ1 regularization) and a dense one (pure ℓ2). We compare the performance of sparse and dense solutions in a computer vision task.
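Not from the slides — a small sketch of the elastic-net proximal operator, showing how one mixing parameter (here called `theta`, a hypothetical name) sweeps from an exactly-sparse ℓ1 solution to a dense ridge-like one:

```python
import numpy as np

def elastic_net_prox(x, lam, theta):
    """Prox of lam * (theta * ||.||_1 + (1 - theta)/2 * ||.||_2^2):
    soft-threshold by lam*theta, then shrink by 1/(1 + lam*(1-theta)).
    theta = 1 is the sparse (L1) extreme, theta = 0 the dense (ridge)
    extreme; intermediate theta trades the two off."""
    l1 = lam * theta
    l2 = lam * (1.0 - theta)
    shrunk = np.sign(x) * np.maximum(np.abs(x) - l1, 0.0)
    return shrunk / (1.0 + l2)

x = np.array([3.0, -0.2, 0.05, 1.0])
sparse = elastic_net_prox(x, lam=0.5, theta=1.0)  # some coords exactly 0
dense  = elastic_net_prox(x, lam=0.5, theta=0.0)  # all shrunk, none 0
```

The “medium density” setting on the next slide corresponds to an intermediate mixing value between these two extremes.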
42 caltech101 dataset (Fei-Fei et al., 2004): object recognition task. Classes: anchor, ant, cannon, chair, cup → 10 binary classification problems (10 = 5×(5−1)/2). Number of kernels: 1760 = 4 (features) × 22 (regions) × 2 (kernel functions) × 10 (kernel parameters). Features: sift [Lowe99]; colorDescriptor [van de Sande et al. 10]: hsvsift, sift (scale = 4px, 8px, automatic scale). Regions: whole region, 4 divisions, 16 divisions, spatial pyramid (a histogram of features is computed on each region). Kernel functions: Gaussian kernels and a second kernel type, with 10 parameters each.
43 Performances are averaged over all classification problems. Medium density gave the best performance; the sparse solution is comparable with the dense solution.
44 Conclusion and future work. Conclusion: we proposed a new MKL algorithm that is efficient when the number of kernels is large (proximal minimization; neglecting inactive kernels). Medium density showed the best performance, but the sparse solution also works well. Future work: second-order update.
45 References. SpicyMKL: T. Suzuki &amp; R. Tomioka: SpicyMKL. Technical report, arXiv. DAL: R. Tomioka &amp; M. Sugiyama: Dual Augmented Lagrangian Method for Efficient Sparse Reconstruction. IEEE Signal Processing Letters, 16 (12), 2009.