Learning Sparse SVM for Feature Selection on Very High Dimensional Datasets
Presented by: Mingkui Tan, Li Wang, Ivor W. Tsang
School of Computer Engineering, Nanyang Technological University (www.ntu.edu.sg)
ICML 2010, June 21-24, Haifa, Israel

Background

In many machine learning applications, sparsity with respect to the input features is highly desirable. In this work, we propose a new Feature Generating Machine (FGM) to learn such feature sparsity. We propose a new sparse model and transform it, via a mild convex relaxation, into a multiple kernel learning (MKL) problem with exponentially many linear kernels. We propose to solve the relaxed problem with a cutting plane algorithm that incorporates MKL learning, and we prove that the proposed algorithm converges globally within a finite number of iterations. We also show that FGM scales linearly in both the number of dimensions and the number of instances. Empirically, FGM shows great scalability for non-monotonic feature selection on large-scale and very high dimensional problems.

Model

Introduce a 0-1 vector d ∈ {0,1}^m to control the status of the features ("1" means a feature is selected, "0" means it is not), so that each instance x is masked as x ⊙ d (the elementwise product). Suppose we want to select at most B features; we then obtain a new sparse model:

    min_{d ∈ D} min_{w, ξ} (1/2)‖w‖² + C Σ_i ξ_i
    s.t. y_i w⊤(x_i ⊙ d) ≥ 1 − ξ_i, ξ_i ≥ 0, i = 1, …, n,

where D = {d ∈ {0,1}^m : Σ_j d_j ≤ B}. Via a mild convex relaxation, this can be converted into an MKL problem with exponential linear kernels:

    min_{μ ∈ M} max_{α ∈ A} 1⊤α − (1/2)(α ⊙ y)⊤ (Σ_t μ_t K_t)(α ⊙ y),

where K_t = (X ⊙ d_t)(X ⊙ d_t)⊤ is the linear kernel induced by the feature subset d_t ∈ D, M is the simplex over the kernel weights μ, and A is the feasible set of the dual variables α. In general, the number of base kernels equals |D|, which grows exponentially with the number of features, so the MKL problem cannot be solved by enumerating the kernels.

Methods

We propose a cutting plane algorithm to solve the MKL problem with exponential linear kernels [4]. The algorithm maintains a small active set of feature subsets and iterates until convergence: (step 1) initialize the active set with a feasible d; (step 2) solve the MKL subproblem restricted to the kernels in the active set; (step 3) find the most violated d, which for linear kernels reduces to ranking the features by the score c_j = (Σ_i α_i y_i x_ij)² and keeping the top B, and add its kernel to the active set. A simplified, runnable sketch of this loop is given after the references.

Convergence Property

Theorem 1: Assume that the MKL subproblem in step 2 and the most-violated-d selection in step 3 can be solved exactly. Then FGM converges globally after a finite number of steps [1].

Theorem 2: FGM scales linearly in computation with respect to n and m. In other words, FGM takes O(mn) time [3].

Experimental Results on Huge Dimensional Problems

(1) Toy experiments: The toy features are shown in the first two figures. For non-monotonic feature selection with B = 2, a method needs to select f1 and f2 as the most informative features in order to obtain the best prediction accuracy. We gradually increase the number of noise features. SVM(IDEAL) denotes the results obtained by using only f1 and f2.

Fig. 1: Results on the synthetic dataset with a varying number of noise features. From Fig. 1, FGM-B (FGM) shows the best performance in terms of prediction accuracy, sparsity and training time.

(2) Large-scale real data experiments: We report results on news20.binary (1,355,191 × 9,996), Arxiv astro-ph (99,757 × 62,369), rcv1.binary (47,236 × 20,242), real-sim (20,958 × 32,309), URL0 (3,231,961 × 16,000) and URL1 (3,231,961 × 20,000), where the numbers in brackets are (dimensions × instances). The comparison with SVM-RFE [2] shows the competitiveness of our method in terms of both prediction accuracy and training time.

[Tables comparing accuracy and training time on news20.binary, Arxiv astro-ph, rcv1.binary, real-sim, URL0 and URL1 omitted.]

References

[1] Chen, J. and Ye, J. Training SVM with indefinite kernels. In ICML, 2008.
[2] Guyon, I., Weston, J., Barnhill, S., and Vapnik, V. Gene selection for cancer classification using support vector machines. Machine Learning, 46:389-422, 2002.
[3] Hsieh, C.-J., Chang, K.-W., Lin, C.-J., Keerthi, S. S., and Sundararajan, S. A dual coordinate descent method for large-scale linear SVM. In ICML, 2008.
[4] Li, Y.-F., Tsang, I. W., Kwok, J. T., and Zhou, Z.-H. Tighter and convex maximum margin clustering. In AISTATS, 2009.
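Code Sketch

Below is a minimal sketch of the cutting plane loop, our illustration rather than the poster's actual solver. It assumes Python with numpy and scikit-learn and labels y in {-1, +1}, and it deliberately simplifies step 2: instead of solving the exact MKL subproblem over the active kernels, it trains a single squared-hinge linear SVM on the union of the active feature subsets and recovers approximate dual weights from the slacks (alpha_i = 2*C*xi_i holds for the squared hinge loss at the optimum). The names fgm and most_violated_d are ours.

    import numpy as np
    from sklearn.svm import LinearSVC

    def most_violated_d(X, y, alpha, B):
        # Step 3: for linear kernels the violation score of feature j is
        # (sum_i alpha_i * y_i * x_ij)^2, so the most violated d is just
        # the top-B features by this score -- an O(mn) operation.
        score = (X * (alpha * y)[:, None]).sum(axis=0) ** 2
        d = np.zeros(X.shape[1], dtype=bool)
        d[np.argsort(score)[-B:]] = True
        return d

    def fgm(X, y, B=2, C=1.0, max_iter=10, tol=1e-4):
        n, m = X.shape
        alpha = np.full(n, 1.0 / n)          # uniform initial dual weights
        active, prev_obj = [], np.inf
        for _ in range(max_iter):
            active.append(most_violated_d(X, y, alpha, B))   # step 3
            union = np.any(active, axis=0)   # features used by any active d
            # Step 2 (simplified): one squared-hinge linear SVM on the
            # union of active features stands in for the MKL subproblem.
            svm = LinearSVC(C=C, loss='squared_hinge').fit(X[:, union], y)
            xi = np.maximum(0.0, 1.0 - y * svm.decision_function(X[:, union]))
            alpha = 2.0 * C * xi             # dual weights implied by L2 hinge
            obj = 0.5 * (svm.coef_ ** 2).sum() + C * (xi ** 2).sum()
            if abs(prev_obj - obj) <= tol * max(1.0, abs(obj)):
                break                        # active set stopped improving
            prev_obj = obj
        return union, svm

On a toy setup like that of Fig. 1, calling fgm(X, y, B=2) should recover f1 and f2 once they dominate the violation score. Each call to most_violated_d costs O(mn) and the active set grows by one kernel per iteration, which is the behavior Theorems 1 and 2 describe for the exact algorithm.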

