Semi-Supervised Learning Using Label Mean

Presentation transcript:

Semi-Supervised Learning Using Label Mean
Yu-Feng Li¹, James T. Kwok², Zhi-Hua Zhou¹
¹ LAMDA Group, Nanjing University, China ({liyf, zhouzh}@lamda.nju.edu.cn)
² Dept. of Computer Science & Engineering, Hong Kong University of Science and Technology, Hong Kong (jamesk@cse.ust.hk)

The Problem
Many SVM algorithms for supervised learning are efficient, but existing S3VMs (semi-supervised SVMs) are not. What is the major obstacle to designing efficient S3VMs, and how can an efficient S3VM be designed?

Outline: Introduction, Our Methods, Experiments, Conclusion

Introduction: Semi-Supervised Learning (SSL)
(Figure: optimal hyperplane.)
The goal of SSL is to improve the performance of supervised learning by utilizing unlabeled data.

Introduction: SSL Applications
Text categorization [Joachims, ICML’99]; hand-written digit classification [Zhu et al., ICML’03; Zhu et al., ICML’05]; medical image segmentation [Grady & Funka-Lea, ECCV’04]; image retrieval [He et al., ACM Multimedia’04]; word sense disambiguation [Niu et al., ACL’04; Yarowsky, ACL’95; Cuong, Thesis’07]; object detection [Rosenberg et al., WACV’05]; and more.

Introduction: Many SSL Algorithms
Generative methods [Miller & Uyar, NIPS’96; Nigam et al., MLJ’00; Fujino et al., AAAI’05; etc.]; disagreement-based methods [Blum & Mitchell, COLT’98; Mitchell, ICCS’99; Nigam & Ghani, CIKM’00; Zhou & Li, TKDE’05]; graph-based methods [Zhou et al., NIPS’02; Zhu et al., ICML’03; Belkin et al., JMLR’06]; and more. Recent surveys of the SSL literature: Chapelle et al., eds., Semi-Supervised Learning, MIT Press, 2006; Zhu, Semi-Supervised Learning Literature Survey, 2007; Zhou & Li, Semi-supervised learning by disagreement, KAIS, 2009.

Introduction: S3VMs
Semi-supervised support vector machine [Bennett & Demiriz, NIPS’99]; Transductive SVM [Joachims, ICML’99]; Laplacian SVM [Belkin et al., JMLR’06]; SDP relaxations [De Bie & Cristianini, NIPS’04; De Bie & Cristianini, JMLR’06]; and many optimization algorithms for S3VM [Chapelle et al., JMLR’08].

Introduction: S3VMs
(Figure: optimal hyperplane.)
Low-density assumption & cluster assumption [Chapelle et al., ICML’05]

Introduction: S3VM Formulation
The S3VM objective combines four ingredients: the margin, a loss on the labeled data (e.g., the hinge loss), a loss on the unlabeled data (e.g., the symmetric hinge loss), and a balance constraint. The effect of this objective has been well studied in [Chapelle et al., JMLR’08].
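The slide’s equation is not reproduced in the transcript; as a hedged reconstruction from exactly these ingredients (notation ours, not from the slide), a standard S3VM objective reads

$$
\min_{w,b}\ \frac{1}{2}\|w\|^2
+ C_1\sum_{i=1}^{l}\max\bigl(0,\,1-y_i f(x_i)\bigr)
+ C_2\sum_{j=l+1}^{l+u}\max\bigl(0,\,1-|f(x_j)|\bigr),
$$

where $f(x)=w^\top\varphi(x)+b$: the first term is the margin, the second the hinge loss on the $l$ labeled points, and the third the symmetric hinge loss on the $u$ unlabeled points. A balance constraint such as $\frac{1}{u}\sum_{j=l+1}^{l+u} f(x_j)=\frac{1}{l}\sum_{i=1}^{l} y_i$ rules out the degenerate solution that assigns all unlabeled points to one class.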

Introduction: Efficiency of Existing S3VMs
[Bennett & Demiriz, NIPS’99] formulated S3VM as a mixed-integer programming problem, which is computationally intractable in general. Transductive SVM [Joachims, ICML’99] iteratively solves standard supervised SVM problems, but the number of iterations may be quite large in practice. Laplacian SVM [Belkin et al., JMLR’06] solves a small SVM with the labeled data only, but it needs to compute the inverse of an n×n matrix (O(n³) time and O(n²) memory). In short, existing S3VMs are inefficient.

Introduction: Analysis
Our main observation: most S3VM algorithms aim at estimating the correct label of every unlabeled instance, so the optimization problem has as many constraints as there are unlabeled samples. Can we use a simpler statistic in place of the individual labels, reducing the number of constraints while still achieving performance competitive with state-of-the-art SSL methods? Our answer: the label means.
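To make the statistic concrete (notation ours, not from the slide): with $\varphi$ the feature map and $d_j\in\{0,1\}$ the unknown class indicator of unlabeled instance $x_j$, the label means are the per-class averages of the unlabeled points in feature space,

$$
\hat m_+=\frac{1}{|\{j:\,d_j=1\}|}\sum_{j:\,d_j=1}\varphi(x_j),\qquad
\hat m_-=\frac{1}{|\{j:\,d_j=0\}|}\sum_{j:\,d_j=0}\varphi(x_j),
$$

so two unknown vectors stand in for $u$ unknown labels.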

Outline: Introduction, Our Methods, Experiments, Conclusion

Our Methods: Usefulness of the Label Mean
We consider an SVM-style optimization problem in which estimates of the label means of the unlabeled instances take the place of the individual unknown labels.

Our Methods: Usefulness of the Label Mean (cont.)
This analysis suggests that if an S3VM “knows” the label means of the unlabeled instances, it can closely approximate an SVM that “knows” all the labels of the unlabeled instances; a difference arises only when the samples are non-separable. This motivates MeanS3VM: first estimate the label means of the unlabeled instances.
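The separability caveat can be seen via Jensen’s inequality (our gloss, not on the slide): because the hinge loss $\ell$ is convex, the loss at the label mean lower-bounds the average loss over the class,

$$
\ell\Bigl(\frac{1}{u_+}\sum_{j\in U_+} f(x_j)\Bigr)\ \le\ \frac{1}{u_+}\sum_{j\in U_+}\ell\bigl(f(x_j)\bigr),
$$

with equality whenever all the $f(x_j)$ fall in the same linear piece of the hinge, which is what happens when the class is cleanly separated; only for non-separable data do the mean-based and fully labeled objectives diverge.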

Our Methods: Estimating the Label Mean
We adopt a maximal margin approach and propose two algorithms to solve it: one based on convex relaxation, the other on alternating optimization. The resulting problem has far fewer constraints than a standard S3VM, which greatly reduces the time complexity of the optimization. It can also be interpreted in terms of MMD [Gretton et al., NIPS’06], which separates the distributions of the two classes with a large margin.
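For intuition (our addition): the kernel MMD between the two estimated class means is their RKHS distance, which expands entirely in kernel evaluations, so no explicit feature map is needed:

$$
\|\hat m_+-\hat m_-\|_{\mathcal H}^2
=\frac{1}{u_+^2}\sum_{i,j\in U_+}k(x_i,x_j)
-\frac{2}{u_+u_-}\sum_{i\in U_+}\sum_{j\in U_-}k(x_i,x_j)
+\frac{1}{u_-^2}\sum_{i,j\in U_-}k(x_i,x_j).
$$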

Our Methods: Convex Relaxation Approach
Take the dual of the estimation problem, then apply the minimax relaxation of [Li et al., AISTATS’09]; this turns the problem into Multiple Kernel Learning (MKL).

Our Methods: Convex Relaxation Approach (cont.)
The relaxation involves an exponential number of base kernels, one for each candidate label assignment, which is far too expensive to enumerate. We therefore combine a cutting plane algorithm with an adaptive SimpleMKL solver. But how do we find the violated constraint to add at each step?
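A minimal Python sketch of the cutting plane loop described here, under stated assumptions: `solve_mkl` and `find_violated_d` are hypothetical callbacks standing in for the adaptive SimpleMKL solver and the sorting-based approximation of the next slide; neither name comes from the paper.

```python
from typing import Callable, List, Tuple
import numpy as np

def cutting_plane_mkl(
    d_init: np.ndarray,
    solve_mkl: Callable[[List[np.ndarray]], Tuple[np.ndarray, np.ndarray]],
    find_violated_d: Callable[[np.ndarray], np.ndarray],
    max_iter: int = 50,
) -> List[np.ndarray]:
    """Grow an active set of label assignments d instead of enumerating all 2^u."""
    active = [d_init]                   # working set of assignments (base kernels)
    for _ in range(max_iter):
        alpha, _ = solve_mkl(active)    # assumed: MKL over the active base kernels
        d_new = find_violated_d(alpha)  # assumed: (approximately) most violated d
        if any(np.array_equal(d_new, d) for d in active):
            break                       # no new violated constraint: stop
        active.append(d_new)
    return active
```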

Our Methods: Finding the Most Violated d
To find the most violated d, we need to solve a maximization problem which, when rewritten, is a concave QP that cannot be solved efficiently (one of its terms does not depend on d at all). However, the cutting plane method only requires adding a violated constraint at each iteration, not the most violated one. Hence we propose a simple and efficient method for finding a good approximation of the most violated d, reducing it to a linear problem that can be solved by sorting.
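A minimal sketch of the sorting step (our illustration; the score vector c and the positive-class count p implied by the balance constraint are assumed inputs): maximizing a linear objective c·d over binary d with exactly p ones just means picking the p largest scores.

```python
import numpy as np

def top_p_assignment(c: np.ndarray, p: int) -> np.ndarray:
    """Maximize c @ d over d in {0,1}^u subject to sum(d) == p, by sorting."""
    d = np.zeros(len(c), dtype=int)
    d[np.argsort(-c)[:p]] = 1  # the p largest scores get the positive class
    return d

# Example: six unlabeled points, two expected positives.
print(top_p_assignment(np.array([0.3, -1.2, 2.0, 0.1, 0.9, -0.4]), p=2))
# -> [0 0 1 0 1 0]
```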

Our Methods: Alternating Optimization
Iterate until convergence: with d fixed, solve for the dual variables (a standard SVM); with the dual variables fixed, solve for d, which can again be done by sorting.
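A simplified, runnable sketch of the alternate-and-sort control flow (our illustration, not the authors’ implementation: for the fixed-d step it retrains a plain SVM on the labeled data plus the current surrogate assignments, rather than solving the paper’s label-mean dual; p is the positive count implied by the balance constraint):

```python
import numpy as np
from sklearn.svm import SVC

def means3vm_iter_sketch(X_l, y_l, X_u, p, n_iter=20, C=1.0, seed=0):
    """Alternate an SVM solve (d fixed) with a sorting update of d. y_l in {-1,+1}."""
    rng = np.random.default_rng(seed)
    d = np.zeros(len(X_u), dtype=int)
    d[rng.choice(len(X_u), size=p, replace=False)] = 1   # random initial assignment
    for _ in range(n_iter):
        # Step 1 (d fixed): train a standard SVM on labeled + surrogate-labeled data.
        X = np.vstack([X_l, X_u])
        y = np.concatenate([y_l, 2 * d - 1])             # map {0,1} -> {-1,+1}
        clf = SVC(C=C, kernel="rbf").fit(X, y)
        # Step 2 (classifier fixed): update d by sorting the decision values.
        d_new = np.zeros_like(d)
        d_new[np.argsort(-clf.decision_function(X_u))[:p]] = 1
        if np.array_equal(d_new, d):
            break                                        # assignments stable: converged
        d = d_new
    return clf, d
```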

Our Methods: Comparison and MeanS3VM Implementation
The convex relaxation approach is a global optimization; the alternating optimization approach may get stuck in a local solution, but it is simple and empirically faster. We take the d returned by either approach and, together with the labels of the labeled data, train a final SVM. We denote the convex relaxation approach meanS3vm-mkl and the alternating optimization approach meanS3vm-iter.

Outline: Introduction, Our Methods, Experiments, Conclusion

Experiments: Four Kinds of Tasks
Benchmark tasks, UCI data sets, text categorization, and speed.

Experiments: Benchmark Tasks
Following the same setup as S3VM, the meanS3vms achieve highly competitive performance.

Experiments: UCI Data Sets
9 data sets; 10 labeled examples; 50% train / 50% test; 20 runs. (Results table: win counts per method are 0, 1, 0, 0, 3, 2, 4.) The meanS3vms achieve highly competitive performance on all data sets; in particular, they achieve the best performance on 6 of the 9 tasks.

Experiments: Text Categorization
10 binary tasks; 2 labeled examples; 50% train / 50% test; 20 runs. (Results table: win counts per method are 0, 2, 0, 0, 4, 4.) The meanS3vms achieve highly competitive performance on all data sets, and the best performance on 8 of the 10 tasks.

Experiments: Speed
On large data sets (more than 1,000 instances), meanS3vm-mkl is much faster than Laplacian SVM, and meanS3vm-iter is almost the fastest method: about 10 times faster than Laplacian SVM and 100 times faster than TSVM.

Conclusion
Main contribution: an S3VM given the label means closely approximates an SVM given the full labels, yielding two efficient and effective SSL methods. Future work: a theoretical study of the effect of label means, and other approaches to estimating the label means. Thanks!