Semi-Supervised Learning Using Label Mean


1 Semi-Supervised Learning Using Label Mean
Yu-Feng Li1, James T. Kwok2, Zhi-Hua Zhou1
1 LAMDA Group, Nanjing University, China
2 Dept. of Computer Science & Engineering, Hong Kong University of Science and Technology, Hong Kong

2 What’s the major obstacle to designing efficient S3VMs?
The Problem: Many SVM algorithms for supervised learning are efficient, but existing S3VMs (Semi-Supervised SVMs) are not. What is the major obstacle to designing efficient S3VMs? How can an efficient S3VM be designed?

3 Outline Introduction Our Methods Experiments Conclusion

4 Semi-Supervised Learning (SSL)
Introduction Semi-Supervised Learning (SSL) Optimal Hyperplane The goal of SSL is to improve the performance of supervised learning by utilizing unlabeled data

5 SSL Applications
Introduction SSL Applications
- Text categorization [Joachims, ICML’99]
- Hand-written digit classification [Zhu et al., ICML’03; Zhu et al., ICML’05]
- Medical image segmentation [Grady & Funka-Lea, ECCV’04]
- Image retrieval [He et al., ACM Multimedia’04]
- Word sense disambiguation [Niu et al., ACL’04; Yarowsky, ACL’95; Cuong, Thesis’07]
- Object detection [Rosenberg et al., WACV’05]
- …

6 Introduction Many SSL Algorithms
- Generative methods [Miller & Uyar, NIPS’96; Nigam et al., MLJ’00; Fujino et al., AAAI’05; etc.]
- Disagreement-based methods [Blum & Mitchell, COLT’98; Mitchell, ICCS’99; Nigam & Ghani, CIKM’00; Zhou & Li, TKDE’05]
- Graph-based methods [Zhou et al., NIPS’02; Zhu et al., ICML’03; Belkin et al., JMLR’06]
- …
Recent surveys of the SSL literature:
- Chapelle et al., eds., Semi-Supervised Learning, MIT Press, 2006
- Zhu, Semi-Supervised Learning Literature Survey, 2007
- Zhou & Li, Semi-supervised learning by disagreement, KAIS, 2009

7 Introduction S3VMs
- Semi-supervised Support Vector Machine [Bennett & Demiriz, NIPS’99]
- Transductive SVM [Joachims, ICML’99]
- Laplacian SVM [Belkin et al., JMLR’06]
- SDP relaxations [De Bie & Cristianini, NIPS’04; De Bie & Cristianini, JMLR’06]
- Many optimization algorithms for S3VMs [Chapelle et al., JMLR’08]
- …

8 S3VMs Optimal Hyperplane
Introduction S3VMs Optimal Hyperplane Low-Density Assumption & Cluster Assumption [Chapelle et al., ICML’05]

9 S3VM formulations
Introduction S3VM formulations
- Margin
- Loss on labeled data, e.g., hinge loss
- Loss on unlabeled data, e.g., symmetric hinge loss
- Balance constraint
The effect of the objective in S3VMs has been well studied in [Chapelle et al., JMLR’08].
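The two loss terms above can be sketched in a few lines of NumPy (a minimal illustration of the standard definitions, not the paper's code):

```python
import numpy as np

def hinge_loss(y, f):
    """Loss on labeled data: max(0, 1 - y * f(x)) for label y in {-1, +1}."""
    return np.maximum(0.0, 1.0 - y * f)

def symmetric_hinge_loss(f):
    """Loss on unlabeled data: max(0, 1 - |f(x)|),
    which pushes the decision boundary away from unlabeled points."""
    return np.maximum(0.0, 1.0 - np.abs(f))
```

Note that the symmetric hinge is small when an unlabeled point is far from the boundary on either side, which encodes the low-density assumption.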

10 Efficiency of existing S3VMs
Introduction Efficiency of existing S3VMs
- [Bennett & Demiriz, NIPS’99] formulated S3VM as a mixed-integer programming problem, which is computationally intractable in general.
- Transductive SVM [Joachims, ICML’99] iteratively solves standard supervised SVM problems; however, the number of iterations may be quite large in practice.
- Laplacian SVM [Belkin et al., JMLR’06] solves a small SVM with labeled data only, but it needs to compute the inverse of an n×n matrix (O(n³) time and O(n²) memory).
Existing S3VMs are inefficient.

11 Analysis
Introduction Analysis Our main observation: most S3VM algorithms aim at estimating the correct label of each unlabeled instance, so the number of constraints in the optimization problem is as large as the number of unlabeled samples. Can we use a simpler statistic than the individual labels to reduce the number of constraints while still achieving performance competitive with state-of-the-art SSL methods? Our answer: the label means.

12 Outline Introduction Our Methods Experiments Conclusion

13 Usefulness of the Label Mean
Our Methods Usefulness of the Label Mean We consider an optimization problem (shown on the slide) in which the key quantities are estimates of the label means of the unlabeled data.
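In the usual notation (a hedged reconstruction, since the slide's equation did not survive the transcript), the label means are the class-wise averages of the unlabeled feature vectors:

```latex
m_{+} = \frac{1}{|\{i : y_i = +1\}|} \sum_{i : y_i = +1} \phi(x_i),
\qquad
m_{-} = \frac{1}{|\{i : y_i = -1\}|} \sum_{i : y_i = -1} \phi(x_i)
```

where the sums range over the unlabeled instances and φ is the kernel feature map.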

14 Usefulness of the Label Mean (cont.)
Our Methods Usefulness of the Label Mean (cont.) MeanS3VM. A difference exists only when the samples are non-separable. This analysis suggests that, if an S3VM “knows” the label means of the unlabeled instances, it can closely approximate an SVM that “knows” all the labels of the unlabeled instances! This motivates us to first estimate the label means of the unlabeled instances.

15 Estimate the label mean
Our Methods Estimate the label mean Maximal margin approach. We propose two algorithms to solve it: one based on convex relaxation, the other based on alternating optimization. Note that it has far fewer constraints than a standard S3VM, which greatly reduces the time complexity of the optimization. It can also be interpreted in terms of MMD [Gretton et al., NIPS’06], which aims to separate the distributions of different classes with a large margin.
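The MMD connection can be illustrated with a linear kernel, where the MMD between two samples reduces to the distance between their empirical means (a minimal sketch, not the kernelized form used in the paper):

```python
import numpy as np

def linear_mmd(X_pos, X_neg):
    """With a linear kernel, the MMD between two samples is simply the
    Euclidean distance between their empirical means (the label means)."""
    return float(np.linalg.norm(X_pos.mean(axis=0) - X_neg.mean(axis=0)))
```

Separating the two class distributions with a large margin then corresponds to making this mean distance large.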

16 Convex relaxation approach
Our Methods Convex relaxation approach Consider the dual, then apply the minimax relaxation [Li et al., AISTATS’09]; this yields a Multiple Kernel Learning problem.

17 Convex relaxation approach (cont.)
Our Methods Convex relaxation approach (cont.) An exponential number of base kernels would be too expensive, so we use a cutting-plane algorithm together with an adaptive SimpleMKL. How do we find the violated constraint?

18 Find the most violated d
Our Methods Find the most violated d To find the most violated d, we need to solve a maximization problem. Rewritten (separating out the term not related to d), it is a concave QP and cannot be solved efficiently. However, the cutting-plane method only requires adding a violated constraint at each iteration. Hence, we propose a simple and efficient method for finding a good approximation to the most violated d: a linear problem that can be solved by sorting.
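The sorting step can be sketched as follows. This assumes d is a ±1 class-assignment vector over the unlabeled data and that the linear objective is maximized by assigning +1 to the highest-scoring instances under a balance constraint; the function name and signature are illustrative, not the paper's:

```python
import numpy as np

def approx_most_violated_d(scores, n_pos):
    """Approximate the most violated assignment d by sorting:
    give +1 to the n_pos unlabeled instances with the largest
    scores (balance constraint) and -1 to the rest."""
    d = -np.ones_like(scores)
    top = np.argsort(scores)[::-1][:n_pos]  # indices of the largest scores
    d[top] = 1.0
    return d
```

Sorting costs O(u log u) for u unlabeled instances, which is what makes this step cheap compared with solving the concave QP exactly.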

19 Alternating Optimization
Our Methods Alternating Optimization Iterate until convergence: (1) fix d and solve for the dual variables (a standard SVM); (2) fix the dual variables and solve for d (can still be solved by sorting).
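The alternating scheme can be sketched on a toy linear problem. As a deliberate simplification, a regularized least-squares classifier stands in for the SVM dual step of the paper, and the sorting step uses a ±1 assignment with a fixed positive count; all names here are illustrative:

```python
import numpy as np

def fit_linear(X, y, lam=1e-2):
    """Regularized least-squares classifier -- a stand-in for the
    SVM step (the paper solves a standard SVM dual here)."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def alternating_optimization(X_l, y_l, X_u, n_pos, n_iter=5):
    """Alternate between fitting the classifier (d fixed) and
    updating the assignment d by sorting (classifier fixed)."""
    w = fit_linear(X_l, y_l)                        # warm start on labeled data
    d = -np.ones(len(X_u))
    for _ in range(n_iter):
        scores = X_u @ w                            # fix classifier, score unlabeled
        d = -np.ones(len(X_u))
        d[np.argsort(scores)[::-1][:n_pos]] = 1.0   # update d by sorting
        w = fit_linear(np.vstack([X_l, X_u]),       # fix d, refit classifier
                       np.concatenate([y_l, d]))
    return w, d
```

On separable toy data the assignment d stabilizes after the first iteration; in general the scheme may converge to a local solution, as the next slide notes.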

20 Comparison and means3vm implementation
Our Methods Comparison and meanS3vm implementation The convex relaxation approach performs global optimization; the alternating optimization approach may get stuck in a local solution, but it is simpler and empirically faster. We use the d obtained from either approach, together with the labels of the labeled data, to train a final SVM. We denote the convex relaxation approach by meanS3vm-mkl and the alternating optimization approach by meanS3vm-iter.

21 Outline Introduction Our Methods Experiments Conclusion

22 Four Kinds of Tasks Benchmark tasks UCI data sets Text categorization
Experiments Four Kinds of Tasks Benchmark tasks UCI data sets Text categorization Speed

23 meanS3vms achieve highly competitive performance.
Experiments Benchmark Tasks Following the same setup as S3VM, meanS3vms achieve highly competitive performance.

24 Experiments UCI data sets 9 data sets, 10 labeled examples, 50% train / 50% test, 20 runs. MeanS3vms achieve highly competitive performance on all data sets; in particular, they achieve the best performance on 6 of the 9 tasks.

25 Text Categorization
Experiments Text Categorization 10 binary tasks: 2 labeled examples, 50% train / 50% test, 20 runs. MeanS3vms achieve highly competitive performance on all data sets; they achieve the best performance on 8 of the 10 tasks.

26 Experiments Speed On large data sets (with more than 1,000 instances), means3vm-mkl is much faster than Laplacian SVM, and means3vm-iter is nearly the fastest method overall: roughly 10 times faster than Laplacian SVM and 100 times faster than TSVM.

27 Conclusion
Main contribution: S3VM + label means ≈ SVM with full labels; two efficient and effective SSL methods. Future work: theoretical study of the effect of label means; other approaches to estimating label means. Thanks!

