Large Scale Manifold Transduction Michael Karlen Jason Weston Ayse Erkan Ronan Collobert ICML 2008.


1 Large Scale Manifold Transduction. Michael Karlen, Jason Weston, Ayse Erkan, Ronan Collobert. ICML 2008

2 Index
–Introduction
–Problem Statement
–Existing Approaches
–Transduction: TSVM
–Manifold Regularization: LapSVM
–Proposed Work
–Experiments

3 Introduction
Objective: discriminative classification using unlabeled data.
Popular methods:
–Maximizing the margin on unlabeled data, as in TSVM, so that the decision rule lies in a region of low density.
–Learning the cluster or manifold structure of the unlabeled data, as in cluster kernels, label propagation, and Laplacian SVMs.

4 Problem Statement
Existing techniques fail to scale to very large datasets, and cannot handle online data.

5 Existing Techniques
TSVM
–Problem formulation: a non-convex optimization problem.
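The formula on this slide is an image and does not survive in the transcript; the standard TSVM objective, using the hinge loss $H(t)=\max(0,1-t)$, has the form:

```latex
\min_{w,b}\; \frac{1}{2}\|w\|^{2}
  \;+\; C \sum_{i=1}^{L} H\!\big(y_i f(x_i)\big)
  \;+\; C^{*} \sum_{i=L+1}^{L+U} H\!\big(|f(x_i)|\big),
\qquad f(x) = w \cdot \Phi(x) + b,
```

where $x_1,\dots,x_L$ are the labeled examples and $x_{L+1},\dots,x_{L+U}$ the unlabeled ones. The second sum, taken over the unlabeled data, is what makes the problem non-convex.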

6 Problems with TSVM
–When the dimension is much larger than L (the number of labeled examples), all unlabeled points may be assigned to a single class while the labeled data are still classified correctly, yielding a lower objective value.
Solution:
–Introduce a balancing constraint into the objective function.

7 Implementations to Solve TSVM
S³VM
–Mixed integer programming; intractable for large datasets.
SVMLight-TSVM
–Initially fixes the labels of the unlabeled examples, then iteratively switches labels to improve the TSVM objective, solving a convex problem at each step.
–Introduces a balancing constraint.
–Handles a few thousand examples.

8 VS³VM
–A concave-convex minimization approach solving successive convex problems.
–Linear case only, with no balancing constraint.
∇TSVM
–Optimizes the TSVM objective by gradient descent in the primal.
–Needs the entire kernel matrix in memory (in the non-linear case), hence inefficient for large datasets.
–Introduces a balancing constraint.

9 CCCP-TSVM
–Concave-convex procedure; a non-linear extension of VS³VM.
–Same balancing constraint as ∇TSVM.
–100 times faster than SVMLight-TSVM and 50 times faster than ∇TSVM.
–Still takes 40 hours to solve 60,000 unlabeled examples in the non-linear case; not scalable enough.
Large Scale Linear TSVMs
–Same label-switching technique as SVMLight-TSVM, but considers multiple labels at once.
–Solved in the primal formulation.
–Not suitable for the non-linear case.

10 Manifold Regularization
A two-stage problem:
–Learn an embedding, e.g. Laplacian Eigenmaps, Isomap, or spectral clustering.
–Train a classifier in this new space.
Laplacian SVM: an SVM loss combined with a Laplacian Eigenmap regularizer.
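The formulas on this slide are images lost in the transcript; the Laplacian SVM objective of Belkin, Niyogi, and Sindhwani has the form (V is the hinge loss, $\mathbf{L}$ the graph Laplacian built on labeled plus unlabeled data):

```latex
\min_{f \in \mathcal{H}_K}\;
\frac{1}{L}\sum_{i=1}^{L} V\!\big(x_i, y_i, f\big)
\;+\; \gamma_A \|f\|_K^{2}
\;+\; \frac{\gamma_I}{(L+U)^{2}}\, \mathbf{f}^{\top} \mathbf{L}\, \mathbf{f},
```

where $\mathbf{f} = \big(f(x_1), \dots, f(x_{L+U})\big)^{\top}$ is the vector of function values on all data points.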

11 Using Both Approaches
LDS (Low Density Separation)
–First, the Isomap-like embedding method of "graph"-SVM is applied, whereby the data are clustered.
–∇TSVM is then applied in the new embedding space.
Problems
–The two-stage approach seems ad hoc.
–The method is slow.

12 Proposed Approach
Objective function: non-convex.
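The objective itself is an image on the original slide. From the approach the paper describes, it couples a loss on the labeled examples with a loss encouraging neighboring unlabeled examples to agree on a predicted label, roughly of the form:

```latex
\min_{f}\;
\frac{1}{L}\sum_{i=1}^{L} \ell\big(f(x_i),\, y_i\big)
\;+\; \frac{\lambda}{U^{2}} \sum_{i,j=1}^{U} W_{ij}\,
\ell\Big(f(x_i^{*}),\; \operatorname{sign}\big(f(x_i^{*}) + f(x_j^{*})\big)\Big),
```

where $W_{ij}$ indicates that unlabeled examples $x_i^{*}$ and $x_j^{*}$ are neighbors; the pair's target label is the sign of their combined prediction. This is a reconstruction from the paper's description, not a copy of the slide.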

13 Details
–The primal problem is solved by gradient descent, so online semi-supervised learning is possible.
–For the non-linear case, a multi-layer architecture is used, making training and testing faster than computing the kernel. (The HardTanh activation function is used.)
–A recommendation for an online balancing constraint is also given.
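A minimal sketch of the kind of multi-layer model described above, with a single hidden layer and the HardTanh activation; the layer sizes and initialization here are illustrative, not taken from the paper:

```python
import numpy as np

def hardtanh(x):
    # HardTanh: identity on [-1, 1], saturating at -1 and +1.
    return np.clip(x, -1.0, 1.0)

class MLP:
    """Two-layer network: f(x) = w2 . hardtanh(W1 x + b1) + b2."""
    def __init__(self, n_in, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(scale=0.1, size=(n_hidden, n_in))
        self.b1 = np.zeros(n_hidden)
        self.w2 = rng.normal(scale=0.1, size=n_hidden)
        self.b2 = 0.0

    def forward(self, x):
        h = hardtanh(self.W1 @ x + self.b1)    # hidden activations
        return float(self.w2 @ h + self.b2)    # scalar score f(x)

model = MLP(n_in=4, n_hidden=8)
score = model.forward(np.ones(4))
```

Because HardTanh is piecewise linear, both the forward pass and its gradient cost only a few multiply-adds per unit, which is what makes this cheaper than kernel evaluations at scale.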

14 Balancing Constraint
–A cache of the last 25c predictions f(x_i*), where c is the number of classes, is kept.
–The next balanced prediction is made by assuming a fixed estimate p_est(y) of the probability of each class, which can be estimated from the distribution of the labeled data.

15 One of two decisions is made:
–Delta-bal: add the ∇TSVM balancing function, multiplied by a scaling factor, to the objective. Disadvantage: the optimal scaling factor must be identified.
–Ignore-bal: based on the distribution of example-label pairs in the cache, if the class predicted for the next unlabeled example already has too many examples assigned to it, do not make a gradient step.
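A sketch of the ignore-bal rule described above, assuming a fixed class-prior estimate p_est and a cache of the last 25c predicted labels; the function and variable names are illustrative:

```python
from collections import deque, Counter

def make_ignore_bal(p_est, c):
    """p_est: dict class -> estimated prior; c: number of classes."""
    cache = deque(maxlen=25 * c)   # last 25c predicted labels

    def allow_step(predicted_label):
        # Fraction of recent predictions that already went to this class.
        counts = Counter(cache)
        frac = counts[predicted_label] / max(len(cache), 1)
        cache.append(predicted_label)
        # Skip the gradient step if the class is over-represented
        # relative to its estimated prior.
        return frac <= p_est[predicted_label]

    return allow_step

allow = make_ignore_bal({0: 0.5, 1: 0.5}, c=2)
decisions = [allow(0) for _ in range(10)]
```

Predicting class 0 every time quickly exceeds its 0.5 prior, so after the first step the rule refuses further updates for that class until the cache rebalances.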

16 Further, a smoothed version of p_trn can be obtained by labeling the unlabeled data via the k nearest labeled neighbors of each point. This yields p_knn, which can be used to implement the balancing constraint.
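A sketch of estimating p_knn as described above: each unlabeled point takes the majority label among its k nearest labeled neighbors, and the resulting label distribution serves as the prior estimate (the names and the Euclidean distance choice are illustrative):

```python
import numpy as np
from collections import Counter

def p_knn(X_lab, y_lab, X_unlab, k=3):
    """Estimate class priors by kNN-labeling the unlabeled data."""
    labels = []
    for x in X_unlab:
        d = np.linalg.norm(X_lab - x, axis=1)    # distances to labeled points
        nn = np.argsort(d)[:k]                   # k nearest labeled neighbors
        labels.append(Counter(y_lab[nn]).most_common(1)[0][0])
    counts = Counter(labels)
    return {y: counts[y] / len(labels) for y in set(y_lab)}

X_lab = np.array([[0.0], [0.1], [5.0], [5.1]])
y_lab = np.array([0, 0, 1, 1])
X_unlab = np.array([[0.2], [4.9], [5.3]])
priors = p_knn(X_lab, y_lab, X_unlab, k=2)
```

Here two of the three unlabeled points fall near the class-1 cluster, so the estimated priors reflect the unlabeled distribution rather than the (balanced) labeled one.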

17 Online Manifold Transduction
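The algorithm on this slide is an image. Based on the preceding slides, a heavily simplified sketch of the online training loop for a linear model — the loss, learning rate, and neighbor handling here are illustrative assumptions, not the paper's exact algorithm:

```python
import numpy as np

def hinge_grad(w, x, y):
    # Subgradient of the hinge loss max(0, 1 - y * w.x) with respect to w.
    return -y * x if y * (w @ x) < 1 else np.zeros_like(x)

def train_online(X_lab, y_lab, X_unlab, neighbors, lr=0.1, steps=500, seed=0):
    rng = np.random.default_rng(seed)
    w = np.zeros(X_lab.shape[1])
    for _ in range(steps):
        # (1) Gradient step on a random labeled example.
        i = rng.integers(len(X_lab))
        w -= lr * hinge_grad(w, X_lab[i], y_lab[i])
        # (2) Pick a random unlabeled example and one of its neighbors;
        #     the pair's target is the sign of their combined score.
        j = rng.integers(len(X_unlab))
        n = neighbors[j][rng.integers(len(neighbors[j]))]
        y_star = np.sign(w @ X_unlab[j] + w @ X_unlab[n]) or 1.0
        w -= lr * hinge_grad(w, X_unlab[j], y_star)
    return w

X_lab = np.array([[1.0, 1.0], [-1.0, -1.0]])
y_lab = np.array([1.0, -1.0])
X_unlab = np.array([[1.2, 0.9], [-1.1, -0.8], [0.9, 1.1], [-0.9, -1.2]])
neighbors = {0: [2], 1: [3], 2: [0], 3: [1]}
w = train_online(X_lab, y_lab, X_unlab, neighbors)
```

A real implementation would interleave the balancing check from the previous slides before step (2) and use the multi-layer model instead of a linear one.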

18 Experiments Data Sets Used

19 Test Error for Various Methods

20

21 Large Scale Datasets

