Large Scale Manifold Transduction Michael Karlen Jason Weston Ayse Erkan Ronan Collobert ICML 2008.



Index
Introduction
Problem Statement
Existing Approaches
–Transduction: TSVM
–Manifold Regularization: LapSVM
Proposed Work
Experiments

Introduction
Objective: discriminative classification using unlabeled data.
Popular methods:
–Maximizing the margin on unlabeled data, as in TSVM, so that the decision rule lies in a low-density region.
–Learning cluster or manifold structure from unlabeled data, as in cluster kernels, label propagation, and Laplacian SVMs.

Problem Statement
Existing techniques cannot scale to very large datasets, nor handle online (streaming) data.

Existing Techniques: TSVM
–Problem Formulation: a non-convex optimization problem.
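The formulation on this slide did not survive the transcript. The standard TSVM objective it refers to (stated here as a reconstruction from the literature, not from the slide) is:

```latex
\min_{w,b}\; \frac{1}{2}\|w\|^2
  + C \sum_{i=1}^{L} H\big(y_i f(x_i)\big)
  + C^{*} \sum_{i=L+1}^{L+U} H\big(|f(x_i)|\big),
\qquad H(t) = \max(0,\, 1 - t),
```

where f(x) = w·x + b, the first sum runs over the L labeled examples and the second over the U unlabeled ones. The symmetric hinge H(|f(x)|) pushes unlabeled points away from the margin, and it is this term that makes the problem non-convex.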

Problems with TSVM:
–When the dimension >> L (the number of labeled examples), all unlabeled points may be assigned to a single class while the labeled data are still classified correctly, which yields a lower objective value.
Solution:
–Introduce a balancing constraint into the objective function.
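A minimal sketch (not the paper's code) of a soft balancing constraint: penalize the gap between the mean prediction on unlabeled data and the mean label of the labeled data, so the classifier cannot collapse all unlabeled points into one class. The function name is illustrative.

```python
import numpy as np

def balance_penalty(f_unlabeled, y_labeled):
    # Penalize deviation of the unlabeled prediction mean from the
    # labeled class balance; zero when the two distributions match.
    return (np.mean(f_unlabeled) - np.mean(y_labeled)) ** 2
```

Added to the objective with a weight, this term removes the trivial one-class minimum described above.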

Implementations to Solve TSVM
S3VM:
–Mixed integer programming; intractable for large datasets.
SVMLight-TSVM:
–Initially fixes the labels of the unlabeled examples, then iteratively switches those labels to improve the TSVM objective, solving a convex problem at each step.
–Introduces a balancing constraint.
–Handles only a few thousand examples.

VS3VM:
–A concave-convex minimization approach that solves a sequence of convex problems.
–Linear case only, with no balancing constraint.
Delta-TSVM:
–Optimizes the TSVM objective by gradient descent in the primal.
–Needs the entire kernel matrix (for the non-linear case) in memory, hence inefficient for large datasets.
–Introduces a balancing constraint.

CCCP-TSVM:
–Concave-convex procedure; a non-linear extension of VS3VM.
–Same balancing constraint as Delta-TSVM.
–100 times faster than SVMLight-TSVM and 50 times faster than Delta-TSVM.
–Takes 40 hours on 60,000 unlabeled examples in the non-linear case; still not scalable enough.
Large Scale Linear TSVMs:
–Same label-switching technique as SVMLight-TSVM, but switches multiple labels at once.
–Solved in the primal formulation.
–Not suitable for the non-linear case.

Manifold Regularization
A two-stage problem:
–Learn an embedding, e.g. Laplacian Eigenmaps, Isomap, or spectral clustering.
–Train a classifier in this new space.
Laplacian SVM: an SVM built on the Laplacian Eigenmap.

Using Both Approaches
LDS (Low Density Separation):
–First, the Isomap-like embedding method of "graph"-SVM is used, whereby the data is clustered.
–Delta-TSVM is then applied in the new embedding space.
Problems:
–The two-stage approach seems ad hoc.
–The method is slow.

Proposed Approach
Objective Function (non-convex).
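The slide's equation image is missing. Based on the surrounding description (and stated here as a hedged reconstruction, not verbatim from the slide), the objective couples a supervised loss on labeled data with a neighborhood-agreement loss on unlabeled pairs, roughly of the form:

```latex
\min_f\; \frac{1}{L}\sum_{i=1}^{L} \ell\big(f(x_i), y_i\big)
  + \frac{\lambda}{U^{2}} \sum_{i,j=1}^{U} W_{ij}\,
    \ell\big(f(x_i^{*}),\, y^{*}_{ij}\big),
\qquad y^{*}_{ij} = \operatorname{sign}\big(f(x_i^{*}) + f(x_j^{*})\big),
```

where W_ij encodes the neighborhood graph on the unlabeled data. Neighboring unlabeled points are pushed to share the label currently predicted for the pair, which is what makes the problem non-convex.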

Details
The primal problem is solved by gradient descent, so online semi-supervised learning is possible.
For the non-linear case, a multi-layer architecture is used, which makes training and testing faster than computing a kernel. (The HardTanh activation function is used.)
A recommendation for an online balancing constraint is also given.
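A minimal sketch (assumptions: two layers, hinge loss, plain numpy; not the authors' implementation) of such a multi-layer architecture with HardTanh, trained by SGD in the primal so examples can be consumed online:

```python
import numpy as np

def hardtanh(z):
    # HardTanh: identity on [-1, 1], clipped outside.
    return np.clip(z, -1.0, 1.0)

class TinyMLP:
    """Two-layer net f(x) = w2 . HardTanh(W1 x), trained by online SGD."""

    def __init__(self, d_in, d_hid, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (d_hid, d_in))
        self.w2 = rng.normal(0.0, 0.1, d_hid)

    def forward(self, x):
        self.a = self.W1 @ x          # pre-activation, kept for backprop
        self.h = hardtanh(self.a)
        return float(self.w2 @ self.h)

    def sgd_step(self, x, y, lr=0.1):
        # One online step on the hinge loss H(t) = max(0, 1 - t).
        f = self.forward(x)
        if y * f < 1.0:               # margin violated: take a gradient step
            mask = (np.abs(self.a) < 1.0).astype(float)  # HardTanh derivative
            grad_W1 = np.outer(self.w2 * mask, x)
            self.w2 += lr * y * self.h
            self.W1 += lr * y * grad_W1
        return f
```

Because each update touches only one example, no kernel matrix is ever formed; this is the property that makes the approach online and scalable.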

Balancing Constraint
A cache of the last 25c predictions f(x_i*), where c is the number of classes, is kept.
The next balanced prediction assumes a fixed estimate p_est(y) of the probability of each class, which can be estimated from the label distribution of the labeled data.

One of two decisions is made:
–Delta-bal: add the Delta-TSVM balancing function, multiplied by a scaling factor, to the objective. Disadvantage: the optimal scaling factor must be identified.
–Ignore-bal: based on the distribution of example-label pairs in the cache, if the class predicted for the next unlabeled example already has too many examples assigned to it, do not make a gradient step.
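A hedged sketch of the ignore-bal rule (class names and the warm-up behavior are my assumptions, not from the slides): keep a cache of the last 25c predicted labels and skip the gradient step whenever the predicted class is over-represented in the cache relative to p_est.

```python
from collections import deque

class IgnoreBal:
    def __init__(self, n_classes, p_est):
        self.cache = deque(maxlen=25 * n_classes)  # last 25c predicted labels
        self.p_est = p_est                         # class prior, e.g. from labeled data

    def allow_step(self, predicted_class):
        self.cache.append(predicted_class)
        if len(self.cache) < self.cache.maxlen:    # warm-up: always step
            return True
        frac = sum(1 for y in self.cache if y == predicted_class) / len(self.cache)
        # Skip the gradient step if this class already exceeds its prior.
        return frac <= self.p_est[predicted_class]
```

Unlike delta-bal, this needs no scaling factor: balancing is enforced by simply dropping updates for over-predicted classes.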

Furthermore, a smoother version of p_trn can be obtained by labeling the unlabeled data using the k nearest neighbors of each labeled example. The resulting estimate, p_knn, can then be used to implement the balancing constraint.
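A minimal sketch of deriving p_knn (binary case with labels ±1; the exact direction of the k-NN assignment in the paper may differ): propagate labels to the unlabeled points by nearest-neighbor vote, then read off the class prior from the propagated labels.

```python
import numpy as np

def p_knn(Xl, yl, Xu, k=1):
    labels = []
    for x in Xu:
        dist = np.linalg.norm(Xl - x, axis=1)
        nn = yl[np.argsort(dist)[:k]]              # labels of k nearest labeled points
        labels.append(1 if nn.sum() >= 0 else -1)  # majority vote (ties -> +1)
    labels = np.asarray(labels)
    # Class prior estimated from the propagated labels.
    return {c: float(np.mean(labels == c)) for c in (-1, 1)}
```

This prior reflects the class balance of the unlabeled pool rather than only of the (possibly small) labeled set.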

Online Manifold Transduction
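The algorithm box on this slide is missing. A hedged sketch of one pass, for a linear model and with the balancing step omitted for brevity (the alternation between labeled and neighbor-pair updates is from the paper's description; the details here are assumptions): alternate SGD steps on labeled examples with steps that push pairs of neighboring unlabeled examples toward the label currently predicted for the pair.

```python
import numpy as np

def hinge_grad_step(w, x, y, lr=0.05):
    # One SGD step on the hinge loss for a linear model f(x) = w . x.
    if y * (w @ x) < 1:
        w += lr * y * x
    return w

def online_pass(w, Xl, yl, Xu, neighbors, lr=0.05, rng=None):
    rng = rng or np.random.default_rng(0)
    for _ in range(len(yl) + len(Xu)):
        if rng.random() < 0.5:                    # labeled step
            i = rng.integers(len(yl))
            w = hinge_grad_step(w, Xl[i], yl[i], lr)
        else:                                     # unlabeled neighbor-pair step
            i = rng.integers(len(Xu))
            j = rng.choice(neighbors[i])
            # Shared label currently predicted for the pair.
            y_pair = np.sign(w @ Xu[i] + w @ Xu[j]) or 1.0
            w = hinge_grad_step(w, Xu[i], y_pair, lr)
            w = hinge_grad_step(w, Xu[j], y_pair, lr)
    return w
```

In the full method the unlabeled step is additionally gated by the balancing constraint of the previous slides, which prevents the pair updates from collapsing onto a single class.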

Experiments Data Sets Used

Test Error for Various Methods

Large Scale Datasets