SemiBoost: Boosting for Semi-supervised Learning. Pavan Kumar Mallapragada, Student Member, IEEE, Rong Jin, Member, IEEE, Anil K. Jain, Fellow, IEEE, and Yi Liu, Student Member, IEEE.


SemiBoost: Boosting for Semi-supervised Learning. Pavan Kumar Mallapragada, Student Member, IEEE, Rong Jin, Member, IEEE, Anil K. Jain, Fellow, IEEE, and Yi Liu, Student Member, IEEE. Presented by Yueng-Tien Lo. Reference: P. K. Mallapragada, R. Jin, A. K. Jain, and Y. Liu, "SemiBoost: Boosting for Semi-supervised Learning," IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 31(11), 2009.

Outline: Introduction, Related Work, Semi-Supervised Boosting, Results and Discussion, Conclusions and Future Work, Gaussian Fields and Harmonic Functions.

Introduction The key idea of semi-supervised learning, specifically semi-supervised classification, is to exploit both labeled and unlabeled data to learn a classification model. There is an immense need for algorithms that can combine the small amount of labeled data with the large amount of unlabeled data to build efficient classification systems.

Introduction Existing semi-supervised classification algorithms may be classified into two categories based on their underlying assumptions: – the manifold assumption – the cluster assumption

Introduction Manifold assumption: the data lie on a low-dimensional manifold in the input space. Cluster assumption: data samples with high similarity between them must share the same label.

Introduction Most semi-supervised learning approaches design specialized learning algorithms to effectively utilize both labeled and unlabeled data. We refer to this problem of improving the performance of any supervised learning algorithm using unlabeled data as Semi-supervised Improvement, to distinguish our work from the standard semi-supervised learning problems.

Introduction The key difficulties in designing SemiBoost are: – how to sample the unlabeled examples for training a new classification model at each iteration – what class labels should be assigned to the selected unlabeled examples.

Introduction One way to address the above questions is to exploit both the clustering assumption and the large margin criterion: select the unlabeled examples with the highest classification confidence and assign them the class labels predicted by the current classifier. A problem with this strategy is that the introduction of examples with predicted class labels may only help to increase the classification margin, without actually providing any novel information to the classifier.

Introduction To overcome the above problem, we propose using the pairwise similarity measurements to guide the selection of unlabeled examples at each iteration, as well as to assign class labels to them.

Related Work

An inductive algorithm can be used to predict the labels of samples that are unseen during training (irrespective of whether they are labeled or unlabeled). Transductive algorithms are limited to predicting only the labels of the unlabeled samples seen during training.

Related Work A popular way to define the inconsistency between the labels y of the samples and the pairwise similarities S_ij is the quadratic criterion F(y) = (1/2) Σ_{i,j} S_ij (y_i − y_j)^2 = y^T L y, where L is the combinatorial graph Laplacian. The task is to assign values to the unknown labels in such a way that the overall inconsistency is minimized.
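As a concrete illustration of this criterion, the following minimal numpy sketch (with a made-up 3-by-3 similarity matrix and toy labels) checks that the Laplacian quadratic form equals the pairwise-penalty form; it is only a sanity check of the formula above, not code from the paper.

```python
import numpy as np

# Toy symmetric similarity matrix S (values made up for illustration).
S = np.array([[0.0, 0.9, 0.1],
              [0.9, 0.0, 0.2],
              [0.1, 0.2, 0.0]])

# Combinatorial graph Laplacian L = D - S, where D is the degree matrix.
D = np.diag(S.sum(axis=1))
L = D - S

# A candidate label assignment y (here in {-1, +1}).
y = np.array([1.0, 1.0, -1.0])

# Quadratic inconsistency: y^T L y = 1/2 * sum_ij S_ij * (y_i - y_j)^2
laplacian_form = y @ L @ y
pairwise_form = 0.5 * np.sum(S * (y[:, None] - y[None, :]) ** 2)
print(laplacian_form, pairwise_form)  # both print 1.2 for this toy example
```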

Semi-Supervised Improvement Let D = {x_1, ..., x_n} denote the entire data set, including both the labeled and the unlabeled examples. – The first n_l examples are labeled, with labels given by y^l = (y_1^l, ..., y_{n_l}^l). – y^u = (y_1^u, ..., y_{n_u}^u) denotes the imputed class labels of the unlabeled examples. Let S = [S_ij] denote the n × n symmetric similarity matrix, where S_ij ≥ 0 represents the similarity between x_i and x_j. – Submatrices of S restricted to the labeled and/or unlabeled examples are denoted by the corresponding superscripts (e.g., S^lu). – 'A' denotes the given supervised learning algorithm.
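The slides do not specify how S is computed; a radial basis function of the pairwise distances is one common choice, so the sketch below should be read as an assumption (the sigma value, the toy data, and the variable names are placeholders).

```python
import numpy as np

def rbf_similarity(X, sigma=1.0):
    """Symmetric similarity S_ij = exp(-||x_i - x_j||^2 / sigma^2).

    An RBF similarity is a common choice; sigma is a placeholder
    hyperparameter that would need tuning in practice.
    """
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    S = np.exp(-sq_dists / sigma ** 2)
    np.fill_diagonal(S, 0.0)  # self-similarity is not used
    return S

# Convention from the slide: the first n_l rows are the labeled examples.
n_l = 5
X = np.vstack([np.random.randn(n_l, 2), np.random.randn(20, 2) + 1.0])
y_l = np.array([1, 1, -1, -1, 1])      # labels of the first n_l examples
S = rbf_similarity(X)
S_lu = S[:n_l, n_l:]                   # labeled-vs-unlabeled submatrix
```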

Semi-Supervised Improvement The goal of semi-supervised improvement is to improve the performance of A iteratively by treating A like a black box, using the unlabeled examples and the pairwise similarity S. In the semi-supervised improvement problem, we aim to build an ensemble classifier which utilizes the unlabeled samples in the way a graph-based approach would utilize them.

An outline of the SemiBoost algorithm for semi-supervised improvement Start with an empty ensemble. At each iteration, – Compute the pseudo label (and its confidence) for each unlabeled example (using the existing ensemble and the pairwise similarity). – Sample the most confident pseudo-labeled examples; combine them with the labeled samples and train a component classifier using the supervised learning algorithm A. – Update the ensemble by including the component classifier with an appropriate weight.
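A minimal Python skeleton of this loop, treating the supervised learner A as a black box (a scikit-learn decision tree is used purely as a stand-in); the two callbacks are placeholders for the confidence computation and the combination weight defined on the later slides.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier  # stand-in for the black-box learner A

def semiboost_outline(X_l, y_l, X_u, S, confidence_step, weight_step,
                      T=20, n_sample=10):
    """Skeleton of the SemiBoost outer loop (labels in {-1, +1}).

    confidence_step(H_u, S, y_l) -> (z, w): pseudo labels and sampling
    confidences for the unlabeled data.
    weight_step(h_u, z, w) -> alpha_t: combination weight for the new
    component classifier.  Both are placeholders for quantities defined
    on later slides (p_i, q_i and Equation (11)).
    """
    ensemble = []                      # list of (alpha_t, h_t)
    H_u = np.zeros(len(X_u))           # ensemble predictions on unlabeled data

    for t in range(T):
        z, w = confidence_step(H_u, S, y_l)
        idx = np.argsort(-w)[:n_sample]            # most confident examples
        X_t = np.vstack([X_l, X_u[idx]])
        y_t = np.concatenate([y_l, z[idx]])
        h_t = DecisionTreeClassifier(max_depth=2).fit(X_t, y_t)
        h_u = h_t.predict(X_u)
        alpha_t = weight_step(h_u, z, w)
        ensemble.append((alpha_t, h_t))
        H_u = H_u + alpha_t * h_u                  # H(x) <- H(x) + alpha_t h_t(x)
    return ensemble
```

With concrete confidence and weight functions (a sketch of those follows the detailed algorithm slide below), this skeleton becomes a complete toy implementation.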

SemiBoost The unlabeled samples must be assigned labels following two main criteria: – The points with high similarity among unlabeled samples must share the same label – Those unlabeled samples which are highly similar to a labeled sample must share its label. Our objective function is a combination of two terms: – one measuring the inconsistency between labeled and unlabeled examples and the other measuring the inconsistency among the unlabeled examples

SemiBoost Inspired by the harmonic function approach, we define the inconsistency between the class labels y and the similarity measurement S as F_u(y, S) = Σ_{i,j=1}^{n_u} S_ij exp(y_i^u − y_j^u) (1). Note that (1) can be expanded as Σ_{i<j} S_ij [exp(y_i^u − y_j^u) + exp(y_j^u − y_i^u)] due to the symmetry of S.

SemiBoost We therefore have F_u(y, S) = 2 Σ_{i<j} S_ij cosh(y_i^u − y_j^u), where cosh(x) = (e^x + e^{−x})/2 is the hyperbolic cosine function. Rewriting (1) using the cosh function reveals the connection between the quadratic penalty used in the graph-Laplacian-based approaches and the exponential penalty used in the current approach.
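Assuming the reconstructed form of (1) above, this small check confirms numerically that the exponential-penalty form and the cosh form agree (the similarity values and predictions are made up).

```python
import numpy as np

# Made-up similarities among unlabeled examples and real-valued predictions.
S = np.array([[0.0, 0.8, 0.3],
              [0.8, 0.0, 0.5],
              [0.3, 0.5, 0.0]])
y = np.array([0.7, -0.2, 1.1])

# F_u with the exponential penalty, as in Equation (1) above.
F_exp = np.sum(S * np.exp(y[:, None] - y[None, :]))

# The same quantity via the hyperbolic cosine, using the symmetry of S.
i_upper, j_upper = np.triu_indices(len(y), k=1)
F_cosh = 2.0 * np.sum(S[i_upper, j_upper] * np.cosh(y[i_upper] - y[j_upper]))

print(np.isclose(F_exp, F_cosh))  # True
```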

SemiBoost Using an exponential penalty not only facilitates the derivation of boosting-based algorithms but also increases the classification margin. The inconsistency between labeled and unlabeled examples is defined as F_l(y, S) = Σ_{i=1}^{n_l} Σ_{j=1}^{n_u} S_ij exp(−2 y_i^l y_j^u) (3).

SemiBoost Combining (1) and (3) leads to the objective function F(y, S) = F_l(y, S) + C F_u(y, S) (4). The constant C is introduced to weight the importance of the labeled versus the unlabeled data. Given the objective function in (4), the optimal class labels y^u are found by minimizing F.
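Putting (1), (3), and (4) together, here is a sketch of the combined objective; the block layout of S (labeled examples first) follows the notation slide, and the exact exponential forms are reconstructions to verify against the paper.

```python
import numpy as np

def semiboost_objective(S, y_l, y_u, C=1.0):
    """Combined inconsistency F = F_l + C * F_u (Equations (1), (3), (4) above).

    S:   full (n_l + n_u) x (n_l + n_u) similarity matrix, labeled rows first.
    y_l: labels of the labeled examples, in {-1, +1}.
    y_u: (possibly real-valued) labels or predictions for the unlabeled examples.
    """
    n_l = len(y_l)
    S_lu = S[:n_l, n_l:]     # labeled-vs-unlabeled similarities
    S_uu = S[n_l:, n_l:]     # unlabeled-vs-unlabeled similarities

    # F_l: inconsistency between labeled and unlabeled examples, Eq. (3)
    F_l = np.sum(S_lu * np.exp(-2.0 * np.outer(y_l, y_u)))

    # F_u: inconsistency among the unlabeled examples, Eq. (1)
    F_u = np.sum(S_uu * np.exp(y_u[:, None] - y_u[None, :]))

    return F_l + C * F_u
```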

SemiBoost The problem can now be formally expressed as min_{y^u} F(y, S) (5). The following procedure is adopted to derive the boosting algorithm: – The labels for the unlabeled samples y_i^u are replaced by the ensemble predictions on the corresponding data samples. – A bound-optimization-based approach is then used to find the ensemble classifier minimizing the objective function. – The bounds are simplified further to obtain the sampling scheme and the other required parameters.

The SemiBoost algorithm Compute the pairwise similarity S_ij between every pair of examples. Initialize H(x) = 0. For t = 1, 2, ..., T: – Compute p_i and q_i for every example using Equations (9) and (10). – Compute the class label z_i = sign(p_i − q_i) for each example. – Sample example x_i by the weight |p_i − q_i|. – Apply the algorithm A to train a binary classifier h_t(x) using the sampled examples and their class labels z_i. – Compute α_t using Equation (11). – Update the classification function as H(x) ← H(x) + α_t h_t(x).
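The slides refer to Equations (9), (10), and (11) without reproducing them, so the closed forms below are assumptions reconstructed from the structure described on these slides (a labeled term plus a C/2-weighted unlabeled term) and should be checked against the original paper; the sketch only shows how the quantities in the algorithm box above fit together.

```python
import numpy as np

def semiboost_confidences(H_u, S, y_l, C=1.0):
    """p_i, q_i, pseudo labels z_i and sampling weights for the unlabeled data.

    The closed forms of p_i and q_i below are assumptions standing in for
    Equations (9)-(10), which the slides reference but do not reproduce.
    """
    n_l = len(y_l)
    S_lu = S[:n_l, n_l:]                 # labeled-vs-unlabeled block
    S_uu = S[n_l:, n_l:]                 # unlabeled-vs-unlabeled block
    pos = (y_l == 1).astype(float)       # indicator of positive labeled examples
    neg = (y_l == -1).astype(float)      # indicator of negative labeled examples

    # Confidence that x_i belongs to the positive / negative class.
    p = np.exp(-2.0 * H_u) * (pos @ S_lu) \
        + 0.5 * C * np.sum(S_uu * np.exp(H_u[None, :] - H_u[:, None]), axis=1)
    q = np.exp(2.0 * H_u) * (neg @ S_lu) \
        + 0.5 * C * np.sum(S_uu * np.exp(H_u[:, None] - H_u[None, :]), axis=1)

    z = np.sign(p - q)                   # pseudo label z_i = sign(p_i - q_i)
    w = np.abs(p - q)                    # sampling weight |p_i - q_i|
    return p, q, z, w

def semiboost_alpha(p, q, h_u):
    """Combination weight alpha_t (assumed form of Equation (11)); h_u in {-1, +1}."""
    num = np.sum(p[h_u == 1]) + np.sum(q[h_u == -1])
    den = np.sum(p[h_u == -1]) + np.sum(q[h_u == 1])
    return 0.25 * np.log(num / den)
```

With thin wrappers (returning only (z, w) and alpha, respectively), these two functions can play the roles of confidence_step and weight_step in the outer-loop skeleton shown earlier.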

SemiBoost-Algorithm Let h_t(x): X → {−1, +1} denote the two-class classification model that is learned at the t-th iteration by the algorithm A. Let H(x) = Σ_{t=1}^{T} α_t h_t(x) denote the combined classification model learned after the first T iterations, where α_t > 0 is the combination weight.

SemiBoost-Algorithm This leads to the following optimization problem: min_{h(x), α} F(y, S), with the unlabeled labels set to the new ensemble predictions y_i^u = H(x_i) + α h(x_i) (7). This expression involves products of the variables α and h(x), making it nonlinear and, hence, difficult to optimize.

Appendix

SemiBoost-Algorithm (Prop. 1) Minimizing (7) is equivalent to minimizing the function F_1 = Σ_{i=1}^{n_u} [exp(−2 α h(x_i)) p_i + exp(2 α h(x_i)) q_i] (8), where p_i and q_i (Equations (9) and (10)) can be interpreted as the confidence in classifying the unlabeled example x_i into the positive and the negative class, respectively.

Appendix

SemiBoost-Algorithm (Prop. 2) The expression in (8) is difficult to optimize since the weight α and the classifier h(x) are coupled together. Minimizing (8) is equivalent to minimizing an upper bound in which the two are decoupled; we denote this upper bound by F_2.

SemiBoost-Algorithm (Prop. 3) To minimize F_2, the optimal class label z_i for the example x_i is z_i = sign(p_i − q_i), and the weight for sampling example x_i is |p_i − q_i|. The optimal α that minimizes F_1 is α = (1/4) ln [ (Σ_i p_i δ(h(x_i), 1) + Σ_i q_i δ(h(x_i), −1)) / (Σ_i p_i δ(h(x_i), −1) + Σ_i q_i δ(h(x_i), 1)) ] (11).

SemiBoost-Algorithm At each relaxation, the “touch-point” is maintained between the objective function and the upper bound. As a result, the procedure guarantees: – The objective function always decreases through iterations – The final solution converges to a local minimum.

The SemiBoost algorithm Compute the pairwise similarity S_ij between every pair of examples. Initialize H(x) = 0. For t = 1, 2, ..., T: – Compute p_i and q_i for every example using Equations (9) and (10). – Compute the class label z_i = sign(p_i − q_i) for each example. – Sample example x_i by the weight |p_i − q_i|. – Apply the algorithm A to train a binary classifier h_t(x) using the sampled examples and their class labels z_i. – Compute α_t using Equation (11). – Update the classification function as H(x) ← H(x) + α_t h_t(x).

SemiBoost-Algorithm Let ε_t = (Σ_i p_i δ(h_t(x_i), −1) + q_i δ(h_t(x_i), 1)) / (Σ_i (p_i + q_i)) be the weighted error made by the classifier. As in the case of AdaBoost, α_t can be expressed as α_t = (1/4) ln((1 − ε_t) / ε_t), which is very similar to the weighting factor of AdaBoost, differing only by a constant factor of 1/2.
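A quick numerical check (with made-up p_i, q_i values and component-classifier predictions) that the AdaBoost-style expression (1/4) ln((1 − ε_t)/ε_t) coincides with the direct form of Equation (11) used above.

```python
import numpy as np

# Made-up confidences and component-classifier predictions on unlabeled data.
p = np.array([0.9, 0.1, 0.4, 0.7, 0.2])
q = np.array([0.2, 0.8, 0.3, 0.1, 0.6])
h = np.array([1, -1, 1, 1, -1])

# Weighted error of the classifier with respect to the confidences.
eps = (np.sum(p[h == -1]) + np.sum(q[h == 1])) / np.sum(p + q)

# AdaBoost-like expression for the combination weight ...
alpha_ada = 0.25 * np.log((1 - eps) / eps)

# ... agrees with the direct form of Equation (11) used above.
alpha_direct = 0.25 * np.log((np.sum(p[h == 1]) + np.sum(q[h == -1])) /
                             (np.sum(p[h == -1]) + np.sum(q[h == 1])))
print(np.isclose(alpha_ada, alpha_direct))  # True
```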

SemiBoost-Algorithm Theorem 1. Let α_1, ..., α_T denote the combination weights computed by running the SemiBoost algorithm (Fig. 1). Then the objective function at the (t + 1)st iteration, i.e., F_{t+1}, is bounded from above; the explicit bound is given in the paper.

Results and Discussion

Reference: X. Zhu, Z. Ghahramani, and J. Lafferty, "Semi-supervised Learning Using Gaussian Fields and Harmonic Functions," in Proc. 20th International Conference on Machine Learning (ICML), pp. 912–919, 2003.