
1 An Overview on Semi-Supervised Learning Methods Matthias Seeger, MPI for Biological Cybernetics, Tuebingen, Germany

2 Overview The SSL problem. Paradigms for SSL, with examples. The importance of input-dependent regularization. Note: citations are omitted here (they are given in my literature review).

3 Semi-Supervised Learning SSL is supervised learning... Goal: estimate P(y|x) from labeled data D_l = {(x_i, y_i)}. But: an additional source tells us about P(x) (e.g., unlabeled data D_u = {x_j}). The interesting case: only a few labeled points, but many unlabeled ones.

4 Obvious Baseline Methods Do not use the info about P(x) ⇒ plain supervised learning. Or: fit a mixture model using unsupervised learning, then "label up" the components using the labels {y_i} (see the sketch below). The goal of SSL is to do better. Not uniformly and always (no free lunch; and yes, of course, unlabeled data can hurt), but, as always, when our modelling and algorithmic efforts reflect true problem characteristics.
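
A minimal Python sketch of the second baseline, assuming scikit-learn's GaussianMixture; the name mixture_baseline and the majority-vote labelling rule are illustrative choices, not taken from the slides:

    import numpy as np
    from sklearn.mixture import GaussianMixture

    def mixture_baseline(X_lab, y_lab, X_unl, n_components):
        """Fit a mixture on ALL inputs by unsupervised learning, then
        "label up" each component by majority vote over the labeled
        points assigned to it; components claiming no labeled points
        stay unknown (-1)."""
        gmm = GaussianMixture(n_components=n_components, random_state=0)
        gmm.fit(np.vstack([X_lab, X_unl]))
        comp_of_lab = gmm.predict(X_lab)
        comp_label = {c: np.bincount(y_lab[comp_of_lab == c]).argmax()
                      for c in np.unique(comp_of_lab)}
        return np.array([comp_label.get(c, -1) for c in gmm.predict(X_unl)])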

5 The Generative Paradigm Model the class distributions P(x | y=c, θ) and the class priors π_c = P(y=c). This implies a model for P(y|x), namely P(y=c | x, θ, π) ∝ π_c P(x | y=c, θ), and for P(x), namely P(x | θ, π) = Σ_c π_c P(x | y=c, θ). [Figure: graphical model with parameters θ, π generating (x, y).]

6 The Joint Likelihood Natural criterion in this context: maximize the joint log likelihood Σ_i log P(x_i, y_i | θ) + λ Σ_j log P(x_j | θ) over the labeled and unlabeled data, using EM (an idea as old as EM itself). There is early and recent theoretical work on the asymptotic variance of such estimators. Advantage: easy to implement for standard mixture model setups (a sketch follows below).
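
A minimal sketch of generative SSL via EM, assuming Gaussian class-conditional densities; the weighting lam stands for the source weighting λ discussed on the next slide, and the names here (ssl_em_gmm, lam) are illustrative:

    import numpy as np
    from scipy.stats import multivariate_normal

    def ssl_em_gmm(X_lab, y_lab, X_unl, n_classes, lam=1.0, n_iter=50):
        """Maximize sum_i log P(x_i, y_i) + lam * sum_j log P(x_j) for a
        Gaussian class-conditional model; y_lab holds integer labels."""
        d = X_lab.shape[1]
        # Initialize the parameters from the labeled data alone.
        pi = np.array([np.mean(y_lab == c) for c in range(n_classes)])
        mu = np.array([X_lab[y_lab == c].mean(axis=0) for c in range(n_classes)])
        cov = np.array([np.cov(X_lab[y_lab == c].T) + 1e-6 * np.eye(d)
                        for c in range(n_classes)])
        R_lab = np.eye(n_classes)[y_lab]   # hard responsibilities (labels known)
        X = np.vstack([X_lab, X_unl])
        for _ in range(n_iter):
            # E-step: soft responsibilities for the unlabeled points only.
            dens = np.column_stack(
                [pi[c] * multivariate_normal.pdf(X_unl, mu[c], cov[c])
                 for c in range(n_classes)])
            R_unl = dens / dens.sum(axis=1, keepdims=True)
            # M-step: weighted statistics; unlabeled terms are scaled by lam.
            R = np.vstack([R_lab, lam * R_unl])
            Nc = R.sum(axis=0)
            pi = Nc / Nc.sum()
            mu = (R.T @ X) / Nc[:, None]
            for c in range(n_classes):
                diff = X - mu[c]
                cov[c] = (R[:, c, None] * diff).T @ diff / Nc[c] + 1e-6 * np.eye(d)
        return pi, mu, cov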

7 Drawbacks of Generative SSL The choice of the source weighting λ is crucial, and cross-validation fails for small n; homotopy continuation (Corduneanu et al.) is one proposed remedy. Just as in supervised learning, the model for P(y|x) is specified only indirectly, and the fitting is not primarily concerned with P(y|x). Also: we have to represent P(x) generally well, not just the aspects which help with P(y|x).

8 The Diagnostic Paradigm Model P(y|x, θ) and P(x|μ) directly. But: since θ and μ are independent a priori, θ does not depend on μ given the data ⇒ knowledge of μ does not influence the P(y|x) prediction in a probabilistic setup! [Figure: graphical model with μ → x and (θ, x) → y.]

9 What To Do About It Non-probabilistic diagnostic techniques: replace the expected loss under the true P(x) by an estimate that uses the unlabeled inputs (Tong, Koller; Chapelle et al.) ⇒ very limited effect if n is small; there is also some old work along these lines (e.g., Anderson). Alternatively: drop the prior independence of θ and μ ⇒ input-dependent regularization.

10 Input-Dependent Regularization Conditional priors P(θ|μ) make the estimation of P(y|x) dependent on P(x). Now unlabeled data can really help... and can hurt for the same reason! [Figure: graphical model as before, but with an edge μ → θ.]

11 The Cluster Assumption (CA) Empirical observation: a clustering of the data {x_j} w.r.t. a "sensible" distance or feature set is often fairly compatible with the class regions. Weaker form: class regions do not tend to cut high-volume regions of P(x). Why? Ask philosophers! My guess: a selection bias for features/distances. No matter why: many SSL methods implement the CA and work fine in practice.

12 Examples for IDR Using the CA Label propagation, Gaussian random fields: the regularization depends on a graph structure built from all inputs {x_j} ⇒ more smoothness in regions of high connectivity / affinity flow (a sketch is given below). Cluster kernels for SVMs (Chapelle et al.). Information regularization (Corduneanu, Jaakkola).
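
A minimal sketch of the Gaussian-random-field harmonic solution in the spirit of Zhu et al., assuming an RBF affinity over all inputs; the bandwidth sigma and the dense linear solve are illustrative simplifications:

    import numpy as np

    def gaussian_field_labels(X, y_lab, labeled_idx, sigma=1.0):
        """Build a graph from ALL inputs and solve the harmonic system
        L_uu f_u = -L_ul f_l; labels diffuse along high-affinity edges,
        so predictions are smoother in densely connected regions."""
        sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        W = np.exp(-sq / (2 * sigma ** 2))    # RBF affinities
        np.fill_diagonal(W, 0.0)
        L = np.diag(W.sum(axis=1)) - W        # graph Laplacian
        unl = np.setdiff1d(np.arange(X.shape[0]), labeled_idx)
        f_l = np.eye(y_lab.max() + 1)[y_lab]  # one-hot labels, labeled nodes
        f_u = np.linalg.solve(L[np.ix_(unl, unl)],
                              -L[np.ix_(unl, labeled_idx)] @ f_l)
        return unl, f_u.argmax(axis=1)        # unlabeled indices, their labels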

13 More Examples for IDR Some methods do IDR, but implement the CA only in special cases. Fisher kernels (Jaakkola et al.): a kernel built from Fisher features ⇒ automatic feature induction from a model of P(x). Co-training (Blum, Mitchell): enforce consistency across different views (feature sets) of the data; a sketch follows below.
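
A minimal co-training loop in the spirit of Blum and Mitchell, assuming scikit-learn's GaussianNB as the per-view learner and a single shared label pool; the pool size k and the round count are illustrative simplifications:

    import numpy as np
    from sklearn.naive_bayes import GaussianNB

    def co_train(X1, X2, y, labeled_mask, rounds=10, k=5):
        """Two classifiers, one per feature view, take turns labelling
        the unlabeled examples they are most confident about."""
        y, lab = y.copy(), labeled_mask.copy()
        for _ in range(rounds):
            for X_view in (X1, X2):
                clf = GaussianNB().fit(X_view[lab], y[lab])
                unl = np.where(~lab)[0]
                if unl.size == 0:
                    return y
                conf = clf.predict_proba(X_view[unl]).max(axis=1)
                top = unl[np.argsort(-conf)[:k]]   # most confident points
                y[top] = clf.predict(X_view[top])
                lab[top] = True
        return y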

14 Is SSL Always Generative? Wait: we have to model P(x) somehow. Is this not always generative, then?... No! Generative: P(x|y) is modelled fairly directly; the P(y|x) model and the effect of P(x) are implicit. Diagnostic IDR: a direct model for P(y|x) gives more flexibility, and the influence of knowledge of P(x) on the P(y|x) prediction is directly controlled, e.g. through the CA ⇒ the model for P(x) can be much less elaborate.

15 Conclusions Gave a taxonomy for probabilistic approaches to SSL. Illustrated the paradigms with examples from the literature. Tried to clarify some points which have led to confusion in the past.

