MSRC Summer School - 30/06/2009 Cambridge – UK Hybrids of generative and discriminative methods for machine learning.


1 MSRC Summer School - 30/06/2009 Cambridge – UK Hybrids of generative and discriminative methods for machine learning

2 Motivation. Generative models: encode prior knowledge, handle missing data such as labels. Discriminative models: perform well at classification. However, there is no straightforward way to combine them.

3 Content: Generative and discriminative methods; A principled hybrid framework; Study of the properties on a toy example; Influence of the amount of labelled data

4 Content: Generative and discriminative methods; A principled hybrid framework; Study of the properties on a toy example; Influence of the amount of labelled data

5 Generative methods. Answer: “what does a cat look like? and a dog?” => joint distribution of data and labels. $x$: data, $c$: label, $\theta$: parameters.

6 Generative methods. Objective function: $G(\theta) = p(\theta)\, p(X, C \mid \theta) = p(\theta) \prod_n p(x_n, c_n \mid \theta)$. 1 reusable model per class, can deal with incomplete data. Example: GMMs.
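As a rough illustration of the generative objective, the sketch below evaluates log G(θ) for a hypothetical model with one 1-D Gaussian per class. The flat prior (log p(θ) = 0), the class names, and the parameter layout `(weight, mean, variance)` are assumptions made for the example, not part of the slides.

```python
import math

def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian N(x | mean, var)."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def log_generative_objective(xs, cs, theta):
    """log G(theta) = sum_n log p(x_n, c_n | theta), assuming a flat prior.

    theta maps each class to (weight, mean, variance), so that
    p(x, c | theta) = weight_c * N(x | mean_c, var_c).
    """
    total = 0.0
    for x, c in zip(xs, cs):
        weight, mean, var = theta[c]
        total += math.log(weight) + log_gauss(x, mean, var)
    return total

# Hypothetical two-class model and four labelled points.
theta = {"cat": (0.5, -1.0, 1.0), "dog": (0.5, 1.0, 1.0)}
xs = [-1.2, -0.8, 0.9, 1.3]
cs = ["cat", "cat", "dog", "dog"]
print(log_generative_objective(xs, cs, theta))
```

Because the objective is a sum of per-class terms, each class model can be fitted and reused on its own, which is the "1 reusable model per class" property.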

7 Example of generative model

8 Discriminative methods. Answer: “is it a cat or a dog?” => posterior distribution of the labels. $x$: data, $c$: label, $\theta$: parameters.

9 Discriminative methods. The objective function is $D(\theta) = p(\theta)\, p(C \mid X, \theta) = p(\theta) \prod_n p(c_n \mid x_n, \theta)$. Focus on regions of ambiguity, make faster predictions. Examples: neural networks, SVMs.
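For comparison, a minimal sketch of the discriminative objective, assuming a two-class logistic model p(c = 1 | x, θ) = sigmoid(w·x + b) and a flat prior. The logistic form and the names `w` and `b` are illustrative choices; the slides only name neural networks and SVMs as examples.

```python
import math

def log_sigmoid(z):
    """Numerically stable log(1 / (1 + exp(-z)))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))

def log_discriminative_objective(xs, cs, w, b):
    """log D(theta) = sum_n log p(c_n | x_n, theta), assuming a flat prior.

    Labels are 0 or 1 and p(c = 1 | x) = sigmoid(w * x + b).
    """
    total = 0.0
    for x, c in zip(xs, cs):
        z = w * x + b
        total += log_sigmoid(z) if c == 1 else log_sigmoid(-z)
    return total

# Hypothetical labelled points: negatives on the left, positives on the right.
xs = [-1.2, -0.8, 0.9, 1.3]
cs = [0, 0, 1, 1]
print(log_discriminative_objective(xs, cs, w=2.0, b=0.0))
```

Only the label posterior is modelled here; nothing in the objective describes how the inputs themselves are distributed.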

10 Example of discriminative model SVMs / NNs

11 Generative versus discriminative No effect of the double mode on the decision boundary

12 Content: Generative and discriminative methods; A principled hybrid framework; Study of the properties on a toy example; Influence of the amount of labelled data

13 Semi-supervised learning. Few labelled data, lots of unlabelled data. Discriminative methods overfit; generative models only help classification if they are “good”. We need the modelling power of generative models together with good discriminative performance => hybrid models.

14 Discriminative training (Bach et al, ICASSP 05). Discriminative objective function: $D(\theta) = p(\theta) \prod_n p(c_n \mid x_n, \theta)$. Using a generative model: $D(\theta) = p(\theta) \prod_n \frac{p(x_n, c_n \mid \theta)}{p(x_n \mid \theta)} = p(\theta) \prod_n \frac{p(x_n, c_n \mid \theta)}{\sum_c p(x_n, c \mid \theta)}$.
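The ratio used in discriminative training can be sketched as follows: the label posterior is the joint divided by the sum of the joint over all classes. The 1-D Gaussian class models and the `(weight, mean, variance)` layout are assumptions for illustration; the log-sum-exp step is only there for numerical stability.

```python
import math

def log_joint(x, c, theta):
    """log p(x, c | theta) for a hypothetical model: weight_c * N(x | mean_c, var_c)."""
    weight, mean, var = theta[c]
    log_density = -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)
    return math.log(weight) + log_density

def log_posterior(x, c, theta):
    """log p(c | x, theta) = log p(x, c | theta) - log sum_c' p(x, c' | theta)."""
    logs = [log_joint(x, k, theta) for k in theta]
    m = max(logs)
    log_marginal = m + math.log(sum(math.exp(v - m) for v in logs))
    return log_joint(x, c, theta) - log_marginal

def log_discriminative_objective(xs, cs, theta):
    """log D(theta) for a generative model trained discriminatively (flat prior assumed)."""
    return sum(log_posterior(x, c, theta) for x, c in zip(xs, cs))

theta = {"cat": (0.5, -1.0, 1.0), "dog": (0.5, 1.0, 1.0)}
print(math.exp(log_posterior(0.7, "dog", theta)))
```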

15 Convex combination (Bouchard et al, COMPSTAT 04). Generative objective function: $G(\theta) = p(\theta) \prod_n p(x_n, c_n \mid \theta)$. Discriminative objective function: $D(\theta) = p(\theta) \prod_n p(c_n \mid x_n, \theta)$. Convex combination: $\log L(\theta) = \alpha \log D(\theta) + (1 - \alpha) \log G(\theta)$, $\alpha \in [0, 1]$.
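A minimal sketch of the convex combination, taking the two log objectives as already-computed numbers. The weight is called `alpha` here as a naming assumption, and the numeric values are made up for the example.

```python
def convex_combination(log_d, log_g, alpha):
    """Blend log D(theta) and log G(theta) with a weight alpha in [0, 1]."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * log_d + (1.0 - alpha) * log_g

# Hypothetical objective values for some fixed theta.
log_d, log_g = -3.0, -10.0
for alpha in (0.0, 0.5, 1.0):
    print(alpha, convex_combination(log_d, log_g, alpha))
```

At `alpha = 0` the blend is purely generative and at `alpha = 1` purely discriminative.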

16 A principled hybrid model

17

18

19

20 $\theta$: posterior distribution of the labels. $\theta'$: marginal distribution of the data. $\theta$ and $\theta'$ communicate through a prior. Hybrid objective function: $L(\theta, \theta') = p(\theta, \theta') \prod_n p(c_n \mid x_n, \theta) \prod_n p(x_n \mid \theta')$.

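21 A principled hybrid model. Case $\theta = \theta'$: $p(\theta, \theta') = p(\theta)\, \delta(\theta - \theta')$, so $L(\theta, \theta') = p(\theta)\, \delta(\theta - \theta') \prod_n p(c_n \mid x_n, \theta) \prod_n p(x_n \mid \theta')$ and $L(\theta) = G(\theta)$: generative case. Case $\theta$ and $\theta'$ independent: $p(\theta, \theta') = p(\theta)\, p(\theta')$, so $L(\theta, \theta') = \left[ p(\theta) \prod_n p(c_n \mid x_n, \theta) \right] \times \left[ p(\theta') \prod_n p(x_n \mid \theta') \right] = D(\theta) \times f(\theta')$: discriminative case.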

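22 A principled hybrid model. Anything in between: hybrid case. Choice of prior: $p(\theta, \theta') = p(\theta)\, \mathcal{N}(\theta' \mid \theta, \sigma(\alpha))$. $\alpha \to 0 \Rightarrow \sigma \to 0 \Rightarrow \theta = \theta'$; $\alpha \to 1 \Rightarrow \sigma \to \infty \Rightarrow \theta$ and $\theta'$ independent.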
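The coupling prior can be sketched as below, under several assumptions not stated on the slide: the trade-off parameter is called `alpha`, the prior width is treated as a variance, p(θ) is flat, and the mapping from `alpha` to the width is one possible choice that has the two required limits (width going to 0 as `alpha` goes to 0, and to infinity as `alpha` goes to 1).

```python
import math

def sigma_of_alpha(alpha):
    """Assumed mapping: tends to 0 as alpha -> 0 and to infinity as alpha -> 1."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return (alpha / (1.0 - alpha)) ** 2

def log_coupling_prior(theta, theta_prime, alpha):
    """log N(theta' | theta, sigma(alpha)) for parameter vectors, sigma used as a variance."""
    var = sigma_of_alpha(alpha)
    return sum(
        -0.5 * (math.log(2 * math.pi * var) + (tp - t) ** 2 / var)
        for t, tp in zip(theta, theta_prime)
    )

def log_hybrid_objective(log_cond, log_marg, theta, theta_prime, alpha):
    """log L(theta, theta') with a flat p(theta).

    log_cond is sum_n log p(c_n | x_n, theta);
    log_marg is sum_n log p(x_n | theta').
    """
    return log_coupling_prior(theta, theta_prime, alpha) + log_cond + log_marg

# Hypothetical parameter vectors and precomputed data terms.
theta, theta_prime = [0.0, 1.0], [0.2, 0.9]
for alpha in (0.1, 0.5, 0.9):
    print(alpha, log_hybrid_objective(-3.0, -10.0, theta, theta_prime, alpha))
```

A small `alpha` penalises any gap between the two parameter sets heavily, pulling the model towards the generative case; a large `alpha` lets them drift apart, approaching the discriminative case.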

23 Why principled? Consistent with the likelihood of graphical models => one way to train a system. Everything can now be modelled => potential to be Bayesian. Potential to learn the trade-off parameter $\alpha$.

24 Learning. EM / Laplace approximation / MCMC: either intractable or too slow. Conjugate gradients: flexible, easy to check, BUT sensitive to initialisation and slow. Variational inference.

25 Content: Generative and discriminative methods; A principled hybrid framework; Study of the properties on a toy example; Influence of the amount of labelled data

26 Toy example

27 2 elongated distributions. Only spherical Gaussians allowed => wrong model. 2 labelled points per class => strong risk of overfitting.
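A data set of this kind could be generated along the following lines. This is only a sketch: the sample size, class centres, and standard deviations are hypothetical, since the slides give no numbers for the toy example.

```python
import random

def sample_elongated(n, centre_y, rng, long_std=3.0, short_std=0.3):
    """Draw n 2-D points from a Gaussian stretched along the x axis."""
    return [(rng.gauss(0.0, long_std), rng.gauss(centre_y, short_std)) for _ in range(n)]

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Two elongated classes, one above the other.
data = {0: sample_elongated(200, -1.0, rng), 1: sample_elongated(200, 1.0, rng)}

# Keep only 2 labelled points per class; treat the rest as unlabelled.
labelled = {c: pts[:2] for c, pts in data.items()}
unlabelled = [p for pts in data.values() for p in pts[2:]]

print(len(labelled[0]), len(labelled[1]), len(unlabelled))
```

A spherical Gaussian fitted to either class cannot capture the difference between the long and short axes, which is what makes the model deliberately wrong in this setting.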

28 Toy example

29 Decision boundaries

30 Content: Generative and discriminative methods; A principled hybrid framework; Study of the properties on a toy example; Influence of the amount of labelled data

31 A real example. Images are a special case, as they contain several features each. 2 levels of supervision: at the image level, and at the feature level. Image label only => weakly labelled. Image label + segmentation => fully labelled.

32 The underlying generative model gaussian multinomial

33 The underlying generative model weakly – fully labelled

34 Experimental set-up. 3 classes: bikes, cows, sheep. 1 Gaussian per class => poor generative model. 75 training images for each category.

35 HF framework

36 HF versus CC

37 Results. When increasing the proportion of fully labelled data, the trend is: generative → hybrid → discriminative. Weakly labelled data has little influence on the trend. With sufficient fully labelled data, HF tends to perform better than CC.

38 Experimental set-up. 3 classes: lions, tigers and cheetahs. 1 Gaussian per class => poor generative model. 75 training images for each category.

39 HF framework

40 HF versus CC

41 Results. Hybrid models consistently perform better. However, generative and discriminative models haven’t reached saturation. No clear difference between HF and CC.

42 Conclusion. Principled hybrid framework. Possibility to learn the best trade-off. Helps for ambiguous datasets when labelled data is scarce. Problem of optimisation.

43 Future avenues. Bayesian version (posterior distribution of the trade-off parameter) under study. Replace the scalar by a diagonal matrix to allow flexibility => need for the Bayesian version. Choice of priors.

44 Thank you!
