
1 Bayesian Parameter Estimation Liad Serruya

2 Agenda Introduction Bayesian decision theory Scale-Invariant Learning Bayesian “One-Shot” Learning

3 Computer Vision Learning a new category typically requires 1,000, if not 10,000, training images. The number of training examples has to be 5 to 10 times the number of object parameters, so large training sets are needed. The penalty for using small training sets is overfitting. These images have to be collected, and sometimes manually segmented and aligned – a tedious and expensive task.

4 Humans It is believed that humans can recognize between 5,000 and 30,000 object categories. Learning a new category is both fast and easy, sometimes requiring very few training examples. When learning a new category we take advantage of prior experience. The appearance of the categories we know and, more importantly, the variability in their appearance give us important information on what to expect in a new category.

5 Why is it hard? Given an image, decide whether or not it contains an object of a specific class. Difficulties: • size and intra-class variation • background clutter • occlusion • scale and lighting variations Figures from http://people.csail.mit.edu/fergus/research/cm/constellation_model.html

6 Minimum of supervision Learn to recognize an object class given a set of class and background pictures • without preprocessing: no labeling, segmentation, alignment, or scale normalization • images may contain clutter Figures from http://www.wisdom.weizmann.ac.il/~deniss/vision_spring04/files/bayessian_recognition.pps

7 Agenda Introduction Bayesian decision theory • Maximum Likelihood (ML) • Bayesian Estimation (BE) • ML vs. BE • Expectation Maximization Algorithm (EM) Scale-Invariant Learning Bayesian “One-Shot” Learning

8 Bayesian Decision Theory We are given a training set X of samples of class c. Given a query image x, we want to know the probability that it belongs to the class, p(c | x, X). We know that the class has some fixed distribution with unknown parameters θ, that is, p(x | θ, c) is known. Bayes' rule tells us: p(c | x, X) ∝ p(x | c, X) p(c). What can we do about p(x | c, X)?

9 Maximum Likelihood Estimation Concept of likelihood – Probability of an event dependent on model parameters: p(data | θ) – Likelihood of the parameters given the data: L(θ | data) = p(data | θ)

10 MLE cont. The aim of maximum likelihood estimation is to find the parameter values that make the observed data most likely. This works because the likelihood of the parameters given the data is defined to be equal to the probability of the data given the parameters.

11 Simple Example of MLE Toss a coin for which p is the probability of obtaining a head. Assume a hypothesized value of p (0.5); the test is essentially asking whether there is evidence that the coin is biased. We wish to find the MLE of p given a specific dataset. How do we do this? We find the value of p that makes the observed data most likely.

12 Example cont. n = 100 (total number of tosses), h = 56 (heads), t = 44 (tails). Imagine that p were 0.5; then L = C(100, 56) × 0.5^56 × 0.5^44 ≈ 0.0389. We can tabulate the likelihood for different parameter values to find the maximum likelihood estimate of p:

p     L
0.48  0.0222
0.50  0.0389
0.52  0.0581
0.54  0.0739
0.56  0.0801
0.58  0.0738
0.60  0.0576
0.62  0.0378

The maximum lies at p = h/n = 0.56. Graphs from http://statgen.iop.kcl.ac.uk/bgim/mle/sslike_3.html
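
The likelihood table above can be reproduced with a few lines of Python. This is an illustrative sketch, not part of the original slides; it uses the binomial likelihood defined on slide 70.

```python
# Sketch: binomial likelihood L(p) = C(n, h) * p^h * (1 - p)^(n - h)
# for n = 100 tosses with h = 56 heads, evaluated on a grid of p values.
from math import comb

n, h = 100, 56

def likelihood(p: float) -> float:
    """Probability of observing h heads in n tosses when P(head) = p."""
    return comb(n, h) * p**h * (1 - p)**(n - h)

for p in (0.48, 0.50, 0.52, 0.54, 0.56, 0.58, 0.60, 0.62):
    print(f"p = {p:.2f}   L = {likelihood(p):.4f}")

# The maximum occurs at p = h / n = 0.56, the maximum likelihood estimate.
```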

13 Maximum Likelihood Estimation What can we do about p(x | c, X)? Choose the parameter value θ* that makes the training data most probable: θ* = argmax_θ p(X | θ, c), and then approximate p(x | c, X) ≈ p(x | θ*, c). This slide was taken from http://www.wisdom.weizmann.ac.il/~deniss/vision_spring04/files/bayessian_recognition.pps

14 Bayesian Estimation The Bayesian estimation approach considers θ as a random variable. Before we observe the training data, the parameters are described by a prior p(θ), which is typically very broad. Once the data is observed, we can make use of Bayes' formula to find the posterior p(θ | X). Since some values of the parameters are more consistent with the data than others, the posterior is narrower than the prior. This slide was taken from http://www.wisdom.weizmann.ac.il/~deniss/vision_spring04/files/bayessian_recognition.pps

15 Bayesian Estimation Unlike ML, Bayesian estimation does not choose a specific value for θ, but instead performs a weighted average over all possible values of θ: p(x | c, X) = ∫ p(x | θ, c) p(θ | c, X) dθ. Why is it more accurate than ML? This slide was taken from http://www.wisdom.weizmann.ac.il/~deniss/vision_spring04/files/bayessian_recognition.pps

16 Maximum Likelihood vs. Bayesian ML and Bayesian estimation are asymptotically equivalent and “consistent”. ML is typically computationally easier. ML is often easier to interpret: it returns the single best model (parameter), whereas Bayesian estimation gives a weighted average of models. But for finite training data (and given a reliable prior) Bayesian estimation is more accurate (it uses more of the information). Bayesian estimation with a “flat” prior is essentially ML; with asymmetric and broad priors the methods lead to different solutions. This slide was taken from http://www.wisdom.weizmann.ac.il/~deniss/vision_spring04/files/bayessian_recognition.pps
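
To make the contrast concrete, here is a hedged sketch (not from the slides) using the coin example with a conjugate Beta prior: ML commits to a single value of p, while the Bayesian answer averages over the posterior. The prior parameters a and b are assumptions chosen for illustration.

```python
# Sketch: coin with unknown P(head) = p and an assumed Beta(a, b) prior.
# ML uses the single estimate p_hat = h / n; the Bayesian predictive probability
# of heads is the mean of the posterior Beta(a + h, b + t).
n, h = 3, 3            # tiny training set: 3 tosses, all heads
t = n - h
a, b = 2.0, 2.0        # broad symmetric prior (illustrative assumption)

p_ml = h / n                        # 1.00 -> tails deemed impossible
p_bayes = (a + h) / (a + b + n)     # 5/7 ~ 0.71, tempered by the prior

print(f"ML estimate of P(head):      {p_ml:.2f}")
print(f"Bayesian predictive P(head): {p_bayes:.2f}")
```

With only three observations the ML estimate overfits, while the weighted average stays reasonable; as n grows the two converge, matching the bullet on asymptotic equivalence.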

17 Expectation Maximization (EM) EM is an iterative optimization method for estimating unknown parameters θ from measurement data U when some “hidden” variables J are not observed. We want to maximize the posterior probability of the parameters given the data U, marginalizing over the hidden variables J: θ* = argmax_θ Σ_J p(θ, J | U). This slide was taken from http://www.wisdom.weizmann.ac.il/~deniss/vision_spring04/files/bayessian_recognition.pps

18 Expectation Maximization (EM) Start from an initial parameter guess and a guess of the unknown hidden data, then alternate: E-step: estimate the unobserved data using the observed data and the current guess of the parameters. M-step: compute the maximum likelihood estimate of the parameters using the estimated data. This slide was taken from http://www.wisdom.weizmann.ac.il/~deniss/vision_spring04/files/bayessian_recognition.pps

19 Expectation Maximization (EM) Alternate between estimating the unknown parameters θ and the hidden variables J. The EM algorithm converges to a local maximum. This slide was taken from http://www.wisdom.weizmann.ac.il/~deniss/vision_spring04/files/bayessian_recognition.pps
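
As a concrete illustration of the E/M alternation (not taken from the slides), the sketch below runs EM on a toy 1-D mixture of two Gaussians with known, equal variances; the hidden variables J are the per-point component assignments.

```python
# Toy EM for a two-component 1-D Gaussian mixture with known variance.
import numpy as np

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-2, 1, 200), rng.normal(3, 1, 200)])

mu = np.array([-1.0, 1.0])      # initial guess of the means
pi = np.array([0.5, 0.5])       # mixing weights
sigma = 1.0                     # known standard deviation

for _ in range(50):
    # E-step: responsibilities = posterior probability of each hidden assignment
    lik = pi * np.exp(-0.5 * ((data[:, None] - mu) / sigma) ** 2)
    resp = lik / lik.sum(axis=1, keepdims=True)
    # M-step: maximum likelihood update of the parameters given the soft assignments
    nk = resp.sum(axis=0)
    mu = (resp * data[:, None]).sum(axis=0) / nk
    pi = nk / len(data)

print("estimated means:", mu)   # converges to a local maximum of the likelihood
```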

20 Agenda Introduction Bayesian decision theory Scale-Invariant Learning  Overview  Model Structure  Results Bayesian “One-Shot” Learning

21 Scale-Invariant Learning Object Class Recognition by Scale Invariant Learning – Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, 2003 Weakly Supervised Scale-Invariant Learning of Models for Visual Recognition – International Journal of Computer Vision, in print, 2006 Rob Fergus, Pietro Perona, A. Zisserman

22 Overview The goal of the research is to get computers to recognize different categories of objects in images. The computer must be capable of learning what a new category looks like and of identifying new instances in a query image.

23 How to do it? There are three main issues we need to consider: 1. Representation - how to represent the object. 2. Learning - using this representation, how to learn a particular object category. 3. Recognition - how to use the learned model to find further instances in query images. This slide was taken from http://people.csail.mit.edu/fergus/research/cm/learning.htm

24 Representation We choose to model objects as a constellation of parts, i.e. a set of regions that appear consistently in a particular spatial arrangement. By modeling the location and appearance of these consistent regions across a set of training images for a category, we obtain a model of the category itself. Figure from http://www.wisdom.weizmann.ac.il/~deniss/vision_spring04/files/bayessian_recognition.pps

25 Notation X – Shape: locations of the features. A – Appearance: representations of the features. S – Scale: vector of feature scales. h – Hypothesis: which part is represented by which observed feature. This slide was taken from http://www.wisdom.weizmann.ac.il/~deniss/vision_spring04/files/bayessian_recognition.pps

26 Bayesian Decision In a query image we have identified N interesting features with locations X, scales S, and appearances A. We now make a Bayesian decision based on the ratio R = p(Object | X, S, A) / p(No object | X, S, A). We apply a threshold to the likelihood ratio R to decide whether the input image belongs to the class. This slide was taken from http://www.wisdom.weizmann.ac.il/~deniss/vision_spring04/files/bayessian_recognition.pps

27 p(zebra | image) vs. p(no zebra | image). Bayes' rule: posterior ratio = likelihood ratio × prior ratio, i.e. p(zebra | image) / p(no zebra | image) = [p(image | zebra) / p(image | no zebra)] × [p(zebra) / p(no zebra)]. This slide was taken from http://vision.cs.princeton.edu/documents/CVPR2007_tutorial_intro.ppt
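
A minimal sketch of the resulting decision rule, assuming the class and clutter likelihoods have already been computed; the function name, priors, threshold, and numbers are hypothetical.

```python
# Accept the image as a class instance when the posterior ratio R exceeds a threshold.
def bayes_decision(lik_object: float, lik_clutter: float,
                   prior_object: float = 0.5, threshold: float = 1.0) -> bool:
    """R = p(Object | image) / p(No object | image) via likelihood ratio x prior ratio."""
    prior_clutter = 1.0 - prior_object
    R = (lik_object * prior_object) / (lik_clutter * prior_clutter)
    return R > threshold

print(bayes_decision(lik_object=2.4e-8, lik_clutter=4.1e-9))  # -> True
```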

28 Illustration: zebra and non-zebra examples in feature space, separated by a decision boundary. This slide was taken from http://vision.cs.princeton.edu/documents/CVPR2007_tutorial_intro.ppt

29 The Likelihoods The term p(X, S, A | θ) can be factored, summing over the hypotheses h, into appearance, shape, relative-scale, and detection terms: p(X, S, A | θ) = Σ_h p(A | X, S, h, θ) p(X | S, h, θ) p(S | h, θ) p(h | θ). We apply a threshold to the likelihood ratio R to decide whether the input image belongs to the class. This slide was taken from http://www.wisdom.weizmann.ac.il/~deniss/vision_spring04/files/bayessian_recognition.pps

30 Appearance Foreground model: Gaussian. Clutter model: Gaussian. This slide was taken from http://www.wisdom.weizmann.ac.il/~deniss/vision_spring04/files/bayessian_recognition.pps

31 Shape Foreground model: Gaussian. Clutter model: uniform. This slide was taken from http://www.wisdom.weizmann.ac.il/~deniss/vision_spring04/files/bayessian_recognition.pps

32 Relative Scale Foreground model: Gaussian over log(scale). Clutter model: uniform over log(scale). This slide was taken from http://www.wisdom.weizmann.ac.il/~deniss/vision_spring04/files/bayessian_recognition.pps

33 Detection Probability Foreground model: a probability of detection for each part (e.g. 0.8, 0.75, 0.9). Clutter model: Poisson probability density function on the number of detections. This slide was taken from http://www.wisdom.weizmann.ac.il/~deniss/vision_spring04/files/bayessian_recognition.pps
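
The sketch below is a heavily simplified illustration, not the paper's implementation: it assumes a single hypothesis, toy Gaussian foreground densities, uniform clutter densities, and made-up numbers. It only shows how per-term foreground/clutter ratios for appearance, shape, relative scale, and detections combine into a log-likelihood ratio; the real model sums over all hypotheses h.

```python
# Simplified single-hypothesis log-likelihood ratio (toy numbers throughout).
import numpy as np
from scipy.stats import norm, poisson

def log_ratio_term(x, mean, std, bg_density):
    """log [ Gaussian foreground density / uniform background density ] at x."""
    x = np.atleast_1d(x)
    return norm.logpdf(x, mean, std).sum() - np.log(bg_density) * x.size

appearance = log_ratio_term(np.array([0.3, -0.1]), mean=0.0, std=0.5, bg_density=0.2)
shape      = log_ratio_term(np.array([12.0, -4.0]), mean=np.array([10.0, -5.0]),
                            std=3.0, bg_density=1.0 / 400.0)
scale      = log_ratio_term(np.log(1.1), mean=0.0, std=0.3, bg_density=0.1)
detections = poisson.logpmf(3, mu=2.0) - poisson.logpmf(3, mu=5.0)  # fg vs clutter counts

log_R = appearance + shape + scale + detections
print("log likelihood ratio:", log_R)  # threshold this value to accept/reject
```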

34 Feature Detection Kadir-Brady feature detector: detects salient regions over different scales and locations. Choose the N most salient regions. Each feature contains scale and location information. This slide was taken from http://www.wisdom.weizmann.ac.il/~deniss/vision_spring04/files/bayessian_recognition.pps
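
As a rough illustration only (this is not the actual Kadir-Brady detector, which also weights the entropy by how it changes across scales), the toy sketch below scores candidate regions by grey-level histogram entropy at a few scales and keeps the N highest-scoring ones.

```python
# Toy entropy-based saliency: score (x, y, scale) candidates and keep the top N.
import numpy as np

def patch_entropy(patch, bins=16):
    hist, _ = np.histogram(patch, bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def salient_regions(image, scales=(5, 9, 13), n_regions=20, stride=4):
    h, w = image.shape
    candidates = []
    for s in scales:
        r = s // 2
        for y in range(r, h - r, stride):
            for x in range(r, w - r, stride):
                patch = image[y - r:y + r + 1, x - r:x + r + 1]
                candidates.append((patch_entropy(patch), x, y, s))
    candidates.sort(reverse=True)          # most salient first
    return candidates[:n_regions]          # (saliency, x, y, scale) tuples

rng = np.random.default_rng(1)
print(salient_regions(rng.random((64, 64)))[0])  # stand-in image, top region
```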

35 Feature Representation Normalization: the feature contents are rescaled to an 11x11 pixel patch. The data dimension is reduced from 121 to 15 dimensions using PCA. The result is the appearance vector for the part. This slide was taken from http://www.wisdom.weizmann.ac.il/~deniss/vision_spring04/files/bayessian_recognition.pps
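
A small sketch of this representation step under stated assumptions (random stand-in patches, PCA computed via an SVD of the mean-centred training patches): each rescaled 11x11 patch is flattened to a 121-vector and projected onto the top 15 principal components.

```python
# Patch -> 15-dimensional appearance vector via PCA (illustrative only).
import numpy as np

def pca_basis(patches, k=15):
    """patches: (num_patches, 121) array of flattened, rescaled 11x11 patches."""
    mean = patches.mean(axis=0)
    _, _, vt = np.linalg.svd(patches - mean, full_matrices=False)
    return mean, vt[:k]                      # top-k principal directions

def appearance_vector(patch, mean, basis):
    return basis @ (patch.ravel() - mean)    # 15-dimensional descriptor

rng = np.random.default_rng(0)
train = rng.random((500, 121))               # stand-in for real training patches
mean, basis = pca_basis(train)
print(appearance_vector(rng.random((11, 11)), mean, basis).shape)  # (15,)
```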

36 Learning We want to estimate the model parameters θ (shape, appearance, relative scale, and detection parameters). Using the EM method, find the θ* that best explains the training set images, i.e. maximizes the likelihood: θ* = argmax_θ p(X, S, A | θ). This slide was taken from http://www.wisdom.weizmann.ac.il/~deniss/vision_spring04/files/bayessian_recognition.pps

37 Learning procedure Find regions, with their location, scale & appearance, over all training images. Initialize the model parameters. Use EM and iterate to convergence: • E-step: compute assignments for which regions are foreground / background • M-step: update the model parameters We are trying to maximize the likelihood – consistency in shape & appearance. This info was taken from http://people.csail.mit.edu/fergus/research/cm/learning.htm

38 Illustration This slide was taken from http://people.csail.mit.edu/fergus/research/cm/learning.htm

39 Illustration This slide was taken from http://people.csail.mit.edu/fergus/research/cm/learning.htm

40 Illustration This slide was taken from http://people.csail.mit.edu/fergus/research/cm/learning.htm

41 Recognition Recognition proceeds in the same manner as learning: take a query image and find the salient regions (figures: new image, salient regions). Figures from http://people.csail.mit.edu/fergus/research/cm/learning.htm

42 Recognition (continued) Take the model learned in training (from lots of motorbike images) and find the assignment of regions that fits the model best. Figures from http://people.csail.mit.edu/fergus/research/cm/learning.htm

43 Experiments Some Samples Figures from http://people.csail.mit.edu/fergus/research/cm/learning.htm

44 Sample Model – Motorbikes Figures from http://people.csail.mit.edu/fergus/research/cm/learning.htm

45 Background images evaluated Figures from http://people.csail.mit.edu/fergus/research/cm/learning.htm

46 Sample Model – Faces Figures from http://people.csail.mit.edu/fergus/research/cm/learning.htm

47 Sample Model – Spotted Cats Figures from http://people.csail.mit.edu/fergus/research/cm/learning.htm

48 Sample Model – Airplanes Figures from http://people.csail.mit.edu/fergus/research/cm/learning.htm

49 Sample Model – Cars from rear Figures from http://people.csail.mit.edu/fergus/research/cm/learning.htm

50 Confusion Table How good is a model for object class A at distinguishing images of class B from background images? This slide was taken from http://www.wisdom.weizmann.ac.il/~deniss/vision_spring04/files/bayessian_recognition.pps

51 Comparison of Results Results for scale-invariant learning/recognition: Comparison to other methods: This slide was taken from http://www.wisdom.weizmann.ac.il/~deniss/vision_spring04/files/bayessian_recognition.pps

52 Agenda Introduction Bayesian decision theory Scale-Invariant Learning Bayesian “One-Shot” Learning • Bayesian Framework • Experiments • Results • Summary

53 Bayesian “One-Shot” Learning A Bayesian Approach to Unsupervised One-Shot Learning of Object Categories – Proc. ICCV, 2003 Rob Fergus, Pietro Perona, Li Fei-Fei

54 Prior knowledge about objects This slide was taken from http://www.wisdom.weizmann.ac.il/~deniss/vision_spring04/files/bayessian_recognition.pps

55 Incorporating prior knowledge Bayesian methods allow us to use prior information p(θ) about the nature of objects. Given new observations, we can update our knowledge into a posterior p(θ | x). This slide was taken from http://www.wisdom.weizmann.ac.il/~deniss/vision_spring04/files/bayessian_recognition.pps

56 Bayesian Framework Given a test image, we want to make a Bayesian decision by comparing P(object | test, train) vs. P(clutter | test, train). By Bayes' rule, P(object | test, train) ∝ P(test | object, train) p(object); expanding by parametrization, P(test | object, train) = ∫ P(test | θ, object) p(θ | object, train) dθ. Previous work: p(θ | object, train) is collapsed to a single ML estimate learned from many training images. This slide was taken from http://www.robots.ox.ac.uk/~awf/iccv03videos/

57 Bayesian Framework Given a test image, we again compare P(object | test, train) vs. P(clutter | test, train), with P(test | object, train) = ∫ P(test | θ, object) p(θ | object, train) dθ. One-shot learning: keep the full posterior, p(θ | object, train) ∝ P(train | θ, object) p(θ), where the prior p(θ) is learned from previously seen categories. This slide was taken from http://www.robots.ox.ac.uk/~awf/iccv03videos/
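
To show the difference between plugging in a single ML value of θ and integrating over its posterior, here is a deliberately simplified sketch on a 1-D Gaussian toy model; the paper itself uses a variational posterior over constellation-model parameters, and the prior and data below are assumptions for illustration.

```python
# Toy one-shot-style decision term: Bayesian predictive vs. ML plug-in likelihood.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

train = np.array([1.8, 2.3, 2.1])    # "few training examples" of a scalar feature
test = 2.5                           # same feature measured in the test image

# ML: commit to the single best parameter value.
theta_ml = train.mean()
p_ml = norm.pdf(test, loc=theta_ml, scale=1.0)

# Bayesian: average the test likelihood over posterior samples of theta
# (conjugate Gaussian prior on the mean, known observation variance of 1).
prior_mean, prior_std = 0.0, 3.0     # broad prior, assumed for illustration
post_var = 1.0 / (1.0 / prior_std**2 + len(train))
post_mean = post_var * (prior_mean / prior_std**2 + train.sum())
theta_samples = rng.normal(post_mean, np.sqrt(post_var), size=5000)
p_bayes = norm.pdf(test, loc=theta_samples, scale=1.0).mean()

print(f"ML plug-in likelihood:    {p_ml:.4f}")
print(f"Bayesian predictive (MC): {p_bayes:.4f}")  # broader, less overconfident
```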

58 Maximum Likelihood vs. Bayesian Learning (diagrams comparing the two learning schemes). This slide was taken from http://www.wisdom.weizmann.ac.il/~deniss/vision_spring04/files/bayessian_recognition.pps

59 Experimental setup Learn models for three object categories using both the Bayesian and ML approaches and evaluate their performance on the test set. Estimate the prior hyper-parameters. Use VBEM to learn a new object category from a few images. This slide was taken from http://www.wisdom.weizmann.ac.il/~deniss/vision_spring04/files/bayessian_recognition.pps

60 Dataset Images Graphs from “A Bayesian Approach to Unsupervised One-Shot Learning of Object Categories”

61 Experiments Prior distribution. Learning a new category using the EM method. This slide was taken from http://www.robots.ox.ac.uk/~awf/iccv03videos/

62 Prior Hyper-Parameters Graphs from “A Bayesian Approach to Unsupervised One-Shot Learning of Object Categories”

63 Performance Results – Motorbikes 1 training image 5 training images Graphs from “A Bayesian Approach to Unsupervised One-Shot Learning of Object Categories”

64 Performance Results – Motorbikes Graphs from “A Bayesian Approach to Unsupervised One-Shot Learning of Object Categories”

65 Performance Results – Face Model 1 training image 5 training images Graphs from “A Bayesian Approach to Unsupervised One-Shot Learning of Object Categories”

66 Performance Results – Face Model Graphs from “A Bayesian Approach to Unsupervised One-Shot Learning of Object Categories”

67 Results Comparison
Algorithm                                    # training images   Learning speed   Error rate
Burl et al. / Weber et al. / Fergus et al.   200 ~ 400           Hours            5.6% – 10%
Bayesian One-Shot                            1 ~ 5               < 1 min          8% – 15%
This slide was taken from http://www.robots.ox.ac.uk/~awf/iccv03videos/

68 Summary Learning categories from one example is possible. Decreased # of training examples from ~300 to 1~5. Bayesian treatment. Priors from unrelated categories are useful. This slide was taken from http://www.robots.ox.ac.uk/~awf/iccv03videos/

69 References Object Class Recognition by Scale Invariant Learning – Fergus, Perona, Zisserman – 2003 Weakly Supervised Scale-Invariant Learning of Models for Visual Recognition – Fergus, Perona, Zisserman – 2005 A Bayesian Approach to Unsupervised One-Shot Learning of Object Categories – Fei-Fei, Fergus, Perona – 2003 Unsupervised Learning of Models for Recognition – Weber, Welling, Perona – 2000 Moshe Blank, Ita Lifshitz (Weizmann) – slides http://people.csail.mit.edu/fergus/research/cm/constellation_model.html#experiments Pattern Classification / Duda & Hart, Chapter 3 http://www.robots.ox.ac.uk/~awf/iccv03videos/

70 Binomial Probability Distribution L(p) = C(n, h) p^h (1 − p)^(n − h), where n = total number of coin tosses, h = number of heads obtained, p = probability of obtaining a head on any one toss. http://statgen.iop.kcl.ac.uk/bgim/mle/sslike_3.html

