
1 Bayesian Parameter Estimation Liad Serruya

2 Agenda Introduction Bayesian decision theory Scale-Invariant Learning Bayesian “One-Shot” Learning

3 Computer Vision Learning a new category typically requires 1,000, if not 10,000, training images. The number of training examples has to be 5 to 10 times the number of object parameters, so large training sets are needed. The penalty for using small training sets is overfitting. These images have to be collected, and sometimes manually segmented and aligned – a tedious and expensive task.

4 Humans It is believed that humans can recognize between 5,000 and 30,000 object categories. Learning a new category is both fast and easy, sometimes requiring very few training examples. When learning a new category we take advantage of prior experience. The appearance of the categories we know and, more importantly, the variability in their appearance give us important information on what to expect in a new category.

5 Why is it hard? Given an image, decide whether or not it contains an object of a specific class. Difficulties: • size and intra-class variation • background clutter • occlusion • scale and lighting variations Figures from http://people.csail.mit.edu/fergus/research/cm/constellation_model.html

6 Minimum of supervision Learn to recognize an object class given a set of class and background pictures • without preprocessing: no labeling, segmentation, alignment, or scale normalization • images may contain clutter Figures from http://www.wisdom.weizmann.ac.il/~deniss/vision_spring04/files/bayessian_recognition.pps

7 Agenda Introduction Bayesian decision theory • Maximum Likelihood (ML) • Bayesian Estimation (BE) • ML vs. BE • Expectation Maximization Algorithm (EM) Scale-Invariant Learning Bayesian “One-Shot” Learning

8 Bayesian Decision Theory We are given a training set X of samples of class c. Given a query image x, we want to know the probability that it belongs to the class, p(c | x, X). We know that the class has some fixed distribution with unknown parameters θ, that is, p(x | θ, c) is known. Bayes' rule tells us: p(c | x, X) ∝ p(x | c, X) p(c). What can we do about p(x | c, X)?

9 Maximum Likelihood Estimation Concept of likelihood – Probability of an event dependent on model parameters: p(data | θ) – Likelihood of the parameters given the data: L(θ | data) = p(data | θ)

10 MLE cont. The aim of maximum likelihood estimation is to find the parameter values that make the observed data most likely. This works because the likelihood of the parameters given the data is defined to be equal to the probability of the data given the parameters.

11 Simple Example of MLE Toss a coin for which p is the probability of obtaining a head. Assume a hypothesized value of p (0.5); the test is essentially asking whether there is evidence that the coin is biased. We wish to find the MLE of p given a specific dataset. How do we do this? We find the value of p that makes the observed data most likely.

12 Example cont. n = 100 (total number of tosses), h = 56 (heads), t = 44 (tails). Imagine that p were 0.5; then L = C(100, 56) × 0.5^56 × 0.5^44 ≈ 0.0389. We can tabulate the likelihood for different parameter values to find the maximum likelihood estimate of p:

p     L
0.48  0.0222
0.50  0.0389
0.52  0.0581
0.54  0.0739
0.56  0.0801
0.58  0.0738
0.60  0.0576
0.62  0.0378

The maximum lies at p = h/n = 0.56. Graphs from http://statgen.iop.kcl.ac.uk/bgim/mle/sslike_3.html
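
The likelihood table above can be reproduced with a few lines of Python. This is an illustrative sketch, not part of the original slides; it uses the binomial likelihood defined on slide 70.

```python
# Sketch: binomial likelihood L(p) = C(n, h) * p^h * (1 - p)^(n - h)
# for n = 100 tosses with h = 56 heads, evaluated on a grid of p values.
from math import comb

n, h = 100, 56

def likelihood(p: float) -> float:
    """Probability of observing h heads in n tosses when P(head) = p."""
    return comb(n, h) * p**h * (1 - p)**(n - h)

for p in (0.48, 0.50, 0.52, 0.54, 0.56, 0.58, 0.60, 0.62):
    print(f"p = {p:.2f}   L = {likelihood(p):.4f}")

# The maximum occurs at p = h / n = 0.56, the maximum likelihood estimate.
```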

13 Maximum Likelihood Estimation What can we do about p(x | c, X)? Choose the parameter value θ* that makes the training data most probable: θ* = argmax_θ p(X | θ, c), and then approximate p(x | c, X) ≈ p(x | θ*, c). This slide was taken from http://www.wisdom.weizmann.ac.il/~deniss/vision_spring04/files/bayessian_recognition.pps

14 Bayesian Estimation The Bayesian estimation approach considers θ as a random variable. Before we observe the training data, the parameters are described by a prior p(θ), which is typically very broad. Once the data is observed, we can make use of Bayes' formula to find the posterior p(θ | X). Since some values of the parameters are more consistent with the data than others, the posterior is narrower than the prior. This slide was taken from http://www.wisdom.weizmann.ac.il/~deniss/vision_spring04/files/bayessian_recognition.pps

15 Bayesian Estimation Unlike ML, Bayesian estimation does not choose a specific value for θ, but instead performs a weighted average over all possible values of θ: p(x | c, X) = ∫ p(x | θ, c) p(θ | c, X) dθ. Why is it more accurate than ML? This slide was taken from http://www.wisdom.weizmann.ac.il/~deniss/vision_spring04/files/bayessian_recognition.pps

16 Maximum Likelihood vs. Bayesian ML and Bayesian estimation are asymptotically equivalent and “consistent”. ML is typically computationally easier. ML is often easier to interpret: it returns the single best model (parameter), whereas Bayesian estimation gives a weighted average of models. But for finite training data (and given a reliable prior) Bayesian estimation is more accurate (it uses more of the information). Bayesian estimation with a “flat” prior is essentially ML; with asymmetric and broad priors the methods lead to different solutions. This slide was taken from http://www.wisdom.weizmann.ac.il/~deniss/vision_spring04/files/bayessian_recognition.pps
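
To make the contrast concrete, here is a hedged sketch (not from the slides) using the coin example with a conjugate Beta prior: ML commits to a single value of p, while the Bayesian answer averages over the posterior. The prior parameters a and b are assumptions chosen for illustration.

```python
# Sketch: coin with unknown P(head) = p and an assumed Beta(a, b) prior.
# ML uses the single estimate p_hat = h / n; the Bayesian predictive probability
# of heads is the mean of the posterior Beta(a + h, b + t).
n, h = 3, 3            # tiny training set: 3 tosses, all heads
t = n - h
a, b = 2.0, 2.0        # broad symmetric prior (illustrative assumption)

p_ml = h / n                        # 1.00 -> tails deemed impossible
p_bayes = (a + h) / (a + b + n)     # 5/7 ~ 0.71, tempered by the prior

print(f"ML estimate of P(head):      {p_ml:.2f}")
print(f"Bayesian predictive P(head): {p_bayes:.2f}")
```

With only three observations the ML estimate overfits, while the weighted average stays reasonable; as n grows the two converge, matching the bullet on asymptotic equivalence.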

17 Expectation Maximization (EM) EM is an iterative optimization method for estimating unknown parameters θ from measurement data U when some “hidden” variables J are not observed. We want to maximize the posterior probability of the parameters given the data U, marginalizing over the hidden variables J: θ* = argmax_θ Σ_J p(θ, J | U). This slide was taken from http://www.wisdom.weizmann.ac.il/~deniss/vision_spring04/files/bayessian_recognition.pps

18 Expectation Maximization (EM) Start from an initial parameter guess and a guess of the unknown hidden data, then alternate: E-step: estimate the unobserved data using the observed data and the current guess of the parameters. M-step: compute the maximum likelihood estimate of the parameters using the estimated data. This slide was taken from http://www.wisdom.weizmann.ac.il/~deniss/vision_spring04/files/bayessian_recognition.pps

19 Expectation Maximization (EM) Alternate between estimating the unknown parameters θ and the hidden variables J. The EM algorithm converges to a local maximum. This slide was taken from http://www.wisdom.weizmann.ac.il/~deniss/vision_spring04/files/bayessian_recognition.pps
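
As a concrete illustration of the E/M alternation (not taken from the slides), the sketch below runs EM on a toy 1-D mixture of two Gaussians with known, equal variances; the hidden variables J are the per-point component assignments.

```python
# Toy EM for a two-component 1-D Gaussian mixture with known variance.
import numpy as np

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-2, 1, 200), rng.normal(3, 1, 200)])

mu = np.array([-1.0, 1.0])      # initial guess of the means
pi = np.array([0.5, 0.5])       # mixing weights
sigma = 1.0                     # known standard deviation

for _ in range(50):
    # E-step: responsibilities = posterior probability of each hidden assignment
    lik = pi * np.exp(-0.5 * ((data[:, None] - mu) / sigma) ** 2)
    resp = lik / lik.sum(axis=1, keepdims=True)
    # M-step: maximum likelihood update of the parameters given the soft assignments
    nk = resp.sum(axis=0)
    mu = (resp * data[:, None]).sum(axis=0) / nk
    pi = nk / len(data)

print("estimated means:", mu)   # converges to a local maximum of the likelihood
```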

20 Agenda Introduction Bayesian decision theory Scale-Invariant Learning  Overview  Model Structure  Results Bayesian “One-Shot” Learning

21 Scale-Invariant Learning Object Class Recognition by Scale Invariant Learning – Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, 2003 Weakly Supervised Scale-Invariant Learning of Models for Visual Recognition – International Journal of Computer Vision, in print, 2006 Rob Fergus, Pietro Perona, A. Zisserman

22 Overview The goal of the research is to get computers to recognize different categories of objects in images. The computer must be capable of learning what a new category looks like and of identifying new instances in a query image.

23 How to do it? There are three main issues we need to consider: 1. Representation - how to represent the object. 2. Learning - using this representation, how to learn a particular object category. 3. Recognition - how to use the learned model to find further instances in query images. This slide was taken from http://people.csail.mit.edu/fergus/research/cm/learning.htm

24 Representation We choose to model objects as a constellation of parts, i.e. a set of regions that appear consistently in a particular spatial arrangement. By modeling the location and appearance of these consistent regions across a set of training images for a category, we obtain a model of the category itself. Figure from http://www.wisdom.weizmann.ac.il/~deniss/vision_spring04/files/bayessian_recognition.pps

25 Notation X – Shape: locations of the features. A – Appearance: representations of the features. S – Scale: vector of feature scales. h – Hypothesis: which part is represented by which observed feature. This slide was taken from http://www.wisdom.weizmann.ac.il/~deniss/vision_spring04/files/bayessian_recognition.pps

26 Bayesian Decision In a query image we have identified N interesting features with locations X, scales S, and appearances A. We now make a Bayesian decision based on the ratio R = p(Object | X, S, A) / p(No object | X, S, A). We apply a threshold to the likelihood ratio R to decide whether the input image belongs to the class. This slide was taken from http://www.wisdom.weizmann.ac.il/~deniss/vision_spring04/files/bayessian_recognition.pps

27 p(zebra | image) vs. p(no zebra | image). Bayes' rule: posterior ratio = likelihood ratio × prior ratio, i.e. p(zebra | image) / p(no zebra | image) = [p(image | zebra) / p(image | no zebra)] × [p(zebra) / p(no zebra)]. This slide was taken from http://vision.cs.princeton.edu/documents/CVPR2007_tutorial_intro.ppt
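
A minimal sketch of the resulting decision rule, assuming the class and clutter likelihoods have already been computed; the function name, priors, threshold, and numbers are hypothetical.

```python
# Accept the image as a class instance when the posterior ratio R exceeds a threshold.
def bayes_decision(lik_object: float, lik_clutter: float,
                   prior_object: float = 0.5, threshold: float = 1.0) -> bool:
    """R = p(Object | image) / p(No object | image) via likelihood ratio x prior ratio."""
    prior_clutter = 1.0 - prior_object
    R = (lik_object * prior_object) / (lik_clutter * prior_clutter)
    return R > threshold

print(bayes_decision(lik_object=2.4e-8, lik_clutter=4.1e-9))  # -> True
```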

28 Illustration: zebra and non-zebra examples in feature space, separated by a decision boundary. This slide was taken from http://vision.cs.princeton.edu/documents/CVPR2007_tutorial_intro.ppt

29 The Likelihoods The term p(X, S, A | θ) can be factored, summing over the hypotheses h, into appearance, shape, relative-scale, and detection terms: p(X, S, A | θ) = Σ_h p(A | X, S, h, θ) p(X | S, h, θ) p(S | h, θ) p(h | θ). We apply a threshold to the likelihood ratio R to decide whether the input image belongs to the class. This slide was taken from http://www.wisdom.weizmann.ac.il/~deniss/vision_spring04/files/bayessian_recognition.pps

30 Appearance Foreground model: Gaussian. Clutter model: Gaussian. This slide was taken from http://www.wisdom.weizmann.ac.il/~deniss/vision_spring04/files/bayessian_recognition.pps

31 Shape Foreground model: Gaussian. Clutter model: uniform. This slide was taken from http://www.wisdom.weizmann.ac.il/~deniss/vision_spring04/files/bayessian_recognition.pps

32 Relative Scale Foreground model: Gaussian over log(scale). Clutter model: uniform over log(scale). This slide was taken from http://www.wisdom.weizmann.ac.il/~deniss/vision_spring04/files/bayessian_recognition.pps

33 Detection Probability Foreground model: a probability of detection for each part (e.g. 0.8, 0.75, 0.9). Clutter model: Poisson probability density function on the number of detections. This slide was taken from http://www.wisdom.weizmann.ac.il/~deniss/vision_spring04/files/bayessian_recognition.pps
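
The sketch below is a heavily simplified illustration, not the paper's implementation: it assumes a single hypothesis, toy Gaussian foreground densities, uniform clutter densities, and made-up numbers. It only shows how per-term foreground/clutter ratios for appearance, shape, relative scale, and detections combine into a log-likelihood ratio; the real model sums over all hypotheses h.

```python
# Simplified single-hypothesis log-likelihood ratio (toy numbers throughout).
import numpy as np
from scipy.stats import norm, poisson

def log_ratio_term(x, mean, std, bg_density):
    """log [ Gaussian foreground density / uniform background density ] at x."""
    x = np.atleast_1d(x)
    return norm.logpdf(x, mean, std).sum() - np.log(bg_density) * x.size

appearance = log_ratio_term(np.array([0.3, -0.1]), mean=0.0, std=0.5, bg_density=0.2)
shape      = log_ratio_term(np.array([12.0, -4.0]), mean=np.array([10.0, -5.0]),
                            std=3.0, bg_density=1.0 / 400.0)
scale      = log_ratio_term(np.log(1.1), mean=0.0, std=0.3, bg_density=0.1)
detections = poisson.logpmf(3, mu=2.0) - poisson.logpmf(3, mu=5.0)  # fg vs clutter counts

log_R = appearance + shape + scale + detections
print("log likelihood ratio:", log_R)  # threshold this value to accept/reject
```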

34 Feature Detection Kadir-Brady feature detector: detects salient regions over different scales and locations. Choose the N most salient regions. Each feature contains scale and location information. This slide was taken from http://www.wisdom.weizmann.ac.il/~deniss/vision_spring04/files/bayessian_recognition.pps
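
As a rough illustration only (this is not the actual Kadir-Brady detector, which also weights the entropy by how it changes across scales), the toy sketch below scores candidate regions by grey-level histogram entropy at a few scales and keeps the N highest-scoring ones.

```python
# Toy entropy-based saliency: score (x, y, scale) candidates and keep the top N.
import numpy as np

def patch_entropy(patch, bins=16):
    hist, _ = np.histogram(patch, bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def salient_regions(image, scales=(5, 9, 13), n_regions=20, stride=4):
    h, w = image.shape
    candidates = []
    for s in scales:
        r = s // 2
        for y in range(r, h - r, stride):
            for x in range(r, w - r, stride):
                patch = image[y - r:y + r + 1, x - r:x + r + 1]
                candidates.append((patch_entropy(patch), x, y, s))
    candidates.sort(reverse=True)          # most salient first
    return candidates[:n_regions]          # (saliency, x, y, scale) tuples

rng = np.random.default_rng(1)
print(salient_regions(rng.random((64, 64)))[0])  # stand-in image, top region
```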

35 Feature Representation Normalization: the feature contents are rescaled to an 11x11 pixel patch. The data dimension is reduced from 121 to 15 dimensions using PCA. The result is the appearance vector for the part. This slide was taken from http://www.wisdom.weizmann.ac.il/~deniss/vision_spring04/files/bayessian_recognition.pps
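
A small sketch of this representation step under stated assumptions (random stand-in patches, PCA computed via an SVD of the mean-centred training patches): each rescaled 11x11 patch is flattened to a 121-vector and projected onto the top 15 principal components.

```python
# Patch -> 15-dimensional appearance vector via PCA (illustrative only).
import numpy as np

def pca_basis(patches, k=15):
    """patches: (num_patches, 121) array of flattened, rescaled 11x11 patches."""
    mean = patches.mean(axis=0)
    _, _, vt = np.linalg.svd(patches - mean, full_matrices=False)
    return mean, vt[:k]                      # top-k principal directions

def appearance_vector(patch, mean, basis):
    return basis @ (patch.ravel() - mean)    # 15-dimensional descriptor

rng = np.random.default_rng(0)
train = rng.random((500, 121))               # stand-in for real training patches
mean, basis = pca_basis(train)
print(appearance_vector(rng.random((11, 11)), mean, basis).shape)  # (15,)
```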

36 Learning We want to estimate the model parameters θ (shape, appearance, relative scale, and detection parameters). Using the EM method, find the θ* that best explains the training set images, i.e. maximizes the likelihood: θ* = argmax_θ p(X, S, A | θ). This slide was taken from http://www.wisdom.weizmann.ac.il/~deniss/vision_spring04/files/bayessian_recognition.pps

37 Learning procedure Find regions, with their location, scale & appearance, over all training images. Initialize the model parameters. Use EM and iterate to convergence: • E-step: compute assignments for which regions are foreground / background • M-step: update the model parameters We are trying to maximize the likelihood – consistency in shape & appearance. This info was taken from http://people.csail.mit.edu/fergus/research/cm/learning.htm

38 Illustration This slide was taken from http://people.csail.mit.edu/fergus/research/cm/learning.htm

39 Illustration This slide was taken from http://people.csail.mit.edu/fergus/research/cm/learning.htm

40 Illustration This slide was taken from http://people.csail.mit.edu/fergus/research/cm/learning.htm

41 Recognition Recognition proceeds in the same manner as learning: take a query image and find the salient regions (figures: new image, salient regions). Figures from http://people.csail.mit.edu/fergus/research/cm/learning.htm

42 Recognition (continued) Take the model learned in training (from lots of motorbike images) and find the assignment of regions that fits the model best. Figures from http://people.csail.mit.edu/fergus/research/cm/learning.htm

43 Experiments Some Samples Figures from http://people.csail.mit.edu/fergus/research/cm/learning.htm

44 Sample Model – Motorbikes Figures from http://people.csail.mit.edu/fergus/research/cm/learning.htm

45 Background images evaluated Figures from http://people.csail.mit.edu/fergus/research/cm/learning.htm

46 Sample Model – Faces Figures from http://people.csail.mit.edu/fergus/research/cm/learning.htm

47 Sample Model – Spotted Cats Figures from http://people.csail.mit.edu/fergus/research/cm/learning.htm

48 Sample Model – Airplanes Figures from http://people.csail.mit.edu/fergus/research/cm/learning.htm

49 Sample Model – Cars from rear Figures from http://people.csail.mit.edu/fergus/research/cm/learning.htm

50 Confusion Table How good is a model for object class A at distinguishing images of class B from background images? This slide was taken from http://www.wisdom.weizmann.ac.il/~deniss/vision_spring04/files/bayessian_recognition.pps

51 Comparison of Results Results for scale-invariant learning/recognition: Comparison to other methods: This slide was taken from http://www.wisdom.weizmann.ac.il/~deniss/vision_spring04/files/bayessian_recognition.pps

52 Agenda Introduction Bayesian decision theory Scale-Invariant Learning Bayesian “One-Shot” Learning • Bayesian Framework • Experiments • Results • Summary

53 Bayesian “One-Shot” Learning A Bayesian Approach to Unsupervised One-Shot Learning of Object Categories – Proc. ICCV, 2003 Rob Fergus, Pietro Perona, Li Fei-Fei

54 Prior knowledge about objects This slide was taken from http://www.wisdom.weizmann.ac.il/~deniss/vision_spring04/files/bayessian_recognition.pps

55 Incorporating prior knowledge Bayesian methods allow us to use prior information p(θ) about the nature of objects. Given new observations, we can update our knowledge into a posterior p(θ | x). This slide was taken from http://www.wisdom.weizmann.ac.il/~deniss/vision_spring04/files/bayessian_recognition.pps

56 Bayesian Framework Given a test image, we want to make a Bayesian decision by comparing P(object | test, train) vs. P(clutter | test, train). By Bayes' rule, P(object | test, train) ∝ P(test | object, train) p(object); expanding by parametrization, P(test | object, train) = ∫ P(test | θ, object) p(θ | object, train) dθ. Previous work: p(θ | object, train) is collapsed to a single ML estimate learned from many training images. This slide was taken from http://www.robots.ox.ac.uk/~awf/iccv03videos/

57 Bayesian Framework Given a test image, we again compare P(object | test, train) vs. P(clutter | test, train), with P(test | object, train) = ∫ P(test | θ, object) p(θ | object, train) dθ. One-shot learning: keep the full posterior, p(θ | object, train) ∝ P(train | θ, object) p(θ), where the prior p(θ) is learned from previously seen categories. This slide was taken from http://www.robots.ox.ac.uk/~awf/iccv03videos/
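
To show the difference between plugging in a single ML value of θ and integrating over its posterior, here is a deliberately simplified sketch on a 1-D Gaussian toy model; the paper itself uses a variational posterior over constellation-model parameters, and the prior and data below are assumptions for illustration.

```python
# Toy one-shot-style decision term: Bayesian predictive vs. ML plug-in likelihood.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

train = np.array([1.8, 2.3, 2.1])    # "few training examples" of a scalar feature
test = 2.5                           # same feature measured in the test image

# ML: commit to the single best parameter value.
theta_ml = train.mean()
p_ml = norm.pdf(test, loc=theta_ml, scale=1.0)

# Bayesian: average the test likelihood over posterior samples of theta
# (conjugate Gaussian prior on the mean, known observation variance of 1).
prior_mean, prior_std = 0.0, 3.0     # broad prior, assumed for illustration
post_var = 1.0 / (1.0 / prior_std**2 + len(train))
post_mean = post_var * (prior_mean / prior_std**2 + train.sum())
theta_samples = rng.normal(post_mean, np.sqrt(post_var), size=5000)
p_bayes = norm.pdf(test, loc=theta_samples, scale=1.0).mean()

print(f"ML plug-in likelihood:    {p_ml:.4f}")
print(f"Bayesian predictive (MC): {p_bayes:.4f}")  # broader, less overconfident
```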

58 Maximum Likelihood vs. Bayesian Learning (diagrams comparing the two learning schemes). This slide was taken from http://www.wisdom.weizmann.ac.il/~deniss/vision_spring04/files/bayessian_recognition.pps

59 Experimental setup Learn models for three object categories using both the Bayesian and ML approaches and evaluate their performance on the test set. Estimate the prior hyper-parameters. Use VBEM to learn a new object category from a few images. This slide was taken from http://www.wisdom.weizmann.ac.il/~deniss/vision_spring04/files/bayessian_recognition.pps

60 Dataset Images Graphs from “A Bayesian Approach to Unsupervised One-Shot Learning of Object Categories”

61 Experiments Prior distribution. Learning a new category using the EM method. This slide was taken from http://www.robots.ox.ac.uk/~awf/iccv03videos/

62 Prior Hyper-Parameters Graphs from “A Bayesian Approach to Unsupervised One-Shot Learning of Object Categories”

63 Performance Results – Motorbikes 1 training image 5 training images Graphs from “A Bayesian Approach to Unsupervised One-Shot Learning of Object Categories”

64 Performance Results – Motorbikes Graphs from “A Bayesian Approach to Unsupervised One-Shot Learning of Object Categories”

65 Performance Results – Face Model 1 training image 5 training images Graphs from “A Bayesian Approach to Unsupervised One-Shot Learning of Object Categories”

66 Performance Results – Face Model Graphs from “A Bayesian Approach to Unsupervised One-Shot Learning of Object Categories”

67 Results Comparison
Algorithm                                    # training images   Learning speed   Error rate
Burl et al. / Weber et al. / Fergus et al.   200 ~ 400           Hours            5.6% – 10%
Bayesian One-Shot                            1 ~ 5               < 1 min          8% – 15%
This slide was taken from http://www.robots.ox.ac.uk/~awf/iccv03videos/

68 Summary Learning categories from one example is possible. Decreased # of training examples from ~300 to 1~5. Bayesian treatment. Priors from unrelated categories are useful. This slide was taken from http://www.robots.ox.ac.uk/~awf/iccv03videos/

69 References Object Class Recognition by Scale Invariant Learning – Fergus, Perona, Zisserman – 2003 Weakly Supervised Scale-Invariant Learning of Models for Visual Recognition – Fergus, Perona, Zisserman – 2005 A Bayesian Approach to Unsupervised One-Shot Learning of Object Categories – Fei-Fei, Fergus, Perona – 2003 Unsupervised Learning of Models for Recognition – Weber, Welling, Perona – 2000 Moshe Blank, Ita Lifshitz (Weizmann) – slides http://people.csail.mit.edu/fergus/research/cm/constellation_model.html#experiments Pattern Classification / Duda & Hart, Chapter 3 http://www.robots.ox.ac.uk/~awf/iccv03videos/

70 Binomial Probability Distribution L(p) = C(n, h) p^h (1 − p)^(n − h), where n = total number of coin tosses, h = number of heads obtained, p = probability of obtaining a head on any one toss. http://statgen.iop.kcl.ac.uk/bgim/mle/sslike_3.html

