1
**236607 Visual Recognition Tutorial**

- Maximum likelihood – an example
- Maximum likelihood – another example
- Bayesian estimation
- EM for a mixture model
- EM Algorithm: General Setting
- Jensen's inequality

2
**Bayesian Estimation: General Theory**

Bayesian learning considers the parameter vector θ to be estimated as a random variable. Before we observe the data, the parameters are described by a prior p(θ), which is typically very broad. Once we have observed the data, we can use Bayes' formula to find the posterior p(θ | X). Since some values of the parameters are more consistent with the data than others, the posterior is narrower than the prior. This is Bayesian learning.
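The update just described, written out explicitly (a reconstruction of the slide's formula, which did not survive extraction):

```latex
p(\theta \mid X^{(n)}) \;=\;
\frac{p(X^{(n)} \mid \theta)\, p(\theta)}
     {\int p(X^{(n)} \mid \theta')\, p(\theta')\, d\theta'}
```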

3
**Bayesian parametric estimation**

The density for x given the training set X^(n) (defined in Lecture 2) follows from the definition of conditional probability densities. The first factor, p(x | θ, X^(n)), is independent of X^(n), since it is just our assumed parameterized form of the density: p(x | θ, X^(n)) = p(x | θ).
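In symbols (reconstructed from the surrounding text):

```latex
p(x \mid X^{(n)})
= \int p(x, \theta \mid X^{(n)})\, d\theta
= \int p(x \mid \theta)\, p(\theta \mid X^{(n)})\, d\theta
```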

4
**Bayesian parametric estimation**

Instead of choosing a specific value for θ, the Bayesian approach performs a weighted average over all values of θ. If the weighting factor p(θ | X^(n)), the posterior of θ, peaks very sharply about some value θ̂, we obtain p(x | X^(n)) ≈ p(x | θ̂). Thus the optimal estimator is the most likely value of θ given the data and the prior of θ.

5
**Bayesian decision making**

Suppose we know the distribution of possible values of θ, that is, a prior p(θ). Suppose we also have a loss function λ(θ̂, θ), which measures the penalty for estimating θ̂ when the actual value is θ. Then we may formulate the estimation problem as Bayesian decision making: choose the value θ̂ which minimizes the risk. Note that the loss function is usually continuous.
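The risk to be minimized (a reconstruction of the missing formula, using the notation above):

```latex
R(\hat\theta) = \int \lambda(\hat\theta, \theta)\, p(\theta \mid X^{(n)})\, d\theta,
\qquad
\hat\theta^{*} = \arg\min_{\hat\theta} R(\hat\theta)
```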

6
**Maximum A-Posteriori (MAP) Estimation**

The optimal estimator is the most likely value of θ given the data and the prior of θ. This "most likely value" is the maximizer of the posterior p(θ | X^(n)).

7
**Maximum A-Posteriori (MAP) Estimation**

Since the data are i.i.d., the likelihood factorizes over the samples. We can disregard the normalizing factor p(X^(n)) when looking for the maximum.

8
**Maximum A-Posteriori (MAP) Estimation – continued**

So, the θ̂ we are looking for is the maximizer of the (log) posterior.
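Putting the preceding pieces together (a reconstruction of the missing expression):

```latex
\hat\theta_{\mathrm{MAP}}
= \arg\max_{\theta} p(\theta \mid X^{(n)})
= \arg\max_{\theta} \Bigl[\, \sum_{i=1}^{n} \log p(x_i \mid \theta) + \log p(\theta) \Bigr]
```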

9
**Maximum likelihood**

In the MAP estimator, the larger n (the size of the data), the less important the prior term is in the expression being maximized. This motivates us to omit the prior; what we get is the maximum likelihood (ML) method. Informally: we use no prior knowledge about the parameters; we seek those values that "explain" the data in the best way. l(θ) = log p(X^(n) | θ) is the log-likelihood of θ with respect to X^(n). We seek a maximum of the likelihood function, the log-likelihood, or any monotonically increasing function of them.
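Dropping the prior from the MAP objective gives the ML estimator (reconstructed from the text above):

```latex
\hat\theta_{\mathrm{ML}}
= \arg\max_{\theta} p(X^{(n)} \mid \theta)
= \arg\max_{\theta} \sum_{i=1}^{n} \log p(x_i \mid \theta)
```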

10
**Maximum likelihood – an example**

Let us find the ML estimator for the parameter θ of the exponential density. Since the logarithm is monotone, we are actually looking for the maximum of the log-likelihood. The maximum is achieved where the derivative with respect to θ vanishes, and we obtain the empirical mean (average).
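The slide's density formula is missing; assuming the common parameterization p(x | θ) = (1/θ)e^(−x/θ), x ≥ 0 (which indeed yields the empirical mean, as the slide states):

```latex
l(\theta) = \sum_{i=1}^{n} \log\Bigl(\tfrac{1}{\theta}\, e^{-x_i/\theta}\Bigr)
          = -n \log\theta - \frac{1}{\theta}\sum_{i=1}^{n} x_i,
\qquad
\frac{\partial l}{\partial \theta}
= -\frac{n}{\theta} + \frac{1}{\theta^{2}}\sum_{i=1}^{n} x_i = 0
\;\Longrightarrow\;
\hat\theta = \frac{1}{n}\sum_{i=1}^{n} x_i
```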

11
**Maximum likelihood – another example**

Let us find the ML estimator for the location parameter θ of a Laplacian density. The maximum is at the point where the number of samples above θ equals the number below: this is the median of the sampled data.
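The slide's density is missing; assuming the Laplacian p(x | θ) = ½ e^(−|x−θ|), which is the standard density whose ML location estimate is the median:

```latex
l(\theta) = -n \log 2 - \sum_{i=1}^{n} |x_i - \theta|,
\qquad
\frac{\partial l}{\partial \theta}
= \sum_{i=1}^{n} \operatorname{sign}(x_i - \theta) = 0
\;\Longrightarrow\;
\hat\theta = \operatorname{median}(x_1, \dots, x_n)
```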

12
**Bayesian estimation – revisited**

We saw the Bayesian estimator for the 0/1 loss function (MAP). What happens when we assume other loss functions? Example 1: absolute error, λ(θ̂, θ) = |θ̂ − θ| (θ is one-dimensional). We write down the total Bayesian risk and seek its minimum.

13
**Bayesian estimation – continued**

At the solution θ̂, the posterior mass above and below θ̂ is equal. That is, for the absolute-error loss the optimal Bayesian estimator of the parameter is the median of the posterior distribution. Example 2: squared error, λ(θ̂, θ) = (θ̂ − θ)². Again we write the total Bayesian risk and, to find the minimum, set the derivative equal to 0.
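For the squared-error case, the derivation the slide sketches (reconstructed):

```latex
R(\hat\theta) = \int (\hat\theta - \theta)^{2}\, p(\theta \mid X^{(n)})\, d\theta,
\qquad
\frac{\partial R}{\partial \hat\theta}
= 2 \int (\hat\theta - \theta)\, p(\theta \mid X^{(n)})\, d\theta = 0
\;\Longrightarrow\;
\hat\theta = \int \theta\, p(\theta \mid X^{(n)})\, d\theta
= E\bigl[\theta \mid X^{(n)}\bigr]
```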

14
**Bayesian estimation – continued**

The optimal estimator here is the conditional expectation of θ given the data X^(n).

15
**Mixture Models**

16
**Mixture Models**

Introduce a multinomial random variable Z^n (one per data point) with components Z^n_k: Z^n_k = 1 if and only if Z^n takes the k-th value, and Z^n_k = 0 otherwise. Note that the components sum to one: Σ_k Z^n_k = 1.

17
**Mixture Models**

With mixing proportions π_k = P(Z_k = 1), the marginal probability of X is a weighted sum of the component densities.
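Written out (a reconstruction of the missing formula, in the notation introduced above):

```latex
p(x \mid \theta)
= \sum_{k=1}^{K} P(Z_k = 1)\, p(x \mid Z_k = 1, \theta_k)
= \sum_{k=1}^{K} \pi_k\, p(x \mid \theta_k),
\qquad
\sum_{k=1}^{K} \pi_k = 1
```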

18
**Mixture Models**

A mixture model as a graphical model: Z is a multinomial latent variable with conditional (prior) probability P(Z_k = 1) = π_k, and X is generated conditionally on Z. Define the posterior of the component given an observation.
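The posterior (responsibility) of component k, by Bayes' rule (reconstructed):

```latex
\tau_k \;=\; P(Z_k = 1 \mid x, \theta)
\;=\; \frac{\pi_k\, p(x \mid \theta_k)}{\sum_{j=1}^{K} \pi_j\, p(x \mid \theta_j)}
```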

19
**Unconditional Mixture Models**

Conditional mixture models solve regression and classification (supervised) problems; they need observations of data X and labels Y, i.e. (X, Y) pairs. Unconditional mixture models solve density-estimation problems; they need only observations of the data X. Applications: detection of outliers, compression, unsupervised classification (clustering), and more.

20
**Unconditional Mixture Models**


21
**Gaussian Mixture Models**

Estimate the mixture parameters θ = {π_k, μ_k, Σ_k} from i.i.d. data D = {x_1, …, x_N}.
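The quantity to maximize is the log likelihood of the data (the slides refer to it as eq. (9); reconstructed here for a Gaussian mixture):

```latex
l(\theta \mid D)
= \sum_{n=1}^{N} \log p(x_n \mid \theta)
= \sum_{n=1}^{N} \log \sum_{k=1}^{K} \pi_k\, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)
```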

22
**The K-means algorithm**

Group the data D = {x_1, …, x_N} into a set of K clusters, where K is given. Represent the i-th cluster by one vector: its mean μ_i. Data points are assigned to the nearest mean. Phase 1: the values of the indicator variables are evaluated by assigning each point x_n to the closest mean. Phase 2: recompute each mean over the points assigned to it. The two phases are iterated until the assignments stop changing.
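The two phases above can be sketched in a few lines of NumPy (a minimal illustration, not the lecture's own code; function and variable names are my own):

```python
import numpy as np

def kmeans(X, K, n_iter=100, seed=0):
    """Plain K-means: alternate assignment (Phase 1) and mean update (Phase 2).
    X: (N, d) data array; K: number of clusters, given in advance."""
    rng = np.random.default_rng(seed)
    # Initialise the K means with randomly chosen data points.
    mu = X[rng.choice(len(X), K, replace=False)]
    z = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Phase 1: assign each x_n to the closest mean (squared Euclidean distance).
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=-1)
        z = d2.argmin(axis=1)
        # Phase 2: recompute each mean over its assigned points
        # (keep the old mean if a cluster received no points).
        new_mu = np.array([X[z == k].mean(axis=0) if np.any(z == k) else mu[k]
                           for k in range(K)])
        if np.allclose(new_mu, mu):   # assignments have stabilised
            break
        mu = new_mu
    return mu, z
```

On well-separated data the loop converges in a handful of iterations; K-means is in fact the limiting case of the EM iterations on the following slides when the component variances shrink to zero.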

23
**EM Algorithm**

If the Z^n were observed, they would be "class labels," and the estimate of each mean would simply be the average of the points in that class. We do not know them, so we replace them by their conditional expectations given the data. But from (6), (7) this expectation depends on the parameter estimates, so we should iterate.

24
**EM Algorithm**

Iteration formulas: the expectation (E) step and the parameter-update (M) step are applied alternately until convergence.
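The iteration formulas, written as code for a one-dimensional Gaussian mixture (a sketch under my own initialisation choices, not the lecture's code; the comments map the steps to the slide's equation numbers):

```python
import numpy as np

def em_gmm_1d(x, K=2, n_iter=100):
    """EM for a 1-D Gaussian mixture.
    E step: responsibilities tau[n, k] = p(Z_n = k | x_n) via Bayes' rule.
    M step: weighted re-estimation of mixing proportions, means, variances."""
    # Simple deterministic initialisation: spread the means over data quantiles.
    pi = np.full(K, 1.0 / K)
    mu = np.quantile(x, (np.arange(K) + 0.5) / K)
    var = np.full(K, x.var())
    for _ in range(n_iter):
        # E step (eq. (14) on the slides): posterior of each component.
        dens = np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        tau = pi * dens
        tau /= tau.sum(axis=1, keepdims=True)
        # M step (eqs. (15)-(17)): parameter updates weighted by tau.
        Nk = tau.sum(axis=0)
        pi = Nk / len(x)
        mu = (tau * x[:, None]).sum(axis=0) / Nk
        var = (tau * (x[:, None] - mu) ** 2).sum(axis=0) / Nk
    return pi, mu, var
```

Unlike K-means, every point contributes to every component, weighted by its responsibility tau, and variances and mixing proportions are estimated alongside the means.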

25
**EM Algorithm**

The expectation step is (14); the maximization step is the parameter updates (15)–(17). What relationship does this algorithm have to the quantity we want to maximize, the log likelihood (9)? We calculate the derivatives of l with respect to the parameters.

26
**EM Algorithm**

Setting the derivative with respect to μ_k to zero yields the weighted-mean update; analogously for the covariances Σ_k and for the mixing proportions π_k.
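The fixed-point equations these derivatives lead to are the standard GMM updates (reconstructed; the slide's formulas are missing), with τ_nk the responsibilities from the E step:

```latex
\mu_k = \frac{\sum_{n=1}^{N} \tau_{nk}\, x_n}{\sum_{n=1}^{N} \tau_{nk}},
\qquad
\Sigma_k = \frac{\sum_{n=1}^{N} \tau_{nk}\,(x_n - \mu_k)(x_n - \mu_k)^{\top}}
                {\sum_{n=1}^{N} \tau_{nk}},
\qquad
\pi_k = \frac{1}{N}\sum_{n=1}^{N} \tau_{nk}
```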

27
**EM: General Setting**

EM is an iterative technique designed for probabilistic models. We have two sample spaces: X, which is observed (the dataset), and Z, which is missing (latent). A probability model is p(x, z | θ). If we knew Z, we would do ML estimation by maximizing the complete log likelihood log p(x, z | θ).

28
**EM: General Setting**

Since Z is not observed, we work with the incomplete log likelihood log p(x | θ) = log Σ_z p(x, z | θ). Since Z is not observed, the complete log likelihood is a random quantity and cannot be maximized directly. Thus we average over Z using some "averaging distribution" q(z | x). We hope that maximizing this surrogate expression will yield a value of θ that improves on the initial value of θ.

29
**EM: General Setting**

The distribution q can be used to obtain a lower bound L(q, θ) on the log likelihood. EM is coordinate ascent on L(q, θ): at the (t+1)-st iteration, for fixed θ^(t), we first maximize L(q, θ^(t)) with respect to q, which yields q^(t+1); for this q^(t+1), we then maximize L(q^(t+1), θ) with respect to θ, which yields θ^(t+1).
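The lower bound follows from Jensen's inequality, proved on the last slides (a reconstruction of the missing derivation):

```latex
\log p(x \mid \theta)
= \log \sum_{z} p(x, z \mid \theta)
= \log \sum_{z} q(z \mid x)\, \frac{p(x, z \mid \theta)}{q(z \mid x)}
\;\ge\; \sum_{z} q(z \mid x)\, \log \frac{p(x, z \mid \theta)}{q(z \mid x)}
\;=\; \mathcal{L}(q, \theta)
```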

30
**EM: General Setting**

E step: maximize L(q, θ^(t)) with respect to q. M step: maximize L(q^(t+1), θ) with respect to θ. The M step is equivalently viewed as the maximization of the expected complete log likelihood. Proof: in the expansion of L(q, θ), the second term is independent of θ; thus maximizing L with respect to θ is equivalent to maximizing the expected complete log likelihood.
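Expanding the bound makes the proof explicit; the second (entropy) term does not involve θ (reconstructed):

```latex
\mathcal{L}(q, \theta)
= \underbrace{\sum_{z} q(z \mid x)\, \log p(x, z \mid \theta)}_{\text{expected complete log likelihood}}
\;-\; \sum_{z} q(z \mid x)\, \log q(z \mid x)
```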

31
**EM: General Setting**

The E step can be solved once and for all: the choice q^(t+1)(z | x) = p(z | x, θ^(t)) yields the maximum. For this choice the bound is tight: L(q^(t+1), θ^(t)) = log p(x | θ^(t)).

32
**Jensen's inequality**

Definition: a function f is convex over (a, b) if for all x_1, x_2 ∈ (a, b) and 0 ≤ λ ≤ 1, f(λx_1 + (1−λ)x_2) ≤ λf(x_1) + (1−λ)f(x_2); f is concave if the inequality is reversed. Jensen's inequality: for a convex function f, f(E[X]) ≤ E[f(X)].

33
**Jensen's inequality**

Proof for a discrete random variable, by induction: with two mass points the inequality is exactly the definition of convexity. Assume Jensen's inequality holds for k−1 mass points; then, by the induction assumption and by convexity, it holds for k mass points as well.

34
**Jensen’s inequality corollary**

Let λ_i ≥ 0 with Σ_i λ_i = 1, and let x_i > 0. The function log is concave, so from Jensen's inequality we obtain the bound used in the EM derivation.
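The corollary in symbols (reconstructed; this is the step applied to the EM lower bound):

```latex
\log \sum_{i} \lambda_i\, x_i \;\ge\; \sum_{i} \lambda_i \log x_i,
\qquad \lambda_i \ge 0,\; \sum_{i} \lambda_i = 1,\; x_i > 0
```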
