5. Maximum Likelihood –II Prof. Yuille. Stat 231. Fall 2004.


1 5. Maximum Likelihood –II Prof. Yuille. Stat 231. Fall 2004.

2 Topics: Exponential Distributions, Sufficient Statistics, and MLE; the Maximum Entropy Principle; Model Selection.

3 Exponential Distributions. Gaussians are members of the class of exponential distributions, which have the general form p(x|λ) = (1/Z[λ]) exp( Σ_i λ_i φ_i(x) ), with parameters λ = (λ_1, ..., λ_M) and statistics φ(x) = (φ_1(x), ..., φ_M(x)).

4 Sufficient Statistics. The φ_i(x) are the sufficient statistics of the distribution. Knowledge of the data sums Σ_n φ_i(x_n) is all we need to know about the data {x_n : n = 1, ..., N}; the rest is irrelevant. Many standard distributions can be expressed as exponentials: Gaussian, Poisson, etc.

5 Sufficient Statistics of a Gaussian. For a one-dimensional Gaussian p(x|μ, σ) and samples x_1, ..., x_N, the sufficient statistics are Σ_n x_n and Σ_n x_n². These two sums are sufficient to learn the parameters of the distribution from the data.

6 MLE for a Gaussian. To estimate the parameters, maximize Π_n p(x_n|μ, σ), or equivalently maximize the log-likelihood Σ_n log p(x_n|μ, σ). The log-likelihood depends on the data only through the sufficient statistics, which therefore determine the estimates: μ̂ = (1/N) Σ_n x_n and σ̂² = (1/N) Σ_n x_n² − μ̂². See the sketch below.
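The following minimal sketch (variable names and the simulated data are illustrative, not from the slides) shows that the two sufficient statistics alone determine the MLE: once Σ_n x_n and Σ_n x_n² are recorded, the raw samples are never needed again.

```python
import numpy as np

# Sketch: MLE for a 1-D Gaussian computed only from the sufficient
# statistics sum(x) and sum(x^2).
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=10_000)   # samples x_1, ..., x_N

N = x.size
s1 = x.sum()          # sufficient statistic: sum_n x_n
s2 = (x ** 2).sum()   # sufficient statistic: sum_n x_n^2

mu_hat = s1 / N                    # MLE of the mean
var_hat = s2 / N - mu_hat ** 2     # MLE of the variance (the 1/N, biased form)

print(mu_hat, var_hat)             # approximately 2.0 and 2.25
```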

7 Sufficient Statistics for a Gaussian. The distribution is of the form p(x|λ) = (1/Z[λ]) exp( λ_1 x + λ_2 x² ) with λ_2 < 0. This is the same as a Gaussian with mean μ = −λ_1/(2λ_2) and variance σ² = −1/(2λ_2).

8 Exponential Models and MLE. For an exponential model, MLE corresponds to maximizing the log-likelihood Σ_n log p(x_n|λ) = N( Σ_i λ_i ψ_i − log Z[λ] ). This is equivalent to minimizing F(λ) = log Z[λ] − Σ_i λ_i ψ_i, where ψ_i = (1/N) Σ_n φ_i(x_n) are the observed statistics.

9 Exponential Models and MLE. This minimization is a convex optimization problem and hence has a unique solution: the gradient ∂F/∂λ_i = E_λ[φ_i(x)] − ψ_i vanishes exactly when the model's expected statistics match the observed ones. But finding this solution in closed form may be difficult. Iterative algorithms such as Generalized Iterative Scaling are guaranteed to converge to it. A sketch follows.
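The slide names Generalized Iterative Scaling; spelling that algorithm out is beyond one slide, but plain gradient descent on the same convex objective illustrates the point. The sketch below fits an exponential model on a small finite alphabet; the alphabet, data, and step size are all assumptions made for illustration.

```python
import numpy as np

# Sketch: gradient descent on the convex objective F(lam) = log Z[lam] - lam . psi
# for an exponential model on the alphabet {0,...,4} with statistics x and x^2.
# (Not Generalized Iterative Scaling itself, just the same convex problem.)
rng = np.random.default_rng(1)

X = np.arange(5)
phi = np.stack([X, X ** 2]).T          # phi_1(x) = x, phi_2(x) = x^2; shape (5, 2)

data = rng.choice(X, size=1000, p=[0.1, 0.2, 0.4, 0.2, 0.1])
psi = phi[data].mean(axis=0)           # observed statistics (1/N) sum_n phi(x_n)

lam = np.zeros(2)
for _ in range(20_000):
    logits = phi @ lam
    p = np.exp(logits - logits.max())
    p /= p.sum()                       # model distribution p(x | lam)
    grad = phi.T @ p - psi             # grad F = E_lam[phi] - psi
    lam -= 0.02 * grad                 # small step; convexity gives a unique optimum

print(phi.T @ p)                       # model expectations ...
print(psi)                             # ... now match the observed statistics
```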

10 Maximum Entropy Principle. An alternative way to think of exponential distributions and MLE: start with the statistics, and then estimate the form and the parameters of the probability distribution using the Maximum Entropy principle.

11 Entropy. The entropy of a distribution P(x) is H[P] = −Σ_x P(x) log P(x). It was defined by Shannon as a measure of the information obtained by observing a sample from P(x).
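A few lines of code make the definition concrete (a simple helper, assuming the distribution is given as a probability vector over a discrete alphabet):

```python
import numpy as np

def entropy(p):
    """Shannon entropy H[P] = -sum_x P(x) log P(x), in nats."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                   # use the convention 0 log 0 = 0
    return -np.sum(p * np.log(p))

print(entropy([0.25, 0.25, 0.25, 0.25]))   # log 4 ~ 1.386: maximal for 4 outcomes
print(entropy([1.0, 0.0, 0.0, 0.0]))       # 0: a certain outcome carries no information
```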

12 Maximum Entropy Principle. Select the distribution P(x) which maximizes the entropy H[P] subject to the constraints Σ_x P(x) φ_i(x) = ψ_i, where the ψ_i = (1/N) Σ_n φ_i(x_n) are the observed values of the statistics, together with normalization Σ_x P(x) = 1. The constraints are imposed using Lagrange multipliers λ_i.

13 Maximum Entropy. Extremizing the Lagrangian with respect to P(x) gives the (exponential) form of the distribution: P(x) = (1/Z[λ]) exp( Σ_i λ_i φ_i(x) ). Solving for the Lagrange parameters λ_i then ensures that the constraints are satisfied: Σ_x P(x|λ) φ_i(x) = ψ_i.
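For completeness, here is a sketch of the variational calculation behind this slide (the standard textbook derivation, reconstructed rather than copied from the original slide):

```latex
% Lagrangian: entropy plus multipliers \lambda_i for the statistics
% and \gamma for normalization.
\mathcal{L}[P] = -\sum_x P(x)\log P(x)
  + \sum_i \lambda_i \Big( \sum_x P(x)\,\phi_i(x) - \psi_i \Big)
  + \gamma \Big( \sum_x P(x) - 1 \Big)

% Setting the variation with respect to P(x) to zero:
\frac{\partial \mathcal{L}}{\partial P(x)}
  = -\log P(x) - 1 + \sum_i \lambda_i \phi_i(x) + \gamma = 0
\;\;\Longrightarrow\;\;
P(x) = \frac{1}{Z[\lambda]} \exp\Big( \sum_i \lambda_i \phi_i(x) \Big),
\quad
Z[\lambda] = \sum_x \exp\Big( \sum_i \lambda_i \phi_i(x) \Big).
```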

14 Maximum Entropy. This gives the same result as MLE for exponential distributions: Maximum Entropy + Constraints = Exponential Distribution + MLE Parameters. The Max-Ent distribution which matches the observed sufficient statistics is the exponential distribution with those statistics. Example: a Gaussian can be obtained by performing Max-Ent with the statistics x and x².

15 Minimax Principle. Construct a distribution incrementally by increasing the number of statistics M. The entropy of the Max-Ent distribution with M statistics, evaluated at the fitted parameters, is H_M = log Z[λ] − Σ_{i=1}^{M} λ_i ψ_i. Minimax Principle: select the statistics to minimize the entropy of the maximum entropy distribution. This relates to model selection, as sketched below.
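One greedy Minimax step can be sketched in code: fit the Max-Ent model separately with each candidate statistic and keep the one whose fitted distribution has the lowest entropy. Everything here (the alphabet, the candidate statistics, the rescaling, the greedy scheme) is an illustrative assumption, not the slides' procedure.

```python
import numpy as np

rng = np.random.default_rng(2)
X = np.arange(6)
data = rng.choice(X, size=2000, p=[0.05, 0.15, 0.3, 0.3, 0.15, 0.05])

def fit_maxent(phi, psi, lr=0.5, steps=20_000):
    """Fit p(x) proportional to exp(lam . phi(x)) by gradient descent; return p."""
    lam = np.zeros(phi.shape[1])
    for _ in range(steps):
        logits = phi @ lam
        p = np.exp(logits - logits.max())
        p /= p.sum()
        lam -= lr * (phi.T @ p - psi)   # gradient of log Z[lam] - lam . psi
    return p

# Candidate statistics, rescaled to [0, 1] so one step size works for both.
candidates = {"x": X / X.max(), "x^2": (X ** 2) / (X ** 2).max()}
for name, f in candidates.items():
    phi = f[:, None].astype(float)      # a single statistic; shape (6, 1)
    psi = phi[data].mean(axis=0)        # its observed value
    p = fit_maxent(phi, psi)
    H = -np.sum(p[p > 0] * np.log(p[p > 0]))
    print(f"Max-Ent entropy using statistic {name}: {H:.3f}")
# The Minimax choice is the candidate that yields the smaller entropy.
```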

16 Model Selection. Suppose we do not know which model generates the data D: there are two models, M_1 and M_2, with parameters θ_1 and θ_2 and priors P(θ_1|M_1) and P(θ_2|M_2). Model selection enables us to estimate which model is most likely to have generated the data.

17 Model Selection. Calculate the evidence P(D|M_1) = ∫ dθ_1 P(D|θ_1, M_1) P(θ_1|M_1) and compare it with P(D|M_2) = ∫ dθ_2 P(D|θ_2, M_2) P(θ_2|M_2). Observe that we must sum (integrate) over all possible values of the model parameters.
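A small numerical sketch of this comparison (the model definitions, the priors, and the grid quadrature are assumptions made for illustration; scipy's norm supplies the Gaussian densities):

```python
import numpy as np
from scipy.stats import norm

# M1: x ~ N(0, 1) with no free parameters.
# M2: x ~ N(theta, 1) with prior theta ~ N(0, 10).
rng = np.random.default_rng(3)
D = rng.normal(0.0, 1.0, size=50)        # data actually generated by the simpler M1

# Evidence of M1 is just its likelihood: there are no parameters to integrate over.
log_ev1 = norm.logpdf(D, 0.0, 1.0).sum()

# Evidence of M2: integrate the likelihood against the prior on a grid
# (crude quadrature, with a log-sum-exp for numerical stability).
thetas = np.linspace(-20.0, 20.0, 4001)
dtheta = thetas[1] - thetas[0]
log_lik = np.array([norm.logpdf(D, t, 1.0).sum() for t in thetas])
log_prior = norm.logpdf(thetas, 0.0, np.sqrt(10.0))
m = (log_lik + log_prior).max()
log_ev2 = m + np.log(np.sum(np.exp(log_lik + log_prior - m)) * dtheta)

print(log_ev1, log_ev2)   # log_ev1 > log_ev2 here: the simpler model wins
```

Because M2 spreads its prior over many values of theta that fit the data poorly, its averaged likelihood is lower even though its best-fit likelihood is at least as high; this is the Occam's Razor effect described on slide 19.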

18 Model Selection &amp; Minimax. The entropy of the Max-Ent distribution, evaluated at the fitted parameters, is minus the average log-probability of the data: H_M = −(1/N) Σ_n log P(x_n|λ). So the Minimax Principle is a form of model selection. But it estimates the parameters instead of summing them out.

19 Model Selection. Important issue: suppose model M_2 has more parameters than M_1. Then M_2 is more flexible and can fit a larger range of data sets. But summing over the parameters, as in P(D|M_2), penalizes this flexibility, because the prior spreads probability over many parameter values that fit the data poorly. This gives “Occam’s Razor”, favoring the simpler model.

20 Model Selection. More advanced modeling requires performing model selection where the models are complex. This is beyond the scope of this course.

