
1 Lecture note for Stat 231: Pattern Recognition and Machine Learning. 4. Maximum Likelihood. Prof. A.L. Yuille. Stat 231, Fall 2004.

2 Learning Probability Distributions. Learn the likelihood functions and priors from datasets. There are two main strategies: parametric and non-parametric. This lecture and the next concentrate on parametric methods, which assume a parametric form for the distributions.

3 Maximum Likelihood Estimation. Assume the distribution is of the form $p(x|\theta)$. Given independent identically distributed (i.i.d.) samples $x_1, \dots, x_N$, choose $\theta^* = \arg\max_\theta \prod_{i=1}^N p(x_i|\theta) = \arg\max_\theta \sum_{i=1}^N \log p(x_i|\theta)$.
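A minimal numerical sketch of this recipe, not from the slides: the exponential model $p(x|\lambda) = \lambda e^{-\lambda x}$ and the use of SciPy's scalar optimizer are illustrative assumptions, chosen because the data and the likelihood are easy to write down.

```python
# Minimal sketch: MLE by numerically maximizing the log-likelihood of
# i.i.d. samples. The exponential model and SciPy optimizer are
# illustrative assumptions, not part of the original slides.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
data = rng.exponential(scale=2.0, size=500)   # i.i.d. samples x_1..x_N

def neg_log_likelihood(lam):
    # log p(x | lambda) = log(lambda) - lambda * x  for x >= 0
    if lam <= 0:
        return np.inf
    return -np.sum(np.log(lam) - lam * data)

res = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 10.0), method="bounded")
print("MLE lambda:", res.x)                 # numerical argmax of the likelihood
print("closed form 1/mean:", 1.0 / data.mean())
```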

4 Supervised versus Unsupervised Learning. Supervised learning assumes that we know the class label for each datapoint, i.e. we are given pairs $(x_i, y_i)$ where $x_i$ is the datapoint and $y_i$ is the class label. Unsupervised learning does not assume that the class labels are specified; this is a harder task. But "unsupervised methods" can also be used for supervised data if the goal is to determine structure in the data (e.g. mixture of Gaussians). Stat 231 is almost entirely concerned with supervised learning.

5 Example of MLE. One-dimensional Gaussian distribution: $p(x|\mu,\sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\{-(x-\mu)^2/2\sigma^2\}$. Solve for $\mu, \sigma$ by differentiating the log-likelihood and setting the derivatives to zero: $\hat{\mu} = \frac{1}{N}\sum_{i=1}^N x_i$, $\hat{\sigma}^2 = \frac{1}{N}\sum_{i=1}^N (x_i - \hat{\mu})^2$.
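A short sketch checking these closed-form estimates on synthetic data; the sample size and true parameters are made up for illustration.

```python
# Minimal sketch: closed-form MLE for a one-dimensional Gaussian,
# i.e. the sample mean and the (biased) sample variance.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=3.0, scale=1.5, size=1000)   # i.i.d. Gaussian samples

mu_hat = x.mean()                          # mu_hat = (1/N) sum_i x_i
sigma2_hat = np.mean((x - mu_hat) ** 2)    # sigma_hat^2 = (1/N) sum_i (x_i - mu_hat)^2

print(mu_hat, np.sqrt(sigma2_hat))         # close to the true (3.0, 1.5)
```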

6 MLE. The Gaussian is unusual because the parameters of the distribution can be expressed as analytic functions of the data. More usually, algorithms are required. Modeling problem: for complicated patterns (shape of fish, natural language, etc.) it requires considerable work to find a suitable parametric form for the probability distributions.

7 MLE and Kullback-Leibler. What happens if the data is not generated by the model that we assume? Suppose the true distribution is $f(x)$ and our models are of the form $p(x|\theta)$. The Kullback-Leibler divergence is $D(f \,\|\, p(\cdot|\theta)) = \int f(x) \log \frac{f(x)}{p(x|\theta)}\, dx$. The K-L divergence is a measure of the difference between $f(x)$ and $p(x|\theta)$.
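As a small illustration (not from the slides), the K-L divergence between two discrete distributions can be computed directly from the definition; the two distributions below are made-up examples.

```python
# Minimal sketch: Kullback-Leibler divergence D(f || p) for two discrete
# distributions over the same finite set. Values are illustrative.
import numpy as np

f = np.array([0.5, 0.3, 0.2])    # "true" distribution
p = np.array([0.4, 0.4, 0.2])    # model distribution

kl = np.sum(f * np.log(f / p))   # D(f || p) = sum_x f(x) log [f(x)/p(x)]
print(kl)                         # >= 0, and 0 only when f == p
```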

8 MLE and Kullback-Leibler. Samples $x_1, \dots, x_N$ are drawn from $f(x)$. Approximate $D(f \,\|\, p(\cdot|\theta))$ by the empirical KL: $\hat{D}(\theta) = \frac{1}{N}\sum_{i=1}^N \log \frac{f(x_i)}{p(x_i|\theta)}$. Minimizing the empirical KL is equivalent to MLE: we find the distribution of the form $p(x|\theta)$ that is closest, in the KL sense, to the true distribution $f(x)$.
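A hedged numerical check of this equivalence, under assumptions not in the slides (a Gaussian true distribution and a Gaussian model family): the term $\frac{1}{N}\sum_i \log f(x_i)$ does not depend on $\theta$, so minimizing the empirical KL and minimizing the negative average log-likelihood give the same parameters.

```python
# Minimal sketch: the minimizer of the empirical KL equals the MLE,
# because (1/N) sum_i log f(x_i) is constant in theta. The Gaussian
# true distribution and model family are illustrative assumptions.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(2)
x = rng.normal(loc=1.0, scale=2.0, size=2000)       # samples from the true f

log_f = norm.logpdf(x, loc=1.0, scale=2.0)           # log f(x_i), constant in theta

def empirical_kl(theta):
    mu, log_sigma = theta
    return np.mean(log_f - norm.logpdf(x, loc=mu, scale=np.exp(log_sigma)))

def neg_avg_log_lik(theta):
    mu, log_sigma = theta
    return -np.mean(norm.logpdf(x, loc=mu, scale=np.exp(log_sigma)))

theta0 = np.array([0.0, 0.0])
print(minimize(empirical_kl, theta0).x)       # same minimizer ...
print(minimize(neg_avg_log_lik, theta0).x)    # ... as the MLE
```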

9 MLE example. We denote the log-likelihood as a function of $\theta$: $L(\theta) = \sum_{i=1}^N \log p(x_i|\theta)$. The estimate $\hat{\theta}$ is computed by solving the equations $\partial L/\partial \theta = 0$. For example, the Gaussian family gives a closed-form solution.

10 Learning with a Prior. We can put a prior $p(\theta)$ on the parameter values, so that $p(\theta | x_1, \dots, x_N) \propto p(x_1, \dots, x_N | \theta)\, p(\theta)$. We can estimate this recursively (if samples are i.i.d.): $p(\theta | x_1, \dots, x_N) \propto p(x_N | \theta)\, p(\theta | x_1, \dots, x_{N-1})$. Bayes learning: estimate a probability distribution on $\theta$ rather than a single best value.

11 Recursive Bayes Learning.
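A minimal sketch of recursive Bayes learning under assumptions not stated in the slides: a Gaussian likelihood with known variance and a conjugate Gaussian prior on the mean, so the posterior stays Gaussian and can be updated one observation at a time. The specific numbers are illustrative.

```python
# Minimal sketch: recursive Bayes learning of a Gaussian mean with known
# variance. With a Gaussian prior on the mean, the posterior after each
# sample is again Gaussian, so p(theta | x_1..x_n) is updated one sample
# at a time. All parameter values are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(3)
true_mean, known_var = 2.0, 1.0
data = rng.normal(true_mean, np.sqrt(known_var), size=50)

mu_post, var_post = 0.0, 10.0          # prior p(theta) = N(0, 10)
for x_n in data:
    # Conjugate update: combine p(x_n | theta) with the current posterior.
    var_new = 1.0 / (1.0 / var_post + 1.0 / known_var)
    mu_post = var_new * (mu_post / var_post + x_n / known_var)
    var_post = var_new

print(mu_post, var_post)   # posterior mean approaches the true mean; variance shrinks
```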

