240-650: Chapter 3: Maximum-Likelihood and Bayesian Parameter Estimation. Montri Karnjanadecha, ac.th/~montri.

Slide 1: Chapter 3: Maximum-Likelihood and Bayesian Parameter Estimation
240-650 Principles of Pattern Recognition
Montri Karnjanadecha, ac.th/~montri

Slide 2: Chapter 3: Maximum-Likelihood and Bayesian Parameter Estimation

Slide 3: Introduction
We could design an optimal classifier if we knew the prior probabilities P(ω_i) and the class-conditional densities p(x|ω_i). In practice we rarely have such complete knowledge of the probabilistic structure of the problem, so we estimate P(ω_i) and p(x|ω_i) from training data (design samples).

Slide 4: Maximum-Likelihood Estimation
ML estimates nearly always have good convergence properties as the number of training samples increases, and ML estimation is often simpler than alternative methods.

Slide 5: The General Principle
Suppose we separate a collection of samples according to class, so that we have c data sets D_1, …, D_c, with the samples in D_j drawn independently according to the probability law p(x|ω_j). We say such samples are i.i.d.: independent and identically distributed random variables.

Slide 6: The General Principle
We assume that p(x|ω_j) has a known parametric form and is determined uniquely by the value of a parameter vector θ_j. For example, p(x|ω_j) ~ N(μ_j, Σ_j), where θ_j consists of the components of μ_j and Σ_j. To show the dependence of p(x|ω_j) on θ_j explicitly, we write it as p(x|ω_j, θ_j).

Slide 7: Problem Statement
Use the information provided by the training samples to obtain good estimates of the unknown parameter vectors θ_1, …, θ_c associated with each category.

Slide 8: Simplified Problem Statement
If we assume that samples in D_i give no information about θ_j when i ≠ j, we have c separate problems of the following form: use a set D of training samples, drawn independently from the probability density p(x|θ), to estimate the unknown parameter vector θ.

Slide 9
Suppose that D contains n samples, x_1, …, x_n. Since the samples were drawn independently, we have

    p(D|θ) = ∏_{k=1}^{n} p(x_k|θ)

Viewed as a function of θ, p(D|θ) is called the likelihood of θ with respect to the set of samples. The maximum-likelihood estimate of θ is the value θ̂ that maximizes p(D|θ).
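As a quick numerical illustration, consider evaluating this product directly. The following is a minimal numpy/scipy sketch with made-up data, assuming univariate Gaussian samples with known σ = 1: the literal product of n densities underflows even for moderate n, which is one motivation for the log-likelihood introduced below.

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(0)
    D = rng.normal(loc=2.0, scale=1.0, size=1000)   # hypothetical i.i.d. samples

    theta = 2.0
    # p(D|theta) = prod_k p(x_k|theta): the raw product underflows to 0.0
    print(np.prod(norm.pdf(D, loc=theta, scale=1.0)))
    # Summing log-densities instead stays well within floating-point range
    print(np.sum(norm.logpdf(D, loc=theta, scale=1.0)))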

Slide 11
Let θ = (θ_1, …, θ_p)^t denote the parameter vector, and let ∇_θ be the gradient operator

    ∇_θ = [∂/∂θ_1, …, ∂/∂θ_p]^t

Slide 12: Log-Likelihood Function
We define l(θ) as the log-likelihood function:

    l(θ) ≡ ln p(D|θ)

We can write our solution as

    θ̂ = arg max_θ l(θ)
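The arg max can be found numerically. Here is a minimal sketch under the same assumptions as before (hypothetical data, known σ = 1) that scans a grid of candidate θ values and keeps the one maximizing l(θ); the result lands next to the sample mean, anticipating the analytic solution derived on the following slides.

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(1)
    D = rng.normal(loc=2.0, scale=1.0, size=500)    # hypothetical sample

    thetas = np.linspace(0.0, 4.0, 401)             # candidate parameter values
    loglik = [norm.logpdf(D, loc=t, scale=1.0).sum() for t in thetas]
    theta_hat = thetas[np.argmax(loglik)]           # theta maximizing l(theta)
    print(theta_hat, D.mean())                      # both near the true mean 2.0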

Slide 13: MLE
From

    l(θ) = ln p(D|θ) = ∑_{k=1}^{n} ln p(x_k|θ)

we have

    ∇_θ l = ∑_{k=1}^{n} ∇_θ ln p(x_k|θ)

and the necessary condition for the MLE:

    ∇_θ l = 0
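The necessary condition is easy to check numerically. In this sketch (same assumptions: univariate Gaussian, known σ = 1, made-up data), a central finite-difference estimate of dl/dθ evaluated at the sample mean comes out at approximately zero.

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(2)
    D = rng.normal(loc=2.0, scale=1.0, size=500)    # hypothetical sample

    def l(theta):
        # l(theta) = sum_k ln p(x_k|theta), with sigma known and equal to 1
        return norm.logpdf(D, loc=theta, scale=1.0).sum()

    theta_hat, h = D.mean(), 1e-5
    grad = (l(theta_hat + h) - l(theta_hat - h)) / (2 * h)  # central difference
    print(grad)                                     # ~0: gradient vanishes at the MLE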

Slide 14: The Gaussian Case: Unknown μ
Suppose that the samples are drawn from a multivariate normal population with mean μ and covariance matrix Σ, and let μ be the only unknown. For a single sample point x_k we find

    ln p(x_k|μ) = -(1/2) ln[(2π)^d |Σ|] - (1/2)(x_k - μ)^t Σ^{-1} (x_k - μ)

and

    ∇_μ ln p(x_k|μ) = Σ^{-1}(x_k - μ)

Slide 15
The MLE of μ must satisfy

    ∑_{k=1}^{n} Σ^{-1}(x_k - μ̂) = 0

After rearranging:

    μ̂ = (1/n) ∑_{k=1}^{n} x_k

Slide 16: Sample Mean
The MLE for the unknown population mean is just the arithmetic average of the training samples, i.e., the sample mean. If we think of the n samples as a cloud of points, the sample mean is the centroid of the cloud.
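In code the MLE of the mean is a one-liner. This sketch, with made-up two-dimensional Gaussian data and a known Σ, also verifies the condition from the previous slide.

    import numpy as np

    rng = np.random.default_rng(3)
    mu_true = np.array([1.0, -2.0])
    Sigma = np.array([[2.0, 0.5],
                      [0.5, 1.0]])
    X = rng.multivariate_normal(mu_true, Sigma, size=1000)  # rows are samples x_k

    mu_hat = X.mean(axis=0)                         # sample mean = MLE of mu
    residual = np.linalg.inv(Sigma) @ (X - mu_hat).sum(axis=0)
    print(mu_hat)      # close to mu_true
    print(residual)    # ~[0, 0]: sum_k Sigma^{-1}(x_k - mu_hat) = 0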

Slide 17: The Gaussian Case: Unknown μ and Σ
This is the more typical case, in which both the mean and the covariance matrix are unknown. Consider first the univariate case with θ_1 = μ and θ_2 = σ². The log-likelihood of a single point is

    ln p(x_k|θ) = -(1/2) ln(2πθ_2) - (x_k - θ_1)²/(2θ_2)

Slide 18
Its derivative is

    ∇_θ ln p(x_k|θ) = [ (x_k - θ_1)/θ_2,  -1/(2θ_2) + (x_k - θ_1)²/(2θ_2²) ]^t

Setting the gradient of the full log-likelihood to 0 gives the two conditions

    ∑_{k=1}^{n} (x_k - θ̂_1)/θ̂_2 = 0

and

    -∑_{k=1}^{n} 1/θ̂_2 + ∑_{k=1}^{n} (x_k - θ̂_1)²/θ̂_2² = 0

Slide 19
With a little rearranging, we have

    μ̂ = (1/n) ∑_{k=1}^{n} x_k

and

    σ̂² = (1/n) ∑_{k=1}^{n} (x_k - μ̂)²
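These two estimates map directly onto numpy (hypothetical data again). Note that np.var uses the 1/n normalizer by default (ddof=0), so it computes exactly this ML estimate; the bias this introduces is the subject of the last slide.

    import numpy as np

    rng = np.random.default_rng(4)
    x = rng.normal(loc=2.0, scale=3.0, size=1000)   # hypothetical sample

    mu_hat = x.mean()                               # (1/n) sum_k x_k
    sigma2_hat = ((x - mu_hat) ** 2).mean()         # (1/n) sum_k (x_k - mu_hat)^2
    print(mu_hat, sigma2_hat)
    print(np.isclose(sigma2_hat, x.var()))          # True: np.var defaults to ddof=0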

Slide 20: MLE for the Multivariate Case

    μ̂ = (1/n) ∑_{k=1}^{n} x_k

    Σ̂ = (1/n) ∑_{k=1}^{n} (x_k - μ̂)(x_k - μ̂)^t
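A sketch of the multivariate case under the same assumptions: the sum of outer products is a single matrix product on the centered data, and it matches np.cov with bias=True (the 1/n normalizer).

    import numpy as np

    rng = np.random.default_rng(5)
    mu_true = np.array([0.0, 1.0, -1.0])
    A = rng.normal(size=(3, 3))
    Sigma_true = A @ A.T + 0.1 * np.eye(3)          # an arbitrary SPD covariance
    X = rng.multivariate_normal(mu_true, Sigma_true, size=2000)

    n = len(X)
    mu_hat = X.mean(axis=0)
    Xc = X - mu_hat                                 # centered samples
    Sigma_hat = (Xc.T @ Xc) / n                     # (1/n) sum_k outer products
    print(np.allclose(Sigma_hat, np.cov(X, rowvar=False, bias=True)))  # True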

Slide 21: Bias
The MLE for the variance σ² is biased: the expected value over all data sets of size n of the sample variance is not equal to the true variance,

    E[(1/n) ∑_{k=1}^{n} (x_k - x̄)²] = ((n-1)/n) σ² ≠ σ²

An unbiased estimator for Σ is given by the sample covariance matrix

    C = (1/(n-1)) ∑_{k=1}^{n} (x_k - μ̂)(x_k - μ̂)^t
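The bias is easy to demonstrate by simulation. In this hypothetical Monte Carlo experiment, averaging the 1/n variance estimate over many data sets of size n gives roughly ((n-1)/n)σ², while the 1/(n-1) estimator (ddof=1) averages to σ².

    import numpy as np

    rng = np.random.default_rng(6)
    n, trials, sigma2 = 10, 200_000, 4.0

    X = rng.normal(loc=0.0, scale=np.sqrt(sigma2), size=(trials, n))
    mle_var = X.var(axis=1, ddof=0)                 # biased MLE, 1/n normalizer
    unb_var = X.var(axis=1, ddof=1)                 # unbiased, 1/(n-1) normalizer

    print(mle_var.mean(), (n - 1) / n * sigma2)     # both ~3.6
    print(unb_var.mean(), sigma2)                   # both ~4.0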