CS479/679 Pattern Recognition Dr. George Bebis


CS479/679 Pattern Recognition, Dr. George Bebis. Parameter Estimation: Maximum Likelihood Estimation. Chapter 3 (Duda et al.), Sections 3.1-3.2.

Parameter Estimation. Bayesian decision theory allows us to design an optimal classifier, provided we first estimate the priors P(ωi) and the class-conditional densities p(x|ωi). Estimating P(ωi) is usually not very difficult. Estimating p(x|ωi) can be much harder: the dimensionality of the feature space is large, and the number of samples is often too small.

Parameter Estimation (cont’d). We make the following assumptions: a set of training samples D = {x1, x2, ..., xn} is given, drawn according to p(x|ωj), and p(x|ωj) has a known parametric form, e.g., $p(x|\omega_i) \sim N(\mu_i, \Sigma_i)$, also denoted p(x|θ) where θ = (μi, Σi). The parameter estimation problem: given D, find the best possible θ.

Main Methods in Parameter Estimation: Maximum Likelihood (ML) and Bayesian Estimation (BE).

Main Methods in Parameter Estimation (cont’d). Maximum Likelihood (ML): the best estimate is obtained by maximizing the probability of obtaining the samples D = {x1, x2, ..., xn} actually observed. ML assumes that θ is fixed but unknown and makes a point estimate: $\hat{\theta} = \arg\max_\theta p(D|\theta)$.

Main Methods in Parameter Estimation (cont’d). Bayesian Estimation (BE): assumes that θ is a set of random variables with a known a-priori distribution p(θ). Rather than making a point estimate (unlike ML), BE estimates a distribution: $p(x|D) = \int p(x|\theta)\, p(\theta|D)\, d\theta$. Note: the BE solution p(x|D) might not be of the parametric form assumed for p(x|θ).

ML Estimation - Assumptions. Consider c classes and c training sets D1, D2, ..., Dc (one per class). Samples in Dj are drawn independently according to p(x|ωj). Problem: given D1, D2, ..., Dc and a parametric model p(x|ωj) ~ p(x|θj), estimate θ1, θ2, ..., θc.

ML Estimation - Problem Formulation. If the samples in Dj provide no information about θi for i ≠ j, we can solve c independent problems (one per class). The ML estimate for D = {x1, x2, ..., xn} is the value of θ that maximizes p(D|θ) (i.e., best supports the training data). Using the independence assumption, p(D|θ) simplifies to a product: $p(D|\theta) = \prod_{k=1}^{n} p(x_k|\theta)$.
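For concreteness, here is a minimal NumPy sketch that evaluates this product for one candidate θ; the data set and the function gaussian_pdf are illustrative assumptions, not from the slides. It also shows why the raw product is numerically fragile:

```python
import numpy as np

# Minimal sketch, assuming a 1-D Gaussian with known variance sigma^2 = 1;
# the training set D is hypothetical.
rng = np.random.default_rng(0)
D = rng.normal(loc=2.0, scale=1.0, size=1000)  # hypothetical training samples

def gaussian_pdf(x, mu, sigma2):
    """Univariate Gaussian density p(x | mu, sigma^2)."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

theta = 2.0  # one candidate value of the unknown mean

# Independence assumption: p(D | theta) = prod_k p(x_k | theta)
likelihood = np.prod(gaussian_pdf(D, theta, 1.0))
print(likelihood)  # underflows to 0.0 for n = 1000, motivating the log-likelihood
```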

ML Estimation - Solution. How can we find the maximum of p(D|θ)? Set the gradient to zero: $\nabla_\theta\, p(D|\theta) = 0$, where $\nabla_\theta = \left(\frac{\partial}{\partial\theta_1}, \ldots, \frac{\partial}{\partial\theta_p}\right)^T$ (gradient with respect to the parameter vector).

ML Estimation Using Log-Likelihood. Taking the log for simplicity, we maximize the log-likelihood $\ln p(D|\theta) = \sum_{k=1}^{n} \ln p(x_k|\theta)$, i.e., $\hat{\theta} = \arg\max_\theta \ln p(D|\theta)$. Since the log is monotonic, the maximizer is unchanged.
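A small sketch of the same idea, again with hypothetical data: maximizing the log-likelihood over a grid of candidate means recovers the answer the analytic solution gives below. The names log_likelihood and grid are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
D = rng.normal(loc=2.0, scale=1.0, size=1000)  # hypothetical data, true mean = 2

def log_likelihood(D, mu, sigma2=1.0):
    """ln p(D|theta) = sum_k ln p(x_k|theta), Gaussian with known variance."""
    return np.sum(-0.5 * np.log(2 * np.pi * sigma2) - (D - mu) ** 2 / (2 * sigma2))

grid = np.linspace(0.0, 4.0, 4001)      # candidate values of theta = mu
lls = np.array([log_likelihood(D, mu) for mu in grid])
print(grid[np.argmax(lls)], D.mean())   # grid maximizer matches the sample mean
```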

Example (figure): the likelihood p(D|θ) and the log-likelihood ln p(D|θ), plotted as functions of θ = μ for training data with unknown mean and known variance; both curves peak at the same value $\hat{\theta}$.

ML for Multivariate Gaussian Density: Case of Unknown θ = μ. Assume $p(x_k|\mu) \sim N(\mu, \Sigma)$ with Σ known, so $\ln p(x_k|\mu) = -\frac{1}{2}\ln\left[(2\pi)^d |\Sigma|\right] - \frac{1}{2}(x_k-\mu)^T \Sigma^{-1} (x_k-\mu)$. Computing the gradient, we have: $\nabla_\mu \ln p(x_k|\mu) = \Sigma^{-1}(x_k - \mu)$.

ML for Multivariate Gaussian Density: Case of Unknown θ = μ (cont’d). Setting $\nabla_\mu \ln p(D|\mu) = \sum_{k=1}^{n} \Sigma^{-1}(x_k - \mu) = 0$, the solution is $\hat{\mu} = \frac{1}{n}\sum_{k=1}^{n} x_k$. The ML estimate is simply the "sample mean".
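A quick numerical check under the same assumptions (known Σ, unknown μ); the specific mu_true and Sigma values are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
mu_true = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])              # known covariance
X = rng.multivariate_normal(mu_true, Sigma, size=5000)  # rows are samples x_k

mu_hat = X.mean(axis=0)   # ML estimate: (1/n) * sum_k x_k
print(mu_hat)             # close to mu_true; error shrinks as n grows
```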

Special Case: Maximum A-Posteriori Estimator (MAP). Assume that θ is a random variable with known prior p(θ). Since $p(\theta|D) = \frac{p(D|\theta)\,p(\theta)}{p(D)}$ and p(D) does not depend on θ, maximizing p(θ|D) is equivalent to maximizing p(D|θ)p(θ), or ln p(D|θ)p(θ): $\hat{\theta}_{MAP} = \arg\max_\theta \left[\ln p(D|\theta) + \ln p(\theta)\right]$.

Special Case: Maximum A-Posteriori Estimator (MAP) (cont’d). What happens when p(θ) is uniform? The prior term ln p(θ) is constant, so MAP is equivalent to ML.

MAP for Multivariate Gaussian Density: Case of Unknown θ = μ. Assume $p(x_k|\mu) \sim N(\mu, \sigma^2 I)$ and a prior $p(\mu) \sim N(\mu_0, \sigma_0^2 I)$, where μ0, σ0², and σ² are all known. Maximize ln p(μ|D) = ln p(D|μ)p(μ).

MAP for Multivariate Gaussian Density: Case of Unknown θ = μ (cont’d). Setting the gradient to zero yields $\hat{\mu} = \frac{\mu_0/\sigma_0^2 + \sum_{k=1}^{n} x_k/\sigma^2}{1/\sigma_0^2 + n/\sigma^2}$. If $\sigma_0^2 \gg \sigma^2$ (a nearly flat prior), then $\hat{\mu} \approx \frac{1}{n}\sum_k x_k$, the ML estimate. What happens when n → ∞? The data term dominates and MAP again converges to ML.
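A minimal sketch of this shrinkage behavior in the univariate case; the prior parameters mu0 and sigma0_2 and the data set are illustrative choices, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(3)
mu0, sigma0_2 = 0.0, 0.1   # prior N(mu0, sigma0^2) on the unknown mean
sigma2 = 1.0               # known data variance
D = rng.normal(loc=2.0, scale=np.sqrt(sigma2), size=20)  # true mean is 2.0

n, xbar = len(D), D.mean()
# Closed form from setting the gradient of ln p(D|mu) + ln p(mu) to zero:
mu_map = (mu0 / sigma0_2 + n * xbar / sigma2) / (1.0 / sigma0_2 + n / sigma2)
print(xbar, mu_map)  # MAP is pulled from the sample mean toward mu0

# With n = 20 and sigma0^2 = 0.1, the data weight is 20/(10+20) = 2/3, so
# mu_map ≈ (2/3)*xbar; as n grows the data term dominates and MAP -> ML.
```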

ML for Univariate Gaussian Density: Case of Unknown θ = (μ, σ²). Assume θ = (θ1, θ2) = (μ, σ²), so $\ln p(x_k|\theta) = -\frac{1}{2}\ln(2\pi\theta_2) - \frac{(x_k-\theta_1)^2}{2\theta_2}$, with partial derivatives $\frac{\partial}{\partial\theta_1}\ln p(x_k|\theta) = \frac{x_k-\theta_1}{\theta_2}$ and $\frac{\partial}{\partial\theta_2}\ln p(x_k|\theta) = -\frac{1}{2\theta_2} + \frac{(x_k-\theta_1)^2}{2\theta_2^2}$.

ML for Univariate Gaussian Density: Case of Unknown θ = (μ, σ²) (cont’d). Setting $\sum_k \frac{\partial}{\partial\theta_1}\ln p(x_k|\theta) = 0$ and $\sum_k \frac{\partial}{\partial\theta_2}\ln p(x_k|\theta) = 0$, the solutions are $\hat{\mu} = \frac{1}{n}\sum_{k=1}^{n} x_k$ (the sample mean) and $\hat{\sigma}^2 = \frac{1}{n}\sum_{k=1}^{n}(x_k - \hat{\mu})^2$ (the sample variance).
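A short numerical check of these two formulas on synthetic data (the chosen mean and variance are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
D = rng.normal(loc=2.0, scale=3.0, size=1000)  # true mu = 2, sigma^2 = 9

mu_hat = D.mean()                        # theta_1: the sample mean
sigma2_hat = np.mean((D - mu_hat) ** 2)  # theta_2: the 1/n sample variance (ML form)
print(mu_hat, sigma2_hat)                # close to 2.0 and 9.0
```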

ML for Multivariate Gaussian Density: Case of Unknown θ = (μ, Σ). In the general multivariate case the solutions are $\hat{\mu} = \frac{1}{n}\sum_{k=1}^{n} x_k$ (the sample mean) and $\hat{\Sigma} = \frac{1}{n}\sum_{k=1}^{n}(x_k - \hat{\mu})(x_k - \hat{\mu})^T$ (the sample covariance).
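The same check in the multivariate case, with made-up parameters; note the 1/n normalization, which is what makes this the ML form:

```python
import numpy as np

rng = np.random.default_rng(5)
mu_true = np.array([0.0, 1.0])
Sigma_true = np.array([[1.0, 0.5], [0.5, 2.0]])
X = rng.multivariate_normal(mu_true, Sigma_true, size=5000)

mu_hat = X.mean(axis=0)
centered = X - mu_hat
Sigma_hat = centered.T @ centered / len(X)  # (1/n) sum_k (x_k - mu_hat)(x_k - mu_hat)^T
print(Sigma_hat)  # same result as np.cov(X.T, bias=True)
```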

Biased and Unbiased Estimates. An estimate $\hat{\theta}$ is unbiased when $E[\hat{\theta}] = \theta$. The ML estimate $\hat{\mu}$ is unbiased, i.e., $E[\hat{\mu}] = \mu$. The ML estimates $\hat{\sigma}^2$ and $\hat{\Sigma}$ are biased: $E[\hat{\sigma}^2] = \frac{n-1}{n}\sigma^2 \neq \sigma^2$.

Biased and Unbiased Estimates (cont’d). The following are unbiased estimates of σ² and Σ: $s^2 = \frac{1}{n-1}\sum_{k=1}^{n}(x_k - \hat{\mu})^2$ and $C = \frac{1}{n-1}\sum_{k=1}^{n}(x_k - \hat{\mu})(x_k - \hat{\mu})^T$.
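A small simulation, with made-up settings, showing the bias of the 1/n estimator and that the 1/(n-1) correction removes it:

```python
import numpy as np

rng = np.random.default_rng(6)
n, trials, true_var = 5, 100_000, 1.0  # small n makes the bias easy to see

samples = rng.normal(0.0, np.sqrt(true_var), size=(trials, n))
mu_hats = samples.mean(axis=1, keepdims=True)
sq_dev = (samples - mu_hats) ** 2

ml_var = sq_dev.sum(axis=1) / n              # ML (biased) estimate
unbiased_var = sq_dev.sum(axis=1) / (n - 1)  # Bessel-corrected estimate

print(ml_var.mean())        # ~ (n-1)/n * true_var = 0.8, systematically low
print(unbiased_var.mean())  # ~ 1.0
```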

Comments. ML estimation is simpler than alternative methods. Its estimates become more accurate as the number of training samples increases. If the assumed model for p(x|θ) is correct and the samples are indeed independent, ML will work well.