LECTURE 11: BAYESIAN PARAMETER ESTIMATION

Slides:



Advertisements
Similar presentations
Pattern Recognition and Machine Learning
Advertisements

ECE 8443 – Pattern Recognition LECTURE 05: MAXIMUM LIKELIHOOD ESTIMATION Objectives: Discrete Features Maximum Likelihood Resources: D.H.S: Chapter 3 (Part.
CS479/679 Pattern Recognition Dr. George Bebis
Giansalvo EXIN Cirrincione unit #3 PROBABILITY DENSITY ESTIMATION labelled unlabelled A specific functional form for the density model is assumed. This.
Face Recognition Ying Wu Electrical and Computer Engineering Northwestern University, Evanston, IL
2 – In previous chapters: – We could design an optimal classifier if we knew the prior probabilities P(wi) and the class- conditional probabilities P(x|wi)
0 Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John.
Visual Recognition Tutorial
Parameter Estimation: Maximum Likelihood Estimation Chapter 3 (Duda et al.) – Sections CS479/679 Pattern Recognition Dr. George Bebis.
0 Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John.
Pattern Classification, Chapter 3 Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P.
1 lBayesian Estimation (BE) l Bayesian Parameter Estimation: Gaussian Case l Bayesian Parameter Estimation: General Estimation l Problems of Dimensionality.
Visual Recognition Tutorial
Introduction to Bayesian Parameter Estimation
CHAPTER 4: Parametric Methods. Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1) 2 Parametric Estimation Given.
Bayesian Estimation (BE) Bayesian Parameter Estimation: Gaussian Case
Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley.
Chapter 3 (part 1): Maximum-Likelihood & Bayesian Parameter Estimation  Introduction  Maximum-Likelihood Estimation  Example of a Specific Case  The.
Jeff Howbert Introduction to Machine Learning Winter Classification Bayesian Classifiers.
Chapter Two Probability Distributions: Discrete Variables
Binary Variables (1) Coin flipping: heads=1, tails=0 Bernoulli Distribution.
PATTERN RECOGNITION AND MACHINE LEARNING
0 Pattern Classification, Chapter 3 0 Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda,
ECE 8443 – Pattern Recognition LECTURE 06: MAXIMUM LIKELIHOOD AND BAYESIAN ESTIMATION Objectives: Bias in ML Estimates Bayesian Estimation Example Resources:
Speech Recognition Pattern Classification. 22 September 2015Veton Këpuska2 Pattern Classification  Introduction  Parametric classifiers  Semi-parametric.
CHAPTER 4: Parametric Methods. Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1) 2 Parametric Estimation Given.
ECE 8443 – Pattern Recognition ECE 8527 – Introduction to Machine Learning and Pattern Recognition LECTURE 02: BAYESIAN DECISION THEORY Objectives: Bayes.
ECE 8443 – Pattern Recognition ECE 8423 – Adaptive Signal Processing Objectives: Deterministic vs. Random Maximum A Posteriori Maximum Likelihood Minimum.
ECE 8443 – Pattern Recognition LECTURE 07: MAXIMUM LIKELIHOOD AND BAYESIAN ESTIMATION Objectives: Class-Conditional Density The Multivariate Case General.
ECE 8443 – Pattern Recognition ECE 8423 – Adaptive Signal Processing Objectives: Conjugate Priors Multinomial Gaussian MAP Variance Estimation Example.
CS 782 – Machine Learning Lecture 4 Linear Models for Classification  Probabilistic generative models  Probabilistic discriminative models.
Bayesian Classification. Bayesian Classification: Why? A statistical classifier: performs probabilistic prediction, i.e., predicts class membership probabilities.
Optimal Bayes Classification
Chapter 3 (part 2): Maximum-Likelihood and Bayesian Parameter Estimation Bayesian Estimation (BE) Bayesian Estimation (BE) Bayesian Parameter Estimation:
: Chapter 3: Maximum-Likelihood and Baysian Parameter Estimation 1 Montri Karnjanadecha ac.th/~montri.
Chapter 3: Maximum-Likelihood Parameter Estimation l Introduction l Maximum-Likelihood Estimation l Multivariate Case: unknown , known  l Univariate.
ECE 8443 – Pattern Recognition ECE 8527 – Introduction to Machine Learning and Pattern Recognition LECTURE 07: BAYESIAN ESTIMATION (Cont.) Objectives:
1 Unsupervised Learning and Clustering Shyh-Kang Jeng Department of Electrical Engineering/ Graduate Institute of Communication/ Graduate Institute of.
Chapter 20 Classification and Estimation Classification – Feature selection Good feature have four characteristics: –Discrimination. Features.
1 Parameter Estimation Shyh-Kang Jeng Department of Electrical Engineering/ Graduate Institute of Communication/ Graduate Institute of Networking and Multimedia,
Machine Learning 5. Parametric Methods.
Lecture 3: MLE, Bayes Learning, and Maximum Entropy
Univariate Gaussian Case (Cont.)
Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley.
Pattern Classification All materials in these slides* were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley.
ECE 8443 – Pattern Recognition ECE 8527 – Introduction to Machine Learning and Pattern Recognition Objectives: Mixture Densities Maximum Likelihood Estimates.
Univariate Gaussian Case (Cont.)
CS479/679 Pattern Recognition Dr. George Bebis
Chapter 3: Maximum-Likelihood Parameter Estimation
LECTURE 06: MAXIMUM LIKELIHOOD ESTIMATION
Probability theory retro
Ch3: Model Building through Regression
Parameter Estimation 主講人:虞台文.
Pattern Classification, Chapter 3
Chapter 3: Maximum-Likelihood and Bayesian Parameter Estimation (part 2)
Outline Parameter estimation – continued Non-parametric methods.
Course Outline MODEL INFORMATION COMPLETE INCOMPLETE
Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John.
Where did we stop? The Bayes decision rule guarantees an optimal classification… … But it requires the knowledge of P(ci|x) (or p(x|ci) and P(ci)) We.
Parametric Estimation
LECTURE 21: CLUSTERING Objectives: Mixture Densities Maximum Likelihood Estimates Application to Gaussian Mixture Models k-Means Clustering Fuzzy k-Means.
LECTURE 09: BAYESIAN LEARNING
LECTURE 07: BAYESIAN ESTIMATION
Parametric Methods Berlin Chen, 2005 References:
Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John.
Learning From Observed Data
Multivariate Methods Berlin Chen
Multivariate Methods Berlin Chen, 2005 References:
Chapter 3: Maximum-Likelihood and Bayesian Parameter Estimation (part 2)
ECE – Pattern Recognition Lecture 4 – Parametric Estimation
Presentation transcript:

LECTURE 11: BAYESIAN PARAMETER ESTIMATION ECE 8443 – Pattern Recognition LECTURE 11: BAYESIAN PARAMETER ESTIMATION • Objectives: General Approach Gaussian Univariate Gaussian Multivariate Resources: DHS – Chap. 3 (Part 2) JOS – Tutorial Nebula – Links BGSU – Example • URL: .../publications/courses/ece_8443/lectures/current/lecture_11.ppt

11: BAYESIAN PARAMETER ESTIMATION INTRODUCTION In Chapter 2, we learned how to design an optimal classifier if we knew the prior probabilities, P(i), and class-conditional densities, p(x|i). Bayes: treat the parameters as random variables having some known prior distribution. Observations of samples converts this to a posterior. Bayesian learning: sharpen the a posteriori density causing it to peak near the true value. Supervised vs. unsupervised: do we know the class assignments of the training data. Bayesian estimation and ML estimation produce very similar results in many cases. Reduces statistical inference (prior knowledge or beliefs about the world) to probabilities.

11: BAYESIAN PARAMETER ESTIMATION CLASS-CONDITIONAL DENSITIES Posterior probabilities, P(i|x), central to classification. If priors and class-conditional densities are unknown? For a training set, D, Bayes formula becomes: We assume priors are known: P(i|D) = P(i). Also, assume functional independence:

11: BAYESIAN PARAMETER ESTIMATION THE PARAMETER DISTRIBUTION Assume the parametric form of the evidence, p(x), is known: p(x|). Any information we have about about  prior to collecting samples is contained in p(D|). Observation of samples converts this to a posterior, p(|D), which we hope is peaked around the true value of . Our goal is to estimate a paramter vector: We can write the joint distribution as a product: because the samples are drawn independently.

11: BAYESIAN PARAMETER ESTIMATION UNIVARIATE GAUSSIAN CASE Case: mean unknown Known prior density: Using Bayes formula: Rationale: Once a value of  is known, the density for x is completely known.  is a normalization factor that depends on the data, D.

11: BAYESIAN PARAMETER ESTIMATION UNIVARIATE GAUSSIAN CASE Applying our Gaussian assumptions:

11: BAYESIAN PARAMETER ESTIMATION UNIVARIATE GAUSSIAN CASE p(|D) is an exponential of a quadratic function, which makes it a normal distribution. Because this is true for any n, it is referred to as a reproducing density. p() is referred to as a conjugate prior. Write p(|D) ~ N(n,n2) and equate coefficients: Two equations and two unknowns. Solve for n and n2:

11: BAYESIAN PARAMETER ESTIMATION n represents our best guess after n samples. n2 represents our uncertainty about this guess. n2 approaches 2/n for large n – each additional observation decreases our uncertainty. The posterior, p(|D), becomes more sharply peaked as n grows large. This is known as Bayesian learning.

11: BAYESIAN PARAMETER ESTIMATION CLASS-CONDITIONAL DENSITY How do we obtain p(x|D):

11: BAYESIAN PARAMETER ESTIMATION CLASS-CONDITIONAL DENSITY

11: BAYESIAN PARAMETER ESTIMATION MULTIVARIATE CASE

11: BAYESIAN PARAMETER ESTIMATION MULTIVARIATE CASE

11: BAYESIAN PARAMETER ESTIMATION MULTIVARIATE CASE