Parameter Estimation (Presenter: 虞台文)

Contents
- Introduction
- Maximum-Likelihood Estimation
- Bayesian Estimation

Parameter Estimation: Introduction

Bayes Rule. We want to estimate the parameters of the class-conditional densities p(x|ω_j) when their parametric form is known, e.g., p(x|ω_j) ~ N(μ_j, Σ_j).

Methods
- The Method of Moments (not discussed in this course)
- Maximum-Likelihood Estimation: assumes the parameters are fixed but unknown
- Bayesian Estimation: assumes the parameters are random variables
- Sufficient Statistics

Parameter Estimation: Maximum-Likelihood Estimation

Samples 1 2 D1 D2 The samples in Dj are drawn independently according to the probability law p(x|j). D3 Assume that p(x|j) has a known parametric form with parameter vector j. 3 e.g., j

Goal. Use D_j to estimate the unknown parameter vector θ_j. The estimated version will be denoted by θ̂_j.

Problem Formulation. Because each class is considered individually, the class subscript used before will be dropped. Now the problem is: given a sample set D whose elements are drawn independently from a population with a known parametric form, say p(x|θ), choose the θ̂ that makes D most likely to occur.

Criterion of ML. By the independence assumption, the likelihood function is p(D|θ) = ∏_{k=1}^{n} p(x_k|θ). The MLE is θ̂ = argmax_θ p(D|θ).

Criterion of ML. Often we instead maximize the log-likelihood function l(θ) = ln p(D|θ) = Σ_{k=1}^{n} ln p(x_k|θ); since ln is monotonic, the maximizer is the same. How do we find it?

Criterion of ML. Example: the likelihood p(D|θ) and log-likelihood l(θ), plotted against θ for a fixed sample, peak at the same θ̂. How do we locate that peak?

Differential Approach, if Possible. Find the extreme values using differential calculus. Let f(θ) be a continuous function, where θ = (θ_1, θ_2, …, θ_n)^T, and let ∇_θ = (∂/∂θ_1, ∂/∂θ_2, …, ∂/∂θ_n)^T be the gradient operator. Find the extreme values by solving ∇_θ f(θ) = 0.
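As a minimal sketch (not from the slides; the exponential distribution and all names are illustrative), the code below maximizes a log-likelihood two ways: by scanning a grid, and by the closed form θ̂ = 1/x̄ that setting the derivative of the log-likelihood to zero yields for the exponential rate parameter.

```python
import numpy as np

# Log-likelihood of an i.i.d. exponential sample: l(theta) = n*ln(theta) - theta*sum(x).
# Setting dl/dtheta = n/theta - sum(x) = 0 gives theta_hat = 1/mean(x); the grid
# scan below recovers the same value numerically.
rng = np.random.default_rng(0)
x = rng.exponential(scale=1 / 2.5, size=10_000)   # true rate theta = 2.5

def log_likelihood(theta, x):
    return len(x) * np.log(theta) - theta * x.sum()

grid = np.linspace(0.1, 5.0, 2000)
theta_grid = grid[np.argmax([log_likelihood(t, x) for t in grid])]
theta_closed = 1 / x.mean()

print(theta_grid, theta_closed)   # both close to the true rate 2.5
```

The grid scan works for any one-dimensional likelihood; the differential approach is preferred when, as here, the stationarity equation has a closed-form solution.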

Preliminary. Let f(x) = a^T x; then ∇_x f = a.

Preliminary. Let f(x) = x^T A x; then ∇_x f = (A + A^T) x, which reduces to 2Ax when A is symmetric.

The Gaussian Population. Two cases: unknown μ; unknown μ and Σ.

The Gaussian Population: Unknown 

The Gaussian Population: Unknown μ. Since ∇_μ ln p(x_k|μ) = Σ^{-1}(x_k − μ), setting Σ_{k=1}^{n} Σ^{-1}(x_k − μ̂) = 0 gives μ̂ = (1/n) Σ_{k=1}^{n} x_k, the sample mean.

The Gaussian Population: Unknown μ and σ. Consider the univariate normal case, p(x|θ) ~ N(μ, σ²) with θ = (μ, σ²)^T.

The Gaussian Population: Unknown μ and σ. Setting the derivatives of the log-likelihood to zero gives μ̂ = (1/n) Σ_k x_k (unbiased) and σ̂² = (1/n) Σ_k (x_k − μ̂)² (biased).
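The bias of the 1/n variance estimate can be made visible empirically; the sketch below (illustrative values, not from the slides) averages both estimators over many small samples.

```python
import numpy as np

# The ML variance estimate divides by n and is biased low; dividing by n-1
# (Bessel's correction) gives the unbiased estimator. Averaging each estimator
# over many independent samples of size n exposes the gap.
rng = np.random.default_rng(1)
true_var = 4.0
n = 5
samples = rng.normal(0.0, np.sqrt(true_var), size=(100_000, n))

mle_var = samples.var(axis=1, ddof=0).mean()        # divides by n
unbiased_var = samples.var(axis=1, ddof=1).mean()   # divides by n-1

print(mle_var, unbiased_var)   # roughly (n-1)/n * 4 = 3.2 versus 4.0
```

The ML estimate systematically undershoots by the factor (n−1)/n, which vanishes as n grows, matching the "asymptotically unbiased" remark below.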

The Gaussian Population: Unknown μ and Σ. For the multivariate normal case, the MLEs of μ and Σ are μ̂ = (1/n) Σ_k x_k (unbiased) and Σ̂ = (1/n) Σ_k (x_k − μ̂)(x_k − μ̂)^T (biased).
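A minimal numerical check of these multivariate formulas, assuming NumPy and illustrative parameter values:

```python
import numpy as np

# Multivariate Gaussian MLE: mu_hat is the sample mean, and Sigma_hat averages
# the outer products (x_k - mu_hat)(x_k - mu_hat)^T with a 1/n factor (biased).
# np.cov defaults to the unbiased 1/(n-1) scaling, so the two differ by (n-1)/n.
rng = np.random.default_rng(2)
n = 1000
X = rng.multivariate_normal(mean=[1.0, -2.0],
                            cov=[[2.0, 0.5], [0.5, 1.0]], size=n)

mu_hat = X.mean(axis=0)
centered = X - mu_hat
sigma_hat = centered.T @ centered / n            # ML estimate (biased)

C = np.cov(X, rowvar=False)                      # unbiased sample covariance
print(np.allclose(sigma_hat, (n - 1) / n * C))   # True: Sigma_hat = (n-1)/n * C
```

The exact (n−1)/n relation between the ML estimate and the unbiased sample covariance is the same one stated on the "MLE for a Normal Population" slide.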

Unbiasedness. Unbiased estimator (absolutely unbiased): E[θ̂] = θ. Consistent estimator (asymptotically unbiased): E[θ̂] → θ as the sample size n → ∞.

MLE for a Normal Population. Sample mean: μ̂ = (1/n) Σ_k x_k. Sample covariance matrix: C = (1/(n−1)) Σ_k (x_k − μ̂)(x_k − μ̂)^T; the ML estimate is Σ̂ = ((n−1)/n) C.

Parameter Estimation: Bayesian Estimation

Comparison. MLE (Maximum-Likelihood Estimation): finds the fixed but unknown parameters of a population. Bayesian Estimation: considers the parameters of a population to be random variables.

Heart of Bayesian Classification. Ultimate goal: evaluate the posteriors P(ω_i|x) ∝ p(x|ω_i) P(ω_i). What can we do if the prior probabilities and class-conditional densities are unknown?

Helpful Knowledge. The functional form of the unknown densities, e.g., normal, exponential, …; ranges for the values of the unknown parameters, e.g., uniformly distributed over a range; training samples, drawn according to the states of nature.

Posterior Probabilities from Sample

Posterior Probabilities from Samples. Each class can be considered independently: P(ω_i|x, D) = p(x|ω_i, D_i) P(ω_i) / Σ_j p(x|ω_j, D_j) P(ω_j).

Problem Formulation. Let D be a set of samples drawn independently according to the fixed but unknown distribution p(x). We want to determine p(x|D). This is the central problem of Bayesian learning.

Parameter Distribution. Assume p(x) is unknown but has a known fixed form with parameter vector θ, so that p(x|θ) is completely known. Assume θ is a random vector with a known prior density p(θ).

Class-Conditional Density Estimation

Class-Conditional Density Estimation. The posterior density we want to estimate is p(x|D) = ∫ p(x|θ) p(θ|D) dθ, where the form of p(x|θ) is assumed known and p(θ|D) ∝ p(D|θ) p(θ).
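This integral can be sketched numerically on a grid. The example below is illustrative (not from the slides): it assumes a Gaussian likelihood with known σ = 1 and a flat prior over the grid, and approximates p(x|D) at one query point.

```python
import numpy as np

# Numerical sketch of p(x|D) = ∫ p(x|theta) p(theta|D) dtheta, where theta is
# the unknown mean of a Gaussian with known sigma = 1 and the prior is flat.
rng = np.random.default_rng(5)
data = rng.normal(0.7, 1.0, size=20)

theta = np.linspace(-4, 4, 801)
dtheta = theta[1] - theta[0]

def gauss(x, mean):
    return np.exp(-0.5 * (x - mean) ** 2) / np.sqrt(2 * np.pi)

# Posterior over the grid: likelihood of all samples times a flat prior.
post = np.prod([gauss(x_k, theta) for x_k in data], axis=0)
post /= post.sum() * dtheta                      # normalize as a density

# Predictive density at a query point: average p(x|theta) over the posterior.
x_query = 1.0
p_x_given_D = np.sum(gauss(x_query, theta) * post) * dtheta
print(p_x_given_D)
```

With a flat prior the posterior over the mean is N(m_n, σ²/n), so the grid result should match the closed-form predictive density N(m_n, σ² + σ²/n) evaluated at the query point.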

Class-Conditional Density Estimation. If p(θ|D) has a sharp peak at θ = θ̂, then p(x|D) ≈ p(x|θ̂).

Class-Conditional Density Estimation

The Univariate Gaussian: Unknown μ. The distribution form is known: p(x|μ) ~ N(μ, σ²) with σ² given. Assume μ is normally distributed a priori: p(μ) ~ N(μ₀, σ₀²).

The Univariate Gaussian: Unknown μ. The posterior p(μ|D) ∝ p(D|μ) p(μ) is again Gaussian, p(μ|D) ~ N(μ_n, σ_n²), with μ_n = (nσ₀² m_n + σ² μ₀) / (nσ₀² + σ²) and σ_n² = σ₀² σ² / (nσ₀² + σ²), where m_n = (1/n) Σ_k x_k is the sample mean.

The Univariate Gaussian: Unknown μ. Comparison: μ_n is a weighted average of the sample mean m_n and the prior mean μ₀; as n → ∞, μ_n → m_n and σ_n² → 0, so the samples eventually dominate the prior.

The Univariate Gaussian: Unknown 

The Univariate Gaussian: Unknown 

The Univariate Gaussian: p(x|D)

The Univariate Gaussian: p(x|D)

The Univariate Gaussian: p(x|D). Carrying out the integral p(x|D) = ∫ p(x|μ) p(μ|D) dμ gives p(x|D) ~ N(μ_n, σ² + σ_n²).
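These closed forms can be checked numerically; the sketch below uses illustrative values for σ², μ₀, and σ₀² (none taken from the slides).

```python
import numpy as np

# Bayesian estimation of a Gaussian mean with known variance sigma2 and a
# Gaussian prior N(mu0, sigma0_2). The posterior p(mu|D) is N(mu_n, sigma_n2):
#   mu_n     = (n*sigma0_2*m_n + sigma2*mu0) / (n*sigma0_2 + sigma2)
#   sigma_n2 = sigma0_2*sigma2 / (n*sigma0_2 + sigma2)
# and the predictive density p(x|D) is N(mu_n, sigma2 + sigma_n2).
rng = np.random.default_rng(3)
sigma2, mu0, sigma0_2 = 1.0, 0.0, 4.0          # known variance and prior
x = rng.normal(2.0, np.sqrt(sigma2), size=50)  # true mean = 2.0

n, m_n = len(x), x.mean()
mu_n = (n * sigma0_2 * m_n + sigma2 * mu0) / (n * sigma0_2 + sigma2)
sigma_n2 = sigma0_2 * sigma2 / (n * sigma0_2 + sigma2)

print(mu_n)                 # a weighted average, pulled slightly toward mu0
print(sigma2 + sigma_n2)    # predictive variance, slightly above sigma2
```

Note how μ_n lands between the prior mean μ₀ and the sample mean m_n, and how the predictive variance exceeds σ² by exactly the remaining uncertainty σ_n² about μ.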

The Multivariate Gaussian: Unknown μ. The distribution form is known: p(x|μ) ~ N(μ, Σ) with Σ given. Assume μ is normally distributed a priori: p(μ) ~ N(μ₀, Σ₀).

The Multivariate Gaussian: Unknown 

The Multivariate Gaussian: Unknown 

General Theory. Assumptions: 1. The form of the class-conditional density p(x|θ) is known. 2. Knowledge about the parameter distribution p(θ) is available. 3. Samples are drawn independently according to the unknown probability density p(x).

Incremental Learning. The posterior can be computed recursively as samples arrive: p(θ|D^n) = p(x_n|θ) p(θ|D^{n−1}) / ∫ p(x_n|θ) p(θ|D^{n−1}) dθ, starting from p(θ|D⁰) = p(θ).

Example. (Figure: the posterior densities p(θ|D^n) plotted over θ for increasing n; the curves labeled 1, 2, 3, … grow progressively sharper as more samples are observed.)
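The recursive update can be sketched numerically on a grid. The example below is illustrative (not from the slides): Gaussian data with known σ = 1 and a flat grid prior, verifying that one-sample-at-a-time updating agrees with processing the whole batch at once.

```python
import numpy as np

# Recursive Bayesian learning on a grid: p(theta|D^n) is proportional to
# p(x_n|theta) * p(theta|D^{n-1}), renormalized after each sample. This must
# agree with multiplying all likelihoods together in one batch.
rng = np.random.default_rng(4)
x = rng.normal(1.5, 1.0, size=30)          # Gaussian data, sigma = 1 known
theta = np.linspace(-5, 5, 1001)           # grid over the unknown mean

def gauss(x_k, theta):
    return np.exp(-0.5 * (x_k - theta) ** 2) / np.sqrt(2 * np.pi)

posterior = np.ones_like(theta)            # flat prior on the grid
posterior /= posterior.sum()
for x_k in x:                              # recursive update, one sample each
    posterior *= gauss(x_k, theta)
    posterior /= posterior.sum()

batch = np.prod([gauss(x_k, theta) for x_k in x], axis=0)
batch /= batch.sum()

print(np.allclose(posterior, batch))       # the two routes coincide
print(theta[np.argmax(posterior)])         # mode near the sample mean
```

This is the sharpening behavior sketched in the example figure: with each added sample the normalized posterior concentrates further around the true parameter.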