Pattern Classification
All materials in these slides* were taken from Pattern Classification (2nd ed.) by R. O. Duda, P. E. Hart, and D. G. Stork, John Wiley & Sons, 2000, with the permission of the authors and the publisher.
* Some slides are modified.

Maximum-Likelihood and Bayesian Parameter Estimation (2)

Reading Assignment
- R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, 2nd ed., John Wiley & Sons. Chapter 3, Sections 4, 5, and 7.

Chapter 3: Maximum-Likelihood & Bayesian Parameter Estimation (Part 2)
- Bayesian Parameter Estimation: Gaussian Case
- Bayesian Parameter Estimation: General Theory
- Problems of Dimensionality
- Computational Complexity

The Parameter Distribution
- Goal: estimate p(x) by computing p(x|D):
  p(x|D) = ∫ p(x|θ) p(θ|D) dθ
- This formula links the desired class-conditional density p(x|D) to the posterior density p(θ|D).

Pattern Classification, Chapter 1 6  Bayesian Parameter Estimation: Gaussian Case  The univariate case: (  is the only unknown parameter) (  0 and  0 are known!) Therefore, we need to find: 4

Pattern Classification, Chapter 1 7  Let D = {x 1, x 2, …, x n } 4

- Reproducing density: since both p(x|μ) and p(μ) are normal, the posterior is again normal,
  p(μ|D) ~ N(μ_n, σ_n²).

- Identifying coefficients gives
  μ_n = (n σ_0² / (n σ_0² + σ²)) m_n + (σ² / (n σ_0² + σ²)) μ_0
  σ_n² = σ_0² σ² / (n σ_0² + σ²)
  where m_n = (1/n) ∑_{k=1..n} x_k is the sample mean.

- The univariate case: p(x|D). With p(μ|D) computed, p(x|D) remains to be computed:
  p(x|D) = ∫ p(x|μ) p(μ|D) dμ,
  which is again a normal density: p(x|D) ~ N(μ_n, σ² + σ_n²).
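
To make these updates concrete, here is a minimal Python sketch (our own illustration, not from the book; function and variable names are ours) computing μ_n, σ_n², and the parameters of the predictive density p(x|D):

    import numpy as np

    def univariate_gaussian_posterior(x, mu0, sigma0, sigma):
        """Posterior p(mu|D) = N(mu_n, sigma_n^2) for Gaussian data with known sigma."""
        n = len(x)
        m_n = np.mean(x)  # sample mean of D
        denom = n * sigma0**2 + sigma**2
        mu_n = (n * sigma0**2 * m_n + sigma**2 * mu0) / denom
        sigma_n2 = (sigma0**2 * sigma**2) / denom
        return mu_n, sigma_n2

    # p(x|D) ~ N(mu_n, sigma^2 + sigma_n^2): toy data with true mean 2, sigma = 1
    x = np.random.default_rng(0).normal(2.0, 1.0, size=50)
    mu_n, sigma_n2 = univariate_gaussian_posterior(x, mu0=0.0, sigma0=1.0, sigma=1.0)
    print(mu_n, 1.0 + sigma_n2)  # mean and variance of the predictive density

As n grows, σ_n² → 0 and μ_n approaches the sample mean, so p(x|D) approaches N(m_n, σ²).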

- It provides the desired class-conditional density p(x|D_j, ω_j).
- Using p(x|D_j, ω_j) together with the priors P(ω_j) in Bayes' formula, we obtain the Bayesian classification rule:
  P(ω_j|x, D) ∝ p(x|D_j, ω_j) P(ω_j).
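
A small sketch of the resulting rule (the numeric values are hypothetical; in practice they would come from the densities p(x|D_j, ω_j) above):

    import numpy as np

    def bayes_posteriors(class_cond, priors):
        """P(w_j|x, D) = p(x|D_j, w_j) P(w_j) / sum_i p(x|D_i, w_i) P(w_i)."""
        joint = np.asarray(class_cond) * np.asarray(priors)
        return joint / joint.sum()

    print(bayes_posteriors([0.30, 0.10], [0.4, 0.6]))  # -> [0.667, 0.333]; decide class 1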

- The multivariate case: p(x|μ) ~ N(μ, Σ) with Σ known, and prior p(μ) ~ N(μ_0, Σ_0). The analysis parallels the univariate case and again yields a normal posterior, p(μ|D) ~ N(μ_n, Σ_n).

Bayesian Parameter Estimation: General Theory
- The computation of p(x|D) can be applied to any situation in which the unknown density can be parameterized. The basic assumptions are:
  - The form of p(x|θ) is assumed known, but the value of θ is not known exactly.
  - Our knowledge about θ is assumed to be contained in a known prior density p(θ).
  - The rest of our knowledge about θ is contained in a set D of n samples x_1, x_2, …, x_n drawn independently according to p(x).

- The basic problem is: compute the posterior density p(θ|D), then derive p(x|D).
- Using Bayes' formula, we have
  p(θ|D) = p(D|θ) p(θ) / ∫ p(D|θ) p(θ) dθ,
- and by the independence assumption,
  p(D|θ) = ∏_{k=1..n} p(x_k|θ).
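
When no closed form exists, the posterior can be approximated numerically; here is a sketch on a one-dimensional parameter grid (our own illustration, assuming a scalar θ and σ = 1):

    import numpy as np

    def grid_posterior(data, theta_grid, likelihood, prior):
        """p(theta|D) proportional to prod_k p(x_k|theta) * p(theta), normalized on a grid."""
        log_post = np.log(prior(theta_grid))
        for x_k in data:
            log_post += np.log(likelihood(x_k, theta_grid))  # independence assumption
        post = np.exp(log_post - log_post.max())  # subtract max to avoid underflow
        return post / np.trapz(post, theta_grid)  # normalize so it integrates to 1

    gauss = lambda x, mu: np.exp(-0.5 * (x - mu)**2) / np.sqrt(2 * np.pi)  # sigma = 1
    theta = np.linspace(-5.0, 5.0, 1001)
    data = np.random.default_rng(1).normal(2.0, 1.0, size=20)
    p_theta_given_D = grid_posterior(data, theta, gauss, lambda m: gauss(m, 0.0))  # N(0,1) prior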

Bayesian Parameter Estimation: Incremental Learning
- To indicate explicitly the number of samples in a set for a single category, we write D^n = {x_1, …, x_n}. Then, for n > 1,
  p(D^n|θ) = p(x_n|θ) p(D^{n-1}|θ).
- Substituting into Bayes' formula, we get the recursion
  p(θ|D^n) = p(x_n|θ) p(θ|D^{n-1}) / ∫ p(x_n|θ) p(θ|D^{n-1}) dθ.
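
The same grid computation can be run one sample at a time, which is exactly the recursion above; a sketch (again our own illustration, assuming a scalar θ and σ = 1):

    import numpy as np

    def incremental_posterior(data, theta_grid, likelihood, prior):
        """p(theta|D^n) proportional to p(x_n|theta) * p(theta|D^{n-1}); start from p(theta)."""
        post = prior(theta_grid)
        post = post / np.trapz(post, theta_grid)
        for x_n in data:  # n = 1, 2, ...: fold in one sample at a time
            post = post * likelihood(x_n, theta_grid)
            post = post / np.trapz(post, theta_grid)  # renormalize after each update
        return post

    gauss = lambda x, mu: np.exp(-0.5 * (x - mu)**2) / np.sqrt(2 * np.pi)  # sigma = 1
    theta = np.linspace(-5.0, 5.0, 1001)
    data = np.random.default_rng(1).normal(2.0, 1.0, size=20)
    post = incremental_posterior(data, theta, gauss, lambda m: gauss(m, 0.0))

For the Gaussian case this reproduces the closed-form N(μ_n, σ_n²) posterior, up to grid resolution.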

Problems of Dimensionality
- Problems involving 50 or 100 features (binary valued).
- Classification accuracy depends upon the dimensionality and the amount of training data.
- Case of two classes of multivariate normals with the same covariance Σ:
  P(error) = (1/√(2π)) ∫_{r/2}^{∞} e^{−u²/2} du, where r² = (μ_1 − μ_2)ᵀ Σ⁻¹ (μ_1 − μ_2),
  so P(error) decreases as the Mahalanobis distance r between the class means grows.

- If the features are independent, then Σ = diag(σ_1², …, σ_d²) and
  r² = ∑_{i=1..d} ((μ_{i1} − μ_{i2}) / σ_i)².
- The most useful features are the ones for which the difference between the means is large relative to the standard deviation.
- It has frequently been observed in practice that, beyond a certain point, the inclusion of additional features leads to worse rather than better performance: we have the wrong model!
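
A short sketch evaluating P(error) for independent features (the means and standard deviations here are assumed toy values; the Gaussian tail integral is expressed through the complementary error function):

    import numpy as np
    from scipy.special import erfc

    def bayes_error(mu1, mu2, sigma):
        """P(error) = (1/sqrt(2 pi)) * integral from r/2 to infinity of exp(-u^2/2) du."""
        r2 = np.sum(((np.asarray(mu1) - np.asarray(mu2)) / np.asarray(sigma))**2)
        r = np.sqrt(r2)  # Mahalanobis distance for independent features
        return 0.5 * erfc(r / (2.0 * np.sqrt(2.0)))  # Gaussian tail beyond r/2

    print(bayes_error([0, 0, 0], [1, 1, 1], [1, 1, 1]))  # r = sqrt(3), error ~ 0.19

Adding a feature whose means differ across classes always increases r and so lowers this bound, which is why the observed degradation implies the model, not the mathematics, is at fault.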

Computational Complexity
- Our design methodology is affected by computational difficulty.
- "Big oh" notation: f(x) = O(h(x)) ("big oh of h(x)") if there exist constants c and x_0 such that |f(x)| ≤ c |h(x)| for all x > x_0. (An upper bound on f(x) grows no worse than h(x) for sufficiently large x!)
- Example: f(x) = 2 + 3x + 4x², g(x) = x². Then f(x) = O(x²), since, e.g., 2 + 3x + 4x² ≤ 5x² for all x > 4.

- "Big oh" is not unique: f(x) = O(x²), f(x) = O(x³), and f(x) = O(x⁴) are all true.
- "Big theta" notation: f(x) = Θ(h(x)) if there exist constants c_1, c_2 > 0 and x_0 such that c_1 h(x) ≤ f(x) ≤ c_2 h(x) for all x > x_0.
- For f(x) = 2 + 3x + 4x²: f(x) = Θ(x²), but f(x) ≠ Θ(x³).

Complexity of the ML Estimation
- Gaussian priors in d dimensions; a classifier with n training samples for each of c classes.
- For each category, we have to compute the discriminant function
  g(x) = −(1/2)(x − μ̂)ᵀ Σ̂⁻¹ (x − μ̂) − (d/2) ln 2π − (1/2) ln |Σ̂| + ln P(ω).
- Total for one class = O(d²·n); total for c classes = O(c·d²·n) = O(d²·n).
- The cost increases quickly when d and n are large!
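
For reference, a sketch of this discriminant with the expensive pieces precomputed, so each evaluation is O(d²) (our own naming; Sigma_inv and log_det_Sigma are assumed to be computed once per class from the ML estimates):

    import numpy as np

    def gaussian_discriminant(x, mu, Sigma_inv, log_det_Sigma, log_prior):
        """g(x) = -0.5 (x-mu)^T Sigma^{-1} (x-mu) - (d/2) ln 2pi - 0.5 ln|Sigma| + ln P(w)."""
        d = len(mu)
        diff = x - mu
        maha = diff @ Sigma_inv @ diff  # quadratic form: O(d^2) per evaluation
        return -0.5 * maha - 0.5 * d * np.log(2 * np.pi) - 0.5 * log_det_Sigma + log_prior

Estimating μ̂ from the n samples costs O(d·n) and Σ̂ costs O(d²·n), which is where the O(d²·n) training total comes from.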

Overfitting
- Frequently, the number of available samples is inadequate; how to proceed?
- Solutions:
  - Reduce the dimensionality: redesign the feature extractor, or use a subset of the features.
  - Assume all c classes share the same covariance matrix: pool the available data.
  - Assume statistical independence: all off-diagonal elements of the covariance matrix are zero.

Overfitting
- A simpler classifier can perform better than one that overfits the data. Why?
- The "training data" (black dots) were selected from a quadratic function plus Gaussian noise. The 10th-degree polynomial shown fits the data perfectly, but we desire instead the second-order function f(x), since it would lead to better predictions for new samples.
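
The effect in the figure is easy to reproduce; here is a sketch with synthetic quadratic data (our own example, not the book's figure data), comparing 2nd- and 10th-degree fits on new points:

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(-1.0, 1.0, 11)
    y = 2 * x**2 - x + 0.5 + rng.normal(0.0, 0.1, x.size)  # quadratic + Gaussian noise

    p2 = np.polyfit(x, y, 2)    # second-order fit: the model we actually want
    p10 = np.polyfit(x, y, 10)  # 10th degree: fits the 11 training points (nearly) exactly

    x_new = np.linspace(-1.0, 1.0, 101)  # "new samples"
    y_true = 2 * x_new**2 - x_new + 0.5
    print(np.mean((np.polyval(p2, x_new) - y_true)**2))   # small prediction error
    print(np.mean((np.polyval(p10, x_new) - y_true)**2))  # typically much larger

The 10th-degree fit has zero training error but chases the noise, so its error on new samples is worse, which is exactly the point of the slide.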