Chapter 2: Bayesian Decision Theory (Part 2) Minimum-Error-Rate Classification Classifiers, Discriminant Functions and Decision Surfaces The Normal Density.

Slides:

Advertisements

Similar presentations

Component Analysis (Review)

Advertisements

Pattern Recognition and Machine Learning

2 – In previous chapters: – We could design an optimal classifier if we knew the prior probabilities P(wi) and the class- conditional probabilities P(x|wi)

0 Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John.

Pattern Classification, Chapter 2 (Part 2) 0 Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R.

Pattern Classification. Chapter 2 (Part 1): Bayesian Decision Theory (Sections ) Introduction Bayesian Decision Theory–Continuous Features.

Pattern Classification, Chapter 2 (Part 2) 0 Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R.

Chapter 2: Bayesian Decision Theory (Part 2) Minimum-Error-Rate Classification Classifiers, Discriminant Functions and Decision Surfaces The Normal Density.

Bayesian Decision Theory

Pattern Classification Chapter 2 (Part 2)0 Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O.

Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley.

Chapter 1: Introduction to Pattern Recognition

Bayesian Decision Theory Chapter 2 (Duda et al.) – Sections

Parameter Estimation: Maximum Likelihood Estimation Chapter 3 (Duda et al.) – Sections CS479/679 Pattern Recognition Dr. George Bebis.

Chapter 5: Linear Discriminant Functions

Chapter 4 (Part 1): Non-Parametric Classification

0 Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John.

Chapter 2: Bayesian Decision Theory (Part 1) Introduction Bayesian Decision Theory–Continuous Features All materials used in this course were taken from.

Pattern Classification, Chapter 3 Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P.

Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley.

Chapter 2 (part 3) Bayesian Decision Theory Discriminant Functions for the Normal Density Bayes Decision Theory – Discrete Features All materials used.

Chapter 6: Multilayer Neural Networks

Chapter 3 (part 3): Maximum-Likelihood and Bayesian Parameter Estimation Hidden Markov Model: Extension of Markov Chains All materials used in this course.

Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley.

Chapter 4 (part 2): Non-Parametric Classification

Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley.

Linear Discriminant Functions Chapter 5 (Duda et al.)

Bayesian Estimation (BE) Bayesian Parameter Estimation: Gaussian Case

Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley.

Chapter 3 (part 1): Maximum-Likelihood & Bayesian Parameter Estimation  Introduction  Maximum-Likelihood Estimation  Example of a Specific Case  The.

Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley.

Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley.

METU Informatics Institute Min 720 Pattern Classification with Bio-Medical Applications PART 2: Statistical Pattern Classification: Optimal Classification.

0 Pattern Classification, Chapter 3 0 Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda,

Principles of Pattern Recognition

Speech Recognition Pattern Classification. 22 September 2015Veton Këpuska2 Pattern Classification  Introduction  Parametric classifiers  Semi-parametric.

ECE 8443 – Pattern Recognition LECTURE 03: GAUSSIAN CLASSIFIERS Objectives: Normal Distributions Whitening Transformations Linear Discriminants Resources.

ECE 8443 – Pattern Recognition ECE 8527 – Introduction to Machine Learning and Pattern Recognition LECTURE 02: BAYESIAN DECISION THEORY Objectives: Bayes.

ECE 8443 – Pattern Recognition ECE 8527 – Introduction to Machine Learning and Pattern Recognition LECTURE 03: GAUSSIAN CLASSIFIERS Objectives: Whitening.

1 E. Fatemizadeh Statistical Pattern Recognition.

Bayesian Decision Theory (Classification) 主講人：虞台文.

Chapter 3: Maximum-Likelihood Parameter Estimation l Introduction l Maximum-Likelihood Estimation l Multivariate Case: unknown , known  l Univariate.

1 Bayesian Decision Theory Shyh-Kang Jeng Department of Electrical Engineering/ Graduate Institute of Communication/ Graduate Institute of Networking and.

Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley.

Bayesian Decision Theory Basic Concepts Discriminant Functions The Normal Density ROC Curves.

Chapter 20 Classification and Estimation Classification – Feature selection Good feature have four characteristics: –Discrimination. Features.

Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley.

METU Informatics Institute Min720 Pattern Classification with Bio-Medical Applications Part 9: Review.

Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley.

Pattern Classification All materials in these slides* were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley.

Pattern Classification Chapter 2(Part 3) 0 Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O.

Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley.

Linear Discriminant Functions Chapter 5 (Duda et al.) CS479/679 Pattern Recognition Dr. George Bebis.

Objectives: Loss Functions Risk Min. Error Rate Class. Resources: DHS – Chap. 2 (Part 1) DHS – Chap. 2 (Part 2) RGO - Intro to PR MCE for Speech MCE for.

Lecture 2. Bayesian Decision Theory

CS479/679 Pattern Recognition Dr. George Bebis

Probability theory retro

LECTURE 03: DECISION SURFACES

Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John.

Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John.

Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John.

Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John.

Pattern Recognition and Machine Learning

Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John.

Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John.

Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John.

Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John.

Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John.

Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John.

Presentation transcript:

Chapter 2: Bayesian Decision Theory (Part 2) Minimum-Error-Rate Classification Classifiers, Discriminant Functions and Decision Surfaces The Normal Density All materials used in this course were taken from the textbook “Pattern Classification” by Duda et al., John Wiley & Sons, 2001 with the permission of the authors and the publisher

Dr. Djamel Bouchaffra CSE 616 Applied Pattern Recognition, Chapter 2, Section 2. 2 Minimum-Error-Rate Classification Actions are decisions on classes If action  i is taken and the true state of nature is  j then: the decision is correct if i = j and in error if i  j Seek a decision rule that minimizes the probability of error which is the error rate 3

Dr. Djamel Bouchaffra CSE 616 Applied Pattern Recognition, Chapter 2, Section 2. 3 Introduction of the zero-one loss function: Therefore, the conditional risk is: “The risk corresponding to this loss function is the average probability error”  3

Dr. Djamel Bouchaffra CSE 616 Applied Pattern Recognition, Chapter 2, Section 2. 4 Minimize the risk requires maximize P(  i | x) (since R(  i | x) = 1 – P(  i | x)) For Minimum error rate Decide  i if P (  i | x) > P(  j | x)  j  i 3

Dr. Djamel Bouchaffra CSE 616 Applied Pattern Recognition, Chapter 2, Section 2. 5 Regions of decision and zero-one loss function, therefore: If is the zero-one loss function wich means: 3

Dr. Djamel Bouchaffra CSE 616 Applied Pattern Recognition, Chapter 2, Section

Dr. Djamel Bouchaffra CSE 616 Applied Pattern Recognition, Chapter 2, Section 2. 7 Classifiers, Discriminant Functions and Decision Surfaces The multi-category case Set of discriminant functions g i (x), i = 1,…, c The classifier assigns a feature vector x to class  i if: g i (x) > g j (x)  j  i 4

Dr. Djamel Bouchaffra CSE 616 Applied Pattern Recognition, Chapter 2, Section

Dr. Djamel Bouchaffra CSE 616 Applied Pattern Recognition, Chapter 2, Section 2. 9 Let g i (x) = - R(  i | x) (max. discriminant corresponds to min. risk!) For the minimum error rate, we take g i (x) = P(  i | x) (max. discrimination corresponds to max. posterior!) g i (x)  P(x |  i ) P(  i ) g i (x) = ln P(x |  i ) + ln P(  i ) (ln: natural logarithm!) 4

Dr. Djamel Bouchaffra CSE 616 Applied Pattern Recognition, Chapter 2, Section Feature space divided into c decision regions if g i (x) > g j (x)  j  i then x is in R i ( R i means assign x to  i ) The two-category case A classifier is a “dichotomizer” that has two discriminant functions g 1 and g 2 Let g(x)  g 1 (x) – g 2 (x) Decide  1 if g(x) > 0; Otherwise decide  2 4

Dr. Djamel Bouchaffra CSE 616 Applied Pattern Recognition, Chapter 2, Section The computation of g(x) 4

Dr. Djamel Bouchaffra CSE 616 Applied Pattern Recognition, Chapter 2, Section

Dr. Djamel Bouchaffra CSE 616 Applied Pattern Recognition, Chapter 2, Section The Normal Density Univariate density Density which is analytically tractable Continuous density A lot of processes are asymptotically Gaussian Handwritten characters, speech sounds are ideal or prototype corrupted by random process (central limit theorem) Where:  = mean (or expected value) of x  2 = expected squared deviation or variance 5

Dr. Djamel Bouchaffra CSE 616 Applied Pattern Recognition, Chapter 2, Section

Dr. Djamel Bouchaffra CSE 616 Applied Pattern Recognition, Chapter 2, Section Multivariate density Multivariate normal density in d dimensions is: where: x = (x 1, x 2, …, x d ) t (t stands for the transpose vector form)  = (  1,  2, …,  d ) t mean vector  = d*d covariance matrix |  | and  -1 are determinant and inverse respectively 5