
Pattern Classification, Chapter 2 (Part 2)

All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley & Sons, 2000, with the permission of the authors and the publisher.

Chapter 2 (Part 2): Bayesian Decision Theory (Sections )

- Minimum-Error-Rate Classification
- Classifiers, Discriminant Functions and Decision Surfaces
- The Normal Density

2.3 Minimum-Error-Rate Classification

- Actions are decisions on classes: if action α_i is taken and the true state of nature is ω_j, then the decision is correct if i = j and in error if i ≠ j.
- We seek a decision rule that minimizes the probability of error, i.e., the error rate.

- Introduce the zero-one loss function:

  λ(α_i | ω_j) = 0 if i = j, 1 if i ≠ j,  for i, j = 1, …, c

- The conditional risk is then:

  R(α_i | x) = Σ_{j ≠ i} P(ω_j | x) = 1 − P(ω_i | x)

- "The risk corresponding to this loss function is the average probability of error."

- Minimizing the risk requires maximizing P(ω_i | x), since R(α_i | x) = 1 − P(ω_i | x).
- Minimum-error-rate decision rule: decide ω_i if P(ω_i | x) > P(ω_j | x) for all j ≠ i.
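A minimal sketch of this rule (not from the slides; the likelihood and prior values below are made up for illustration): compute the posteriors by Bayes' rule and pick the class that maximizes them.

```python
def posterior(likelihoods, priors):
    """Compute P(w_i | x) from class-conditional likelihoods p(x | w_i)
    and priors P(w_i) via Bayes' rule."""
    joint = [l * p for l, p in zip(likelihoods, priors)]
    evidence = sum(joint)  # p(x), the normalizing factor
    return [j / evidence for j in joint]

def decide(likelihoods, priors):
    """Minimum-error-rate rule: pick the class with the largest posterior."""
    post = posterior(likelihoods, priors)
    return max(range(len(post)), key=lambda i: post[i])

# Hypothetical two-class example: p(x|w1)=0.3, p(x|w2)=0.6, P(w1)=0.7, P(w2)=0.3
print(decide([0.3, 0.6], [0.7, 0.3]))  # w1 (index 0) wins: 0.21 > 0.18
```

Note that the evidence p(x) only scales all posteriors equally, so the argmax is unchanged if it is omitted.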

- Decision regions under a general loss: decide ω_1 if the likelihood ratio exceeds a threshold,

  p(x | ω_1) / p(x | ω_2) > [(λ_12 − λ_22) / (λ_21 − λ_11)] · P(ω_2) / P(ω_1) = θ_λ

- If λ is the zero-one loss function, the threshold reduces to θ_a = P(ω_2) / P(ω_1); if instead λ_12 = 2λ_21 (with λ_11 = λ_22 = 0), it becomes θ_b = 2 P(ω_2) / P(ω_1).
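A small sketch of the threshold computation (the loss and prior values are hypothetical, not from the slides):

```python
def threshold(l12, l21, l11, l22, p1, p2):
    """General likelihood-ratio threshold theta_lambda for two classes."""
    return (l12 - l22) / (l21 - l11) * (p2 / p1)

def decide_ratio(px_w1, px_w2, theta):
    """Decide w1 (return 1) when the likelihood ratio exceeds the threshold."""
    return 1 if px_w1 / px_w2 > theta else 2

# Zero-one loss (l11 = l22 = 0, l12 = l21 = 1) reduces the threshold to P(w2)/P(w1)
theta_a = threshold(1, 1, 0, 0, 0.7, 0.3)
print(theta_a)                          # 0.3/0.7 ~ 0.4286
print(decide_ratio(0.3, 0.6, theta_a))  # ratio 0.5 > 0.4286, so decide w1
```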


2.4 Classifiers, Discriminant Functions and Decision Surfaces

- The multi-category case: a set of discriminant functions g_i(x), i = 1, …, c.
- The classifier assigns a feature vector x to class ω_i if: g_i(x) > g_j(x) for all j ≠ i.


- For the minimum-risk case, let g_i(x) = −R(α_i | x) (the maximum discriminant corresponds to minimum risk).
- For the minimum-error-rate case, take g_i(x) = P(ω_i | x) (the maximum discriminant corresponds to the maximum posterior).
- Equivalent discriminants (any monotonically increasing transform preserves the decision):

  g_i(x) = p(x | ω_i) P(ω_i)
  g_i(x) = ln p(x | ω_i) + ln P(ω_i)   (ln: natural logarithm)
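A quick numeric sketch (values invented for illustration) showing that the log-form discriminant produces the same class ranking as the product p(x | ω_i) P(ω_i), since ln is monotonically increasing:

```python
import math

def g(likelihood, prior):
    """Log-form discriminant g_i(x) = ln p(x|w_i) + ln P(w_i)."""
    return math.log(likelihood) + math.log(prior)

# Hypothetical two-class values: products are 0.21 and 0.18
a = g(0.3, 0.7)   # class 1: ln(0.21)
b = g(0.6, 0.3)   # class 2: ln(0.18)
print(a > b)      # True: same ordering as 0.21 > 0.18
```

The log form is often preferred numerically because products of many small likelihoods underflow, while sums of their logarithms do not.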

- The feature space is divided into c decision regions: if g_i(x) > g_j(x) for all j ≠ i, then x is in R_i (R_i means: assign x to ω_i).
- The two-category case:
  - A classifier with two discriminant functions g_1 and g_2 is called a "dichotomizer."
  - Let g(x) ≡ g_1(x) − g_2(x). Decide ω_1 if g(x) > 0; otherwise decide ω_2.

- The computation of g(x), in two equivalent forms (same sign, hence the same decision):

  g(x) = P(ω_1 | x) − P(ω_2 | x)

  g(x) = ln [p(x | ω_1) / p(x | ω_2)] + ln [P(ω_1) / P(ω_2)]
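A minimal sketch of the log-ratio form of the dichotomizer (the likelihood and prior values are hypothetical):

```python
import math

def g_dichotomizer(px_w1, px_w2, p1, p2):
    """g(x) = ln[p(x|w1)/p(x|w2)] + ln[P(w1)/P(w2)]; decide w1 if g(x) > 0."""
    return math.log(px_w1 / px_w2) + math.log(p1 / p2)

gx = g_dichotomizer(0.3, 0.6, 0.7, 0.3)
# ln(0.5) + ln(7/3) ~ -0.693 + 0.847 = 0.154 > 0, so the prior tips it to w1
print("w1" if gx > 0 else "w2")
```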


2.5 The Normal Density

- Univariate density:
  - A continuous density.
  - Many processes are asymptotically Gaussian: handwritten characters or speech sounds can be viewed as an ideal prototype corrupted by a random process (central limit theorem).

  p(x) = (1 / (√(2π) σ)) exp[ −(1/2) ((x − μ) / σ)² ]

  where:
  - μ = mean (or expected value) of x
  - σ² = expected squared deviation, or variance
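The density above can be evaluated directly; a small sketch:

```python
import math

def normal_pdf(x, mu, sigma):
    """Univariate normal density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma)

# The peak value, at x = mu, is 1/(sqrt(2*pi)*sigma)
print(normal_pdf(0.0, 0.0, 1.0))  # ~ 0.3989
```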


- Multivariate density: the multivariate normal density in d dimensions is

  p(x) = (1 / ((2π)^{d/2} |Σ|^{1/2})) exp[ −(1/2) (x − μ)^t Σ^{−1} (x − μ) ]

  where:
  - x = (x_1, x_2, …, x_d)^t (t denotes the transpose)
  - μ = (μ_1, μ_2, …, μ_d)^t is the mean vector
  - Σ is the d × d covariance matrix
  - |Σ| and Σ^{−1} are its determinant and inverse, respectively
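A sketch of the formula for the d = 2 case, with the 2×2 determinant and inverse written out explicitly (no linear-algebra library needed):

```python
import math

def mvn_pdf_2d(x, mu, cov):
    """Bivariate normal density; cov is a 2x2 matrix [[a, b], [b2, c]]."""
    (a, b), (b2, c) = cov
    det = a * c - b * b2
    inv = [[c / det, -b / det], [-b2 / det, a / det]]
    d = [x[0] - mu[0], x[1] - mu[1]]
    # quadratic form (x - mu)^t Sigma^{-1} (x - mu)
    q = (d[0] * (inv[0][0] * d[0] + inv[0][1] * d[1])
         + d[1] * (inv[1][0] * d[0] + inv[1][1] * d[1]))
    # (2*pi)^(d/2) with d = 2 is just 2*pi; |Sigma|^(1/2) is sqrt(det)
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(det))

# With identity covariance the density factors into two univariate normals
print(mvn_pdf_2d([0, 0], [0, 0], [[1, 0], [0, 1]]))  # 1/(2*pi) ~ 0.1592
```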


- Linear combinations of normally distributed random variables are normally distributed:

  if p(x) ~ N(μ, Σ), then p(A^t x) ~ N(A^t μ, A^t Σ A)

- Whitening transform: A_w = Φ Λ^{−1/2}, where
  - Φ: the matrix whose columns are the orthonormal eigenvectors of Σ
  - Λ: the diagonal matrix of the corresponding eigenvalues
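A minimal sketch of the whitening idea for a hypothetical diagonal covariance, where the eigenvectors are already the coordinate axes (Φ = I) and Λ holds the variances, so A_w = Λ^{−1/2}:

```python
import math

# Hypothetical diagonal covariance: variances 4 and 9 on the axes
sigma = [[4.0, 0.0], [0.0, 9.0]]
aw = [[1.0 / math.sqrt(sigma[0][0]), 0.0],
      [0.0, 1.0 / math.sqrt(sigma[1][1])]]  # A_w = Phi Lambda^{-1/2} with Phi = I

# A_w^t Sigma A_w should be (numerically) the identity: the covariance is "whitened"
whitened = [[aw[i][i] * sigma[i][j] * aw[j][j] for j in range(2)]
            for i in range(2)]
print(whitened)  # ~ [[1, 0], [0, 1]]
```

For a non-diagonal Σ one would first diagonalize it (e.g. with an eigenvalue routine) to obtain Φ and Λ; the diagonal case above only illustrates the effect of the transform.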

- The Mahalanobis distance from x to μ:

  r² = (x − μ)^t Σ^{−1} (x − μ)
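A sketch for the 2-d case, expanding the quadratic form with the explicit 2×2 inverse (example values are hypothetical):

```python
def mahalanobis_sq_2d(x, mu, cov):
    """Squared Mahalanobis distance r^2 = (x-mu)^t Sigma^{-1} (x-mu), 2-d case."""
    (a, b), (_, c) = cov  # assumes a symmetric covariance [[a, b], [b, c]]
    det = a * c - b * b
    d0, d1 = x[0] - mu[0], x[1] - mu[1]
    # quadratic form expanded with the explicit 2x2 inverse of cov
    return (c * d0 * d0 - 2 * b * d0 * d1 + a * d1 * d1) / det

# With identity covariance it reduces to the squared Euclidean distance
print(mahalanobis_sq_2d([3, 4], [0, 0], [[1, 0], [0, 1]]))  # 25.0
```

Unlike Euclidean distance, this distance shrinks along directions of high variance, so points of equal r² lie on ellipses aligned with Σ.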

- Entropy:

  H(p(x)) = −∫ p(x) ln p(x) dx

  - Measures the fundamental uncertainty in the values of points selected randomly from a distribution.
  - Measured in nats (natural logarithm) or bits (base-2 logarithm).
  - The normal distribution has the maximum entropy of all distributions having a given mean and variance.
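For the univariate normal the integral has a closed form, H = (1/2) ln(2πe σ²); a small sketch, including the nats-to-bits conversion:

```python
import math

def normal_entropy_nats(sigma):
    """Differential entropy of N(mu, sigma^2) in nats: 0.5 * ln(2*pi*e*sigma^2)."""
    return 0.5 * math.log(2.0 * math.pi * math.e * sigma * sigma)

# Entropy grows with the spread sigma and does not depend on the mean
print(normal_entropy_nats(1.0))                # ~ 1.4189 nats
print(normal_entropy_nats(1.0) / math.log(2))  # ~ 2.0471 bits
```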

Assignment: 2.3.12, , 2.5.22, 2.5.23