Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John.

Slides:

Advertisements

Similar presentations

Classification. Introduction A discriminant is a function that separates the examples of different classes. For example – IF (income > Q1 and saving >Q2)

Advertisements

0 Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John.

Chapter 2: Bayesian Decision Theory (Part 2) Minimum-Error-Rate Classification Classifiers, Discriminant Functions and Decision Surfaces The Normal Density.

Pattern Classification, Chapter 2 (Part 2) 0 Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R.

Pattern Classification. Chapter 2 (Part 1): Bayesian Decision Theory (Sections ) Introduction Bayesian Decision Theory–Continuous Features.

Pattern Classification, Chapter 2 (Part 2) 0 Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R.

Chapter 2: Bayesian Decision Theory (Part 2) Minimum-Error-Rate Classification Classifiers, Discriminant Functions and Decision Surfaces The Normal Density.

Bayesian Decision Theory

Pattern Classification Chapter 2 (Part 2)0 Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O.

Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley.

Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley.

Bayesian Decision Theory Chapter 2 (Duda et al.) – Sections

Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley.

0 Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John.

Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley.

Chapter 2: Bayesian Decision Theory (Part 1) Introduction Bayesian Decision Theory–Continuous Features All materials used in this course were taken from.

Pattern Classification, Chapter 3 Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P.

Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley.

Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley.

Hidden Markov Model: Extension of Markov Chains

Chapter 2 (part 3) Bayesian Decision Theory Discriminant Functions for the Normal Density Bayes Decision Theory – Discrete Features All materials used.

Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley.

Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley.

Chapter 3 (part 1): Maximum-Likelihood & Bayesian Parameter Estimation  Introduction  Maximum-Likelihood Estimation  Example of a Specific Case  The.

Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley.

Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley.

Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley.

0 Pattern Classification, Chapter 3 0 Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda,

Pattern Recognition: Baysian Decision Theory Charles Tappert Seidenberg School of CSIS, Pace University.

Principles of Pattern Recognition

ECE 8443 – Pattern Recognition ECE 8527 – Introduction to Machine Learning and Pattern Recognition LECTURE 02: BAYESIAN DECISION THEORY Objectives: Bayes.

Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley.

Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley.

Covariance matrices for all of the classes are identical, But covariance matrices are arbitrary.

Bayesian Decision Theory Basic Concepts Discriminant Functions The Normal Density ROC Curves.

Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley.

Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley.

Basic Technical Concepts in Machine Learning Introduction Supervised learning Problems in supervised learning Bayesian decision theory.

Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley.

Pattern Classification All materials in these slides* were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley.

Pattern Classification Chapter 2(Part 3) 0 Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O.

Objectives: Loss Functions Risk Min. Error Rate Class. Resources: DHS – Chap. 2 (Part 1) DHS – Chap. 2 (Part 2) RGO - Intro to PR MCE for Speech MCE for.

Basic Technical Concepts in Machine Learning

Lecture 1.31 Criteria for optimal reception of radio signals.

Introduction Machine Learning 14/02/2017.

Special Topics In Scientific Computing

LECTURE 03: DECISION SURFACES

Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John.

Pattern Classification, Chapter 3

Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John.

Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John.

Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John.

Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John.

Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John.

INTRODUCTION TO Machine Learning 3rd Edition

Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John.

Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John.

Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John.

Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John.

Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John.

Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John.

Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John.

Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John.

Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John.

Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John.

Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John.

Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John.

Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John.

Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John.

Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John.

Presentation transcript:

Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley & Sons, 2000 with the permission of the authors and the publisher

Chapter 2 (Part 1): Bayesian Decision Theory (Sections 2.1-2.2) Introduction Bayesian Decision Theory–Continuous Features

Introduction The sea bass/salmon example State of nature, prior State of nature is a random variable The catch of salmon and sea bass is equiprobable P(1) = P(2) (uniform priors) P(1) + P( 2) = 1 (exclusivity and exhaustivity) Pattern Classification, Chapter 2 (Part 1)

Decision rule with only the prior information Decide 1 if P(1) > P(2) otherwise decide 2 Use of the class –conditional information P(x | 1) and P(x | 2) describe the difference in lightness between populations of sea and salmon Pattern Classification, Chapter 2 (Part 1)

Pattern Classification, Chapter 2 (Part 1)

Posterior, likelihood, evidence P(j | x) = P(x | j) . P (j) / P(x) Where in case of two categories Posterior = (Likelihood. Prior) / Evidence Pattern Classification, Chapter 2 (Part 1)

Pattern Classification, Chapter 2 (Part 1)

Decision given the posterior probabilities X is an observation for which: if P(1 | x) > P(2 | x) True state of nature = 1 if P(1 | x) < P(2 | x) True state of nature = 2 Therefore: whenever we observe a particular x, the probability of error is : P(error | x) = P(1 | x) if we decide 2 P(error | x) = P(2 | x) if we decide 1 Pattern Classification, Chapter 2 (Part 1)

Minimizing the probability of error Decide 1 if P(1 | x) > P(2 | x); otherwise decide 2 Therefore: P(error | x) = min [P(1 | x), P(2 | x)] (Bayes decision) Pattern Classification, Chapter 2 (Part 1)

Bayesian Decision Theory – Continuous Features Generalization of the preceding ideas Use of more than one feature Use more than two states of nature Allowing actions and not only decide on the state of nature Introduce a loss of function which is more general than the probability of error Pattern Classification, Chapter 2 (Part 1)

Refusing to make a decision in close or bad cases! Allowing actions other than classification primarily allows the possibility of rejection Refusing to make a decision in close or bad cases! The loss function states how costly each action taken is Pattern Classification, Chapter 2 (Part 1)

Let {1, 2,…, c} be the set of c states of nature (or “categories”) Let {1, 2,…, a} be the set of possible actions Let (i | j) be the loss incurred for taking action i when the state of nature is j Pattern Classification, Chapter 2 (Part 1)

R = Sum of all R(i | x) for i = 1,…,a Overall risk R = Sum of all R(i | x) for i = 1,…,a Minimizing R Minimizing R(i | x) for i = 1,…, a for i = 1,…,a Conditional risk Pattern Classification, Chapter 2 (Part 1)

Select the action i for which R(i | x) is minimum R is minimum and R in this case is called the Bayes risk = best performance that can be achieved! Pattern Classification, Chapter 2 (Part 1)

Two-category classification 1 : deciding 1 2 : deciding 2 ij = (i | j) loss incurred for deciding i when the true state of nature is j Conditional risk: R(1 | x) = 11P(1 | x) + 12P(2 | x) R(2 | x) = 21P(1 | x) + 22P(2 | x) Pattern Classification, Chapter 2 (Part 1)

action 1: “decide 1” is taken Our rule is the following: if R(1 | x) < R(2 | x) action 1: “decide 1” is taken This results in the equivalent rule : decide 1 if: (21- 11) P(x | 1) P(1) > (12- 22) P(x | 2) P(2) and decide 2 otherwise Pattern Classification, Chapter 2 (Part 1)

Likelihood ratio: Then take action 1 (decide 1) The preceding rule is equivalent to the following rule: Then take action 1 (decide 1) Otherwise take action 2 (decide 2) Pattern Classification, Chapter 2 (Part 1)

Optimal decision property “If the likelihood ratio exceeds a threshold value independent of the input pattern x, we can take optimal actions” Pattern Classification, Chapter 2 (Part 1)

Exercise Select the optimal decision where: = {1, 2} P(x | 1) N(2, 0.5) (Normal distribution) P(x | 2) N(1.5, 0.2) P(1) = 2/3 P(2) = 1/3 Pattern Classification, Chapter 2 (Part 1)