Download presentation

Presentation is loading. Please wait.

Published byCecilia Russell Modified over 5 years ago

1
Chapter 2: Bayesian Decision Theory (Part 2) Minimum-Error-Rate Classification Classifiers, Discriminant Functions and Decision Surfaces The Normal Density Dr. Hassan J. Eghbali,Computer Seince Department,Shiraz university

2
Dr.J.Eghbali, Computer Seince Department, Shiraz university CSE 616 Applied Pattern Recognition, Chapter 2, Section 2. Minimum-Error-Rate Classification Actions are decisions on classes If action i is taken and the true state of nature is j then: the decision is correct if i = j and in error if i j Seek a decision rule that minimizes the probability of error which is the error rate

3
Dr.J.Eghbali, Computer Seince Department, Shiraz university CSE 616 Applied Pattern Recognition, Chapter 2, Section 2. Introduction of the zero-one loss function: Therefore, the conditional risk is: “The risk corresponding to this loss function is the average probability error”

4
Dr.J.Eghbali, Computer Seince Department, Shiraz university CSE 616 Applied Pattern Recognition, Chapter 2, Section 2. Minimize the risk requires maximize P( i | x) (since R( i | x) = 1 – P( i | x)) For Minimum error rate Decide i if P ( i | x) > P( j | x) j i

5
Dr.J.Eghbali, Computer Seince Department, Shiraz university CSE 616 Applied Pattern Recognition, Chapter 2, Section 2. Regions of decision and zero-one loss function, therefore: If is the zero-one loss function wich means:

6
Dr.J.Eghbali, Computer Seince Department, Shiraz university CSE 616 Applied Pattern Recognition, Chapter 2, Section 2.

7
Classifiers, Discriminant Functions and Decision Surfaces The multi-category case Set of discriminant functions g i (x), i = 1,…, c The classifier assigns a feature vector x to class i if: g i (x) > g j (x) j i

8
Dr.J.Eghbali, Computer Seince Department, Shiraz university CSE 616 Applied Pattern Recognition, Chapter 2, Section 2.

9
Let g i (x) = - R( i | x) (max. discriminant corresponds to min. risk!) For the minimum error rate, we take g i (x) = P( i | x) (max. discrimination corresponds to max. posterior!) g i (x) P(x | i ) P( i ) g i (x) = ln P(x | i ) + ln P( i ) (ln: natural logarithm!)

10
Dr.J.Eghbali, Computer Seince Department, Shiraz university CSE 616 Applied Pattern Recognition, Chapter 2, Section 2. Feature space divided into c decision regions if g i (x) > g j (x) j i then x is in R i ( R i means assign x to i ) The two-category case A classifier is a “dichotomizer” that has two discriminant functions g 1 and g 2 Let g(x) g 1 (x) – g 2 (x) Decide 1 if g(x) > 0; Otherwise decide 2

11
Dr.J.Eghbali, Computer Seince Department, Shiraz university CSE 616 Applied Pattern Recognition, Chapter 2, Section 2. The computation of g(x)

12
Dr.J.Eghbali, Computer Seince Department, Shiraz university CSE 616 Applied Pattern Recognition, Chapter 2, Section 2.

13
The Normal Density Univariate density Density which is analytically tractable Continuous density A lot of processes are asymptotically Gaussian Handwritten characters, speech sounds are ideal or prototype corrupted by random process (central limit theorem) Where: = mean (or expected value) of x 2 = expected squared deviation or variance

14
Dr.J.Eghbali, Computer Seince Department, Shiraz university CSE 616 Applied Pattern Recognition, Chapter 2, Section 2.

15
Multivariate density Multivariate normal density in d dimensions is: where: x = (x 1, x 2, …, x d ) t (t stands for the transpose vector form) = ( 1, 2, …, d ) t mean vector = d*d covariance matrix | | and -1 are determinant and inverse respectively

Similar presentations

© 2020 SlidePlayer.com Inc.

All rights reserved.

To make this website work, we log user data and share it with processors. To use this website, you must agree to our Privacy Policy, including cookie policy.

Ads by Google