240-650 Principles of Pattern Recognition
Chapter 2: Bayesian Decision Theory
Montri Karnjanadecha
Statistical Approach to Pattern Recognition
A Simple Example
Suppose that we are given two classes, w1 and w2, with P(w1) = 0.7 and P(w2) = 0.3.
No measurement is given, so we can only guess.
What should we do to recognize a given input? What is the best we can do statistically, and why? (See the sketch below.)
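A minimal sketch of the prior-only rule this slide points toward: with no measurement, the best we can do is always decide the class with the larger prior, which gives an error rate of min(P(w1), P(w2)) = 0.3. The snippet below only restates that arithmetic.

```python
# Prior-only decision: with no measurement, always pick the most probable class.
priors = {"w1": 0.7, "w2": 0.3}

decision = max(priors, key=priors.get)   # "w1"
error_rate = 1.0 - priors[decision]      # 0.3

print(f"Always decide {decision}; expected error rate = {error_rate:.1f}")
```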
A More Complicated Example
Suppose that we are given two classes and a single measurement x.
The posteriors P(w1|x) and P(w2|x) are given graphically.
A Bayesian Example
Suppose that we are given two classes and a single measurement x.
This time we are given the class-conditional densities p(x|w1) and p(x|w2).
A Bayesian Example – cont.
Bayesian Decision Theory
Bayes formula: P(wj|x) = p(x|wj) P(wj) / p(x)
In the case of two categories, the evidence is p(x) = p(x|w1) P(w1) + p(x|w2) P(w2).
In English, it can be expressed as: posterior = (likelihood x prior) / evidence.
Bayesian Decision Theory – cont.
Posterior probability P(wj|x): the probability of the state of nature being wj given that feature value x has been measured.
Likelihood p(x|wj): the likelihood of wj with respect to x.
Evidence p(x): a scaling factor that guarantees that the posterior probabilities sum to one.
Bayesian Decision Theory – cont.
Whenever we observe a particular x, the probability of error is
  P(error|x) = P(w1|x) if we decide w2, and P(w2|x) if we decide w1.
The average probability of error is given by
  P(error) = integral of P(error|x) p(x) dx over all x.
Bayesian Decision Theory – cont.
Bayes decision rule: decide w1 if P(w1|x) > P(w2|x); otherwise decide w2.
Probability of error: P(error|x) = min[P(w1|x), P(w2|x)].
Since the evidence p(x) is the same for both classes, the rule can equivalently be written as:
  decide w1 if p(x|w1) P(w1) > p(x|w2) P(w2); otherwise decide w2.
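A minimal sketch of this two-category rule; the likelihood values at the observed x and the priors below are made up purely to show the arithmetic.

```python
# Made-up likelihood values at one observed x (for illustration only).
p_x_w1, p_x_w2 = 0.10, 0.40
P_w1, P_w2 = 0.7, 0.3

# Evidence and posteriors (Bayes formula).
p_x = p_x_w1 * P_w1 + p_x_w2 * P_w2
post_w1 = p_x_w1 * P_w1 / p_x
post_w2 = p_x_w2 * P_w2 / p_x

decision = "w1" if post_w1 > post_w2 else "w2"
error = min(post_w1, post_w2)            # P(error|x)

print(decision, round(error, 3))         # w2 0.368
```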
Bayesian Decision Theory – Continuous Features
Feature space: in general, an input can be represented by a feature vector x, a point in a d-dimensional Euclidean space R^d.
Loss function: the loss function states exactly how costly each action is and is used to convert a probability determination into a decision. It is written as l(ai|wj).
Loss Function
l(ai|wj) describes the loss incurred for taking action ai when the state of nature is wj.
Conditional Risk
Suppose we observe a particular x and take action ai. If the true state of nature is wj, then by definition we incur the loss l(ai|wj).
We can minimize our expected loss by selecting the action that minimizes the conditional risk, R(ai|x).
Bayesian Decision Theory
Suppose that there are c categories {w1, w2, ..., wc}.
Conditional risk: R(ai|x) = sum over j of l(ai|wj) P(wj|x).
Overall risk: R = integral of R(a(x)|x) p(x) dx, i.e., the average expected loss over all x. (A small numerical sketch follows.)
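A minimal sketch of minimum-risk action selection; the loss matrix and posteriors below are made-up values used only for illustration.

```python
import numpy as np

# Rows: actions a1, a2; columns: states of nature w1, w2 (illustrative values).
loss = np.array([[0.0, 2.0],    # l(a1|w1), l(a1|w2)
                 [1.0, 0.0]])   # l(a2|w1), l(a2|w2)

posteriors = np.array([0.4, 0.6])   # P(w1|x), P(w2|x) for some observed x

cond_risk = loss @ posteriors        # R(ai|x) = sum_j l(ai|wj) P(wj|x)
best_action = np.argmin(cond_risk)   # pick the action with minimum conditional risk

print(cond_risk)      # [1.2 0.4]
print(best_action)    # 1  (i.e., action a2)
```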
Bayesian Decision Theory
Bayes decision rule: for a given x, select the action ai for which the conditional risk R(ai|x) is minimum.
The resulting minimum overall risk is called the Bayes risk, denoted R*, and it is the best performance that can be achieved.
Two-Category Classification
Let lij = l(ai|wj).
Conditional risks:
  R(a1|x) = l11 P(w1|x) + l12 P(w2|x)
  R(a2|x) = l21 P(w1|x) + l22 P(w2|x)
Fundamental decision rule: decide w1 if R(a1|x) < R(a2|x).
Two-Category Classification – cont.
The decision rule can be written in several equivalent ways: decide w1 if any one of several equivalent conditions holds (see the sketch below).
One of these forms is a likelihood-ratio test: the likelihood ratio p(x|w1)/p(x|w2) is compared against a threshold that does not depend on x.
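The equations on this slide did not survive extraction. Below is a small numerical sketch, with made-up losses, priors, and likelihood values, of the standard equivalent forms of the two-category rule: compare conditional risks, compare scaled joint densities, or compare the likelihood ratio against a fixed threshold.

```python
# Made-up losses, priors, and likelihood values at one observed x.
l11, l12, l21, l22 = 0.0, 2.0, 1.0, 0.0
P_w1, P_w2 = 0.7, 0.3
p_x_w1, p_x_w2 = 0.10, 0.40

post_w1 = p_x_w1 * P_w1 / (p_x_w1 * P_w1 + p_x_w2 * P_w2)
post_w2 = 1.0 - post_w1

# Form 1: compare conditional risks R(a1|x) and R(a2|x).
rule1 = (l11 * post_w1 + l12 * post_w2) < (l21 * post_w1 + l22 * post_w2)
# Form 2: compare (l21 - l11) p(x|w1) P(w1) with (l12 - l22) p(x|w2) P(w2).
rule2 = (l21 - l11) * p_x_w1 * P_w1 > (l12 - l22) * p_x_w2 * P_w2
# Form 3: compare the likelihood ratio with a threshold independent of x.
rule3 = p_x_w1 / p_x_w2 > ((l12 - l22) / (l21 - l11)) * (P_w2 / P_w1)

print(rule1, rule2, rule3)   # all three agree; here all are False, so decide w2
```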
Minimum-Error-Rate Classification
A special case of the Bayes decision rule uses the zero-one loss function: l(ai|wj) = 0 if i = j and 1 if i != j.
It assigns no loss to a correct decision and unit loss to any error, so all errors are equally costly.
Minimum-Error-Rate Classification
Under the zero-one loss, the conditional risk reduces to R(ai|x) = sum over j != i of P(wj|x) = 1 - P(wi|x).
Minimum-Error-Rate Classification
We should therefore select the i that maximizes the posterior probability P(wi|x).
For minimum error rate: decide wi if P(wi|x) > P(wj|x) for all j != i.
Classifiers, Discriminant Functions, and Decision Surfaces
There are many ways to represent pattern classifiers. One of the most useful is in terms of a set of discriminant functions gi(x), i = 1, ..., c.
The classifier assigns a feature vector x to class wi if gi(x) > gj(x) for all j != i.
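A minimal sketch of such a discriminant-based classifier. The choice gi(x) = ln p(x|wi) + ln P(wi) is one common, equivalent form; the one-dimensional Gaussian densities and priors below are made up for illustration.

```python
import numpy as np
from scipy.stats import norm

# Illustrative one-dimensional setup: two Gaussian class-conditional densities.
priors = [0.7, 0.3]
densities = [norm(loc=0.0, scale=1.0), norm(loc=2.0, scale=1.0)]

def discriminants(x):
    """g_i(x) = ln p(x|w_i) + ln P(w_i) -- one common, equivalent choice."""
    return [d.logpdf(x) + np.log(p) for d, p in zip(densities, priors)]

def classify(x):
    """Assign x to the class whose discriminant is largest."""
    return int(np.argmax(discriminants(x)))

print(classify(0.2))   # 0  (class w1)
print(classify(2.5))   # 1  (class w2)
```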
The Multicategory Classifier
Classifiers, Discriminant Functions, and Decision Surfaces
There are many equivalent discriminant functions, i.e., the classification results will be the same even though the functions differ.
For example, if f is a monotonically increasing function, then replacing each gi(x) by f(gi(x)) leaves the classification unchanged.
Classifiers, Discriminant Functions, and Decision Surfaces
Some choices of discriminant function are easier to understand or to compute than others, e.g. gi(x) = P(wi|x), gi(x) = p(x|wi) P(wi), or gi(x) = ln p(x|wi) + ln P(wi).
Decision Regions
The effect of any decision rule is to divide the feature space into c decision regions, R1, ..., Rc.
The regions are separated by decision boundaries, surfaces where ties occur among the largest discriminant functions.
Decision Regions – cont.
Two-Category Case (Dichotomizer)
The two-category case is a special case: instead of two discriminant functions, a single one can be used,
  g(x) = g1(x) - g2(x), with the rule: decide w1 if g(x) > 0, otherwise decide w2.
The Normal Density
Univariate Gaussian density: p(x) = (1 / (sqrt(2 pi) sigma)) exp(-(x - mu)^2 / (2 sigma^2))
Mean: mu = E[x]. Variance: sigma^2 = E[(x - mu)^2].
The Normal Density
Central limit theorem: the aggregate effect of a large number of small, independent random disturbances leads to a Gaussian distribution.
For this reason, the Gaussian is often a good model for the actual probability distribution.
The Multivariate Normal Density
Multivariate density (in d dimensions): p(x) = (1 / ((2 pi)^(d/2) |Sigma|^(1/2))) exp(-(1/2) (x - mu)^T Sigma^-1 (x - mu))
Abbreviation: p(x) ~ N(mu, Sigma).
The Multivariate Normal Density
Mean vector: mu = E[x].
Covariance matrix: Sigma = E[(x - mu)(x - mu)^T].
The ijth component of Sigma is sigma_ij = E[(x_i - mu_i)(x_j - mu_j)].
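A small sketch of evaluating a multivariate normal density and checking the mean/covariance definitions numerically; the 2-D parameter values below are made up.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Illustrative 2-D parameters.
mu = np.array([1.0, 2.0])
sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

# Evaluate p(x) at a point.
x = np.array([1.5, 1.0])
print(multivariate_normal(mean=mu, cov=sigma).pdf(x))

# Sample and verify that the empirical mean and covariance match mu and sigma.
samples = np.random.default_rng(0).multivariate_normal(mu, sigma, size=100_000)
print(samples.mean(axis=0))             # approximately mu
print(np.cov(samples, rowvar=False))    # approximately sigma
```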
Statistical Independence
If xi and xj are statistically independent, then sigma_ij = 0.
If this holds for all pairs, the covariance matrix becomes a diagonal matrix whose off-diagonal elements are all zero.
Whitening Transform
A_w = Phi Lambda^(-1/2), where Phi is the matrix whose columns are the orthonormal eigenvectors of Sigma and Lambda is the diagonal matrix of the corresponding eigenvalues.
Applying A_w whitens the data: A_w^T Sigma A_w = I.
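A minimal sketch of this whitening transform in NumPy; the covariance matrix below is made up for illustration.

```python
import numpy as np

# Illustrative covariance matrix.
sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

# Eigendecomposition: columns of phi are orthonormal eigenvectors, lam the eigenvalues.
lam, phi = np.linalg.eigh(sigma)

# Whitening transform A_w = phi * diag(lam^(-1/2)).
A_w = phi @ np.diag(lam ** -0.5)

# Check: transforming the covariance yields the identity matrix.
print(np.round(A_w.T @ sigma @ A_w, 6))   # [[1. 0.] [0. 1.]]
```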
Squared Mahalanobis Distance from x to mu
r^2 = (x - mu)^T Sigma^-1 (x - mu)
Contours of constant density are surfaces of constant Mahalanobis distance.
The principal axes of these hyperellipsoids are given by the eigenvectors of Sigma; the lengths of the axes are determined by its eigenvalues.
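A small sketch of computing the squared Mahalanobis distance; mu, Sigma, and x are made-up values. (scipy.spatial.distance.mahalanobis computes the unsquared distance.)

```python
import numpy as np

mu = np.array([1.0, 2.0])
sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
x = np.array([2.0, 0.0])

diff = x - mu
r2 = diff @ np.linalg.inv(sigma) @ diff   # (x - mu)^T Sigma^-1 (x - mu)
print(r2)                                 # about 6.29 for these values
```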
Discriminant Functions for the Normal Density
Start from the minimum-error-rate discriminant gi(x) = ln p(x|wi) + ln P(wi); the special cases below reduce to a minimum-distance classifier.
If the densities are multivariate normal, i.e., if p(x|wi) ~ N(mu_i, Sigma_i), then we have:
  gi(x) = -(1/2)(x - mu_i)^T Sigma_i^-1 (x - mu_i) - (d/2) ln 2 pi - (1/2) ln |Sigma_i| + ln P(wi)
Discriminant Functions for the Normal Density
Case 1: the features are statistically independent and each feature has the same variance, i.e., Sigma_i = sigma^2 I.
The discriminant reduces to gi(x) = -||x - mu_i||^2 / (2 sigma^2) + ln P(wi), where ||.|| denotes the Euclidean norm.
Case 1: Sigma_i = sigma^2 I
Linear Discriminant Function
It is not actually necessary to compute distances. Expanding the squared norm yields
  gi(x) = -(1/(2 sigma^2)) [x^T x - 2 mu_i^T x + mu_i^T mu_i] + ln P(wi)
The term x^T x is the same for all i and can be dropped, giving the linear discriminant function
  gi(x) = w_i^T x + w_i0
Linear Discriminant Function
where w_i = mu_i / sigma^2 and w_i0 = -(1/(2 sigma^2)) mu_i^T mu_i + ln P(wi).
w_i0 is called the threshold or bias for the ith category.
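A minimal sketch of this Case 1 linear discriminant; the means, variance, and priors below are made up for illustration.

```python
import numpy as np

# Illustrative parameters for two classes with Sigma_i = sigma^2 I.
mus = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
priors = [0.7, 0.3]
sigma2 = 1.0

# w_i = mu_i / sigma^2,  w_i0 = -mu_i.mu_i / (2 sigma^2) + ln P(w_i)
ws = [mu / sigma2 for mu in mus]
w0s = [-(mu @ mu) / (2 * sigma2) + np.log(p) for mu, p in zip(mus, priors)]

def classify(x):
    """Pick the class with the largest linear discriminant g_i(x) = w_i.x + w_i0."""
    return int(np.argmax([w @ x + w0 for w, w0 in zip(ws, w0s)]))

print(classify(np.array([0.5, 0.5])))   # 0
print(classify(np.array([2.5, 2.5])))   # 1
```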
Linear Machine
A classifier that uses linear discriminant functions is called a linear machine.
Its decision surfaces are pieces of hyperplanes defined by the linear equations gi(x) = gj(x) for the two categories with the highest posterior probabilities.
For our case this equation can be written as w^T (x - x0) = 0.
Linear Machine
where w = mu_i - mu_j and
  x0 = (1/2)(mu_i + mu_j) - (sigma^2 / ||mu_i - mu_j||^2) ln[P(wi)/P(wj)] (mu_i - mu_j)
If P(wi) = P(wj), then the second term vanishes and x0 is the midpoint between the means; the result is a minimum-distance classifier.
Priors change -> decision boundaries shift (illustrated in a sequence of figures for different priors).
Case 2: Sigma_i = Sigma
The covariance matrices for all of the classes are identical but otherwise arbitrary.
The cluster for the ith class is centered about the mean mu_i.
Discriminant function: gi(x) = -(1/2)(x - mu_i)^T Sigma^-1 (x - mu_i) + ln P(wi)
The ln P(wi) term can be ignored if the prior probabilities are the same for all classes.
Case 2: Discriminant Function
Expanding the quadratic again gives a linear discriminant gi(x) = w_i^T x + w_i0,
where w_i = Sigma^-1 mu_i and w_i0 = -(1/2) mu_i^T Sigma^-1 mu_i + ln P(wi).
For the two-category case
If Ri and Rj are contiguous, the boundary between them has the equation w^T (x - x0) = 0,
where w = Sigma^-1 (mu_i - mu_j) and
  x0 = (1/2)(mu_i + mu_j) - [ln(P(wi)/P(wj)) / ((mu_i - mu_j)^T Sigma^-1 (mu_i - mu_j))] (mu_i - mu_j)
Case 3: Sigma_i arbitrary
In general, the covariance matrices are different for each category.
The only term that can be dropped from gi(x) is the (d/2) ln 2 pi term.
Case 3: Sigma_i arbitrary – cont.
The discriminant functions are quadratic: gi(x) = x^T W_i x + w_i^T x + w_i0,
where W_i = -(1/2) Sigma_i^-1, w_i = Sigma_i^-1 mu_i, and
  w_i0 = -(1/2) mu_i^T Sigma_i^-1 mu_i - (1/2) ln |Sigma_i| + ln P(wi)
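A minimal sketch of this Case 3 quadratic discriminant; the per-class means, covariances, and priors below are made up for illustration.

```python
import numpy as np

# Illustrative per-class parameters (two classes, arbitrary covariances).
mus = [np.array([0.0, 0.0]), np.array([2.0, 2.0])]
sigmas = [np.array([[1.0, 0.0], [0.0, 1.0]]),
          np.array([[2.0, 0.5], [0.5, 1.5]])]
priors = [0.6, 0.4]

def g(x, mu, sigma, prior):
    """Quadratic discriminant g_i(x) = x^T W_i x + w_i^T x + w_i0."""
    inv = np.linalg.inv(sigma)
    W = -0.5 * inv
    w = inv @ mu
    w0 = -0.5 * mu @ inv @ mu - 0.5 * np.log(np.linalg.det(sigma)) + np.log(prior)
    return x @ W @ x + w @ x + w0

def classify(x):
    return int(np.argmax([g(x, m, s, p) for m, s, p in zip(mus, sigmas, priors)]))

print(classify(np.array([0.2, 0.1])))   # 0
print(classify(np.array([2.1, 1.8])))   # 1
```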
Two-category case
In Case 3 the decision surfaces are hyperquadrics: hyperplanes, hyperspheres, hyperellipsoids, hyperparaboloids, etc.
Example